2019- Statun 1101 Introduction to Statistics-lecture notes Confidence Interval PDF

Title 2019- Statun 1101 Introduction to Statistics-lecture notes Confidence Interval
Course Introduction To Statistics
Institution Columbia University in the City of New York
Pages 4
File Size 192.8 KB
File Type PDF


Confidence Intervals (Ch.18)

1. Confidence Interval for proportions

Estimation

• With data and a model, we want to estimate the parameters in the model.
• For example, you flip a coin 1000 times and let $X$ be the number of heads in the 1000 trials. Then $X \sim \mathrm{Bin}(1000, p)$, where $p$ is the probability of getting a head.
• With $X$, you want to estimate $p$. Naturally, you would estimate $p$ by $\hat{p}_n = X/n$. The subscript $n$ refers to the fact that $\hat{p}_n$ depends on the sample size $n$. The hat on top of $p$ usually means it is an estimator of the parameter.
• In statistics, knowing an estimator of a parameter is often not good enough. Ideally, we would like to know the distribution of the estimator. In particular, this distribution tells us the amount of variation we should expect, that is, how precise our estimate is.

Confidence Interval (CI):

• A 95% confidence interval is a random interval such that the true parameter lies in it with 95% probability.
• In reality, we often only have one sample. We can only compute a realization of the confidence interval. (We will also refer to this realization simply as the confidence interval.)
• If we could draw infinitely many samples of size $n$ and compute the realizations, then 95% of the intervals would cover the true parameter.
• In general, we can have a $100(1-\alpha)\%$ confidence interval. $100(1-\alpha)\%$ is called the confidence level or the confidence coefficient.

The (approximate) 95% confidence interval for $p$ in $\mathrm{Bin}(n, p)$ is

$$\left( \hat{p}_n - z_{0.025}\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}},\ \ \hat{p}_n + z_{0.025}\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}} \right) \qquad (1)$$

where $\hat{p}_n = X/n$ and $z_{0.025}$ is the quantile of $N(0, 1)$ such that $P(N(0, 1) > z_{0.025}) = 0.025$. In fact, $z_{0.025} \approx 1.96 \approx 2$.

Some additional terms:

1. Recall that $\hat{p}_n \approx N\left(p, \frac{p(1-p)}{n}\right)$. Since $p$ is unknown, we do not know $\sqrt{\frac{p(1-p)}{n}}$. However, we can estimate it by $\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}}$. The estimate of the standard deviation is called the standard error.

2. The number $z_{0.025}$ is called the critical value.
3. The extent of the confidence interval on either side of $\hat{p}_n$ is called the margin of error.

Example 1.1. Late in 2010, Pew Research surveyed U.S. residents to ask about their use of social networking sites. Of the 156 respondents aged 18 to 22 who use Facebook, 48 said that they update their status at least daily. Find the 95% confidence interval for the population proportion $p$ who update their status at least daily.

Solution: $\hat{p}_n = 48/156$. The 95% CI is

$$\left( \frac{48}{156} - 1.96\sqrt{\frac{\frac{48}{156}\left(1-\frac{48}{156}\right)}{156}},\ \ \frac{48}{156} + 1.96\sqrt{\frac{\frac{48}{156}\left(1-\frac{48}{156}\right)}{156}} \right) = (0.235, 0.380).$$
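The arithmetic in Example 1.1 can be checked with a few lines of Python. This is only a sketch: the function name `proportion_ci` and its `conf` parameter are illustrative choices rather than anything from the notes, and the critical value is taken from the standard library's `NormalDist` instead of being rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Approximate confidence interval for a binomial proportion (hypothetical helper)."""
    p_hat = x / n
    # Critical value z_{alpha/2}: the upper alpha/2 quantile of N(0, 1).
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Margin of error = critical value * standard error.
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(48, 156)
print(f"({low:.3f}, {high:.3f})")  # (0.235, 0.380)
```

Using the exact quantile rather than 1.96 changes the endpoints only beyond the third decimal place here.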

You can say “We are 95% confident that between 23.5% and 38.0% of Facebook users between the ages of 18 and 22 update their status at least daily.” More formally, you can say “We are 95% confident that the interval from 23.5% to 38.0% captures the true proportion of U.S. Facebook users between the ages of 18 and 22 who update their status at least daily.”

Remark: you cannot say $p$ is between 23.5% and 38.0% with 95% probability. This is because $p$ is fixed (not random), so $P(0.235 \le p \le 0.380)$ is either 0 or 1. Instead, $p$ lies in the random interval with probability 95%. Since we only have one realization, we say we are 95% “confident” that $p$ lies in $(0.235, 0.380)$.
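The distinction between the random interval and one realization can be illustrated by simulation: fix a true $p$, draw many samples, and count how often the realized approximate 95% interval covers $p$. The sketch below is an illustration only. The values `p_true = 0.3`, `n = 200`, `reps = 2000`, the seed, and the helper name `covers` are all assumptions, not from the notes.

```python
import random
from math import sqrt

def covers(p, n, rng, z=1.96):
    """Draw X ~ Bin(n, p) and report whether the realized interval contains p."""
    x = sum(rng.random() < p for _ in range(n))  # number of successes in n trials
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p <= p_hat + margin

rng = random.Random(0)              # fixed seed so the run is repeatable
p_true, n, reps = 0.3, 200, 2000    # illustrative values (assumed)
coverage = sum(covers(p_true, n, rng) for _ in range(reps)) / reps
print(f"fraction of intervals covering p: {coverage:.3f}")
```

Each individual interval either contains `p_true` or it does not; the fraction across repetitions should come out close to 0.95, though not exactly, because the interval is only approximate.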

2. Sample Size Determination for Proportion

Recall that the margin of error with $1-\alpha$ confidence is

$$\text{Margin of error} = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

Suppose that we want to keep the margin of error within 3% with 0.95 confidence. How large a sample do we require? Solving for $n$ in terms of $\hat{p}$, we have

$$n = \left(\frac{1.96}{0.03}\right)^2 \hat{p}(1-\hat{p}).$$

The determination of the sample size requires the estimate $\hat{p}$. Sometimes we have a guess and can plug that into the formula. To be conservative, we could plug in $\hat{p} = 0.5$. This is the value that maximizes $\hat{p}(1-\hat{p})$ over $\hat{p} \in (0, 1)$. With $\hat{p} = 0.5$, we have $n = 1067.1$. Therefore, we need at least 1068 people. (Do not round down.)

• From the formula, we see that when we want the margin of error to be smaller, we require larger $n$.
• If we want higher confidence, $z_{\alpha/2}$ will be larger and we require larger $n$.

Example 2.1. The Yale/George Mason poll that estimated that 40% of all voters believed that scientists disagree about whether global warming exists had a margin of error of 3%. Suppose an environmental group planning a follow-up survey of voters’ opinions on global warming wants to determine a 95% confidence interval with a margin of error of no more than 2%. How large a sample do they need? (You could take $p = 0.5$, but we have data that indicates $p = 0.40$, so we can use that.)

Ans:

$$n = \left(\frac{1.96}{0.02}\right)^2 (0.4)(0.6) = 2304.96.$$

The environmental group’s survey will need at least 2305 respondents.
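Both sample-size calculations in this section follow the same recipe, which can be sketched in Python as below. The function name `sample_size` and its parameters are illustrative assumptions; the default `z = 1.96` mirrors the rounded critical value used in the notes.

```python
from math import ceil

def sample_size(margin, p_guess=0.5, z=1.96):
    """Smallest n with z * sqrt(p_guess * (1 - p_guess) / n) <= margin (hypothetical helper)."""
    n_exact = (z / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)  # always round up, never down

n_conservative = sample_size(0.03)         # conservative choice p = 0.5
n_followup = sample_size(0.02, p_guess=0.4)  # Example 2.1
print(n_conservative, n_followup)  # 1068 2305
```

Rounding up with `ceil` reflects the "do not round down" rule: a sample of 1067 would leave the margin of error slightly above 3%.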

3. Theory

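Why is (1) the 95% CI for $p$? Recall that by the central limit theorem

$$\frac{\sqrt{n}\,(\hat{p}_n - p)}{\sqrt{p(1-p)}} \xrightarrow{d} N(0, 1).$$

We also know from the law of large numbers that $\hat{p}_n \xrightarrow{P} p$. Therefore,

$$\hat{p}_n(1-\hat{p}_n) \xrightarrow{P} p(1-p).$$

Equivalently, we have

$$\frac{p(1-p)}{\hat{p}_n(1-\hat{p}_n)} \xrightarrow{P} 1.$$

Therefore,

$$\frac{\sqrt{n}\,(\hat{p}_n - p)}{\sqrt{\hat{p}_n(1-\hat{p}_n)}} = \sqrt{\frac{p(1-p)}{\hat{p}_n(1-\hat{p}_n)}} \cdot \frac{\sqrt{n}\,(\hat{p}_n - p)}{\sqrt{p(1-p)}} \xrightarrow{d} N(0, 1).$$

Hence,

$$P\left( -z_{\alpha/2} \le \frac{\sqrt{n}\,(\hat{p}_n - p)}{\sqrt{\hat{p}_n(1-\hat{p}_n)}} \le z_{\alpha/2} \right) \approx 1 - \alpha.$$

Rearranging the terms, we have

$$P\left( \hat{p}_n - z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}} \le p \le \hat{p}_n + z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}} \right) \approx 1 - \alpha.$$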

Remark: there are some details omitted in the derivation.
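The key claim in the derivation, that the statistic standardized with the estimated standard deviation is approximately $N(0, 1)$, can be checked empirically. The sketch below simulates that statistic many times and compares its sample mean and standard deviation with 0 and 1. The values `p_true = 0.3`, `n = 500`, `reps = 2000`, the seed, and the helper name `studentized` are assumptions made for illustration only.

```python
import random
from math import sqrt
from statistics import mean, stdev

def studentized(p, n, rng):
    """One draw of sqrt(n) * (p_hat - p) / sqrt(p_hat * (1 - p_hat))."""
    x = sum(rng.random() < p for _ in range(n))  # X ~ Bin(n, p)
    p_hat = x / n
    return sqrt(n) * (p_hat - p) / sqrt(p_hat * (1 - p_hat))

rng = random.Random(1)             # fixed seed so the run is repeatable
p_true, n, reps = 0.3, 500, 2000   # illustrative values (assumed)
draws = [studentized(p_true, n, rng) for _ in range(reps)]
print(f"mean = {mean(draws):.3f}, sd = {stdev(draws):.3f}")
```

A mean near 0 and a standard deviation near 1 are consistent with the limit above; they do not prove it, and for small $n$ or $p$ near 0 or 1 the approximation can be noticeably worse.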

4. CI for difference in proportions

Let $X_1 \sim \mathrm{Bin}(n_1, p_1)$ and $X_2 \sim \mathrm{Bin}(n_2, p_2)$ be independent. Suppose that we are interested in the difference in proportions $p_1 - p_2$. Naturally, we would estimate $p_1 - p_2$ by $\hat{p}_1 - \hat{p}_2 := X_1/n_1 - X_2/n_2$. The following is the approximate $1-\alpha$ CI for $p_1 - p_2$:

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$

Example 4.1. In criminal proceedings a convicted defendant is sometimes sent to prison by the presiding judge and is sometimes not. A question has arisen in legal circles as to whether a judge’s decision is affected by (1) whether the defendant pleaded guilty or (2) whether he or she pleaded innocent but was subsequently found guilty. The following data refer to individuals, all having previous prison records, convicted of second-degree robbery.

• 74, out of 142 who pleaded guilty, went to prison
• 61, out of 72 who pleaded not guilty, went to prison

Find the 95% confidence interval of the difference in proportions.

Ans:

$$\frac{61}{72} - \frac{74}{142} \pm 1.96\sqrt{\frac{\frac{61}{72}\left(1-\frac{61}{72}\right)}{72} + \frac{\frac{74}{142}\left(1-\frac{74}{142}\right)}{142}} = (0.209, 0.443),$$

which does not contain 0. Therefore, we may conclude there is really a difference in the two proportions.
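The interval in Example 4.1 can be reproduced with a short Python sketch. The function name `diff_proportion_ci` and its argument order are illustrative assumptions; the default `z = 1.96` matches the rounded critical value used in the answer.

```python
from math import sqrt

def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Approximate CI for p1 - p2 from two independent binomial samples (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error of the difference: the two variance estimates add.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

# Group 1: pleaded not guilty (61 of 72); group 2: pleaded guilty (74 of 142).
low, high = diff_proportion_ci(61, 72, 74, 142)
print(f"({low:.3f}, {high:.3f})")  # (0.209, 0.443)
```

Swapping the two groups flips the signs of both endpoints, so whether 0 lies in the interval does not depend on which group is labeled first.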
