Title Handout 22 math 146-1 - statistics math problems and answers to help you figure out
Author Alexa Addis
Course Practicum Iv: Math
Institution Western Washington University
Pages 3
File Size 129.8 KB
File Type PDF

The central limit theorem - the idea

Math 146

Handout 22

The central limit theorem says that even if the original distribution from which a sample was taken was not a normal distribution, the distribution of sample means will be approximately a normal distribution, provided that n (the sample size) is at least 30. On the previous handout, an example was given of a class in which the instructor gave out 20% C’s (grade of 2.0), 30% B’s (grade of 3.0) and 50% A’s (grade of 4.0). For samples of size 2, it was found that the sample mean would be 2.0 with probability 4%, 2.5 with probability 12%, 3.0 with probability 29%, 3.5 with probability 30%, and 4.0 with probability 25%. The probability histograms shown below compare the distribution of the data values (the grades given in the class) with the distribution of the sample means (for samples of size n = 2). Notice that the same axis is used in each case; since no grades of 2.5 or 3.5 were given, the graph of the data has no bars for 2.5 or 3.5.

[Figure: two probability histograms drawn on the same horizontal axis (2.0, 2.5, 3.0, 3.5, 4.0), with probability from 0 to 50% on the vertical axis. Left panel: Distribution of data (x). Right panel: Distribution of sample means (x̄, n = 2).]
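
The probabilities quoted for samples of size 2 can be checked by listing every possible pair of grades. The following is a minimal Python sketch, assuming the two grades in a sample are drawn independently; the names `grades` and `sample_mean_distribution` are made up for this illustration and do not come from the handout.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Grade distribution from the handout: 20% C (2.0), 30% B (3.0), 50% A (4.0).
grades = {2: Fraction(20, 100), 3: Fraction(30, 100), 4: Fraction(50, 100)}

def sample_mean_distribution(dist, n):
    """Exact distribution of the mean of n independent draws from dist."""
    result = defaultdict(Fraction)
    # Walk through every ordered sample of size n.
    for combo in product(dist.items(), repeat=n):
        mean = Fraction(sum(value for value, _ in combo), n)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p  # independence: multiply the individual probabilities
        result[mean] += prob
    return dict(sorted(result.items()))

dist_n2 = sample_mean_distribution(grades, 2)
for mean, prob in dist_n2.items():
    print(f"sample mean {float(mean):.1f}: probability {float(prob):.0%}")
```

Exact fractions are used instead of decimals so that the probabilities add up to exactly 1 with no rounding error.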

When the central limit theorem was stated above, it said that the distribution of sample means will be a normal distribution if n ≥ 30. Here n = 2 is well below 30, yet observant students may note that the distribution looks a little bit like a normal bell curve, at least much more than the distribution of the data itself. (Note that you can think of the original distribution as the distribution of sample means for n = 1.) What if we make the sample size larger, perhaps doubling it to n = 4 or doubling it again to n = 8?

If the sample size is doubled to 4, the possible sample means are 2.0, 2.25, 2.5, 2.75, 3.0, 3.25, 3.5, 3.75 and 4.0. The histogram looks like this:

[Figure: probability histogram of the sample mean x̄ (n = 4). Horizontal axis: sample mean from 2.0 to 4.0 in steps of 0.25. Vertical axis: probability from 0 to 25%.]

If the sample size is doubled again, to 8, the histogram looks like this:

[Figure: probability histogram of the sample mean x̄ (n = 8). Horizontal axis: sample mean from 2.0 to 4.0 in steps of 0.125. Vertical axis: probability from 0 to 15%.]
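
The histograms for n = 4 and n = 8 can be computed exactly rather than read off a graph. The sketch below is one way to do it, again assuming independent draws: it builds the distribution of the sample total one draw at a time and then divides by n. The helper names (`sum_distribution`, `mean_distribution`, `prob_within`) are hypothetical, and the "within 0.5 of the mean" measure is only an illustrative way to see the bars clustering.

```python
from collections import defaultdict
from fractions import Fraction

# Grade distribution from the handout: 20% C (2.0), 30% B (3.0), 50% A (4.0).
grades = {2: Fraction(1, 5), 3: Fraction(3, 10), 4: Fraction(1, 2)}

def sum_distribution(dist, n):
    """Exact distribution of the total of n independent draws, built one draw at a time."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for total, p_total in sums.items():
            for value, p_value in dist.items():
                nxt[total + value] += p_total * p_value
        sums = dict(nxt)
    return sums

def mean_distribution(dist, n):
    """Distribution of the sample mean: the same probabilities, with each total divided by n."""
    return {Fraction(total, n): p for total, p in sorted(sum_distribution(dist, n).items())}

def prob_within(dist_of_means, center, half_width):
    """Probability that the sample mean lands within half_width of center."""
    return sum(p for m, p in dist_of_means.items() if abs(m - center) <= half_width)

mu = sum(value * p for value, p in grades.items())  # population mean, 3.3
for n in (1, 2, 4, 8):
    d = mean_distribution(grades, n)
    share = float(prob_within(d, mu, Fraction(1, 2)))
    print(f"n = {n}: {len(d)} possible sample means, P(within 0.5 of the mean) = {share:.3f}")
```

Building the total one draw at a time keeps the work small even for n = 8, where listing all 3^8 ordered samples would be the alternative.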

Hopefully you can see that as the sample size increases from 2 to 4 to 8, the distribution of sample means looks more and more like a normal (bell-curve shaped) distribution. Note that when the central limit theorem was stated at the start of this handout, the requirement of a sample size of at least 30 applied “even if the original distribution ... was not a normal distribution.” What if the original distribution is a normal distribution? In that case you don’t need a sample size of at least 30; the distribution of sample means will be a normal distribution for any sample size. This can be summarized in a table:

                 Distribution of data (x’s) is normal        Distribution of data (x’s) is not normal

n < 30           Distribution of sample means is normal;     Distribution of sample means is not normal;
                 methods of chapters 7 and 8 work            methods of chapters 7 and 8 do not work

n ≥ 30           Distribution of sample means is normal;     Distribution of sample means is normal;
                 methods of chapters 7 and 8 work            methods of chapters 7 and 8 work

The “bottom line” is that you can use a small sample (less than 30) if the data has a normal distribution (in some cases even approximately normal), but if the data does not have a normal distribution, then you should use a sample of at least 30 to ensure that the methods of chapters 7 and 8 work. There are many examples in statistics textbooks that come from real-life studies that used sample sizes of less than 30. Generally you will find that these examples are prefaced with words like “the population was believed to be a normal distribution.”

How can the original distribution be non-normal?

If the original distribution was not a normal distribution, then it does not have a bell-curve shape. Here are some examples of other shapes that density functions for real-world data can have:

[Figure: three density curves plotted against x, labeled uniform, exponential, and bimodal.]
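
To see how a non-normal shape affects the sample means, one can simulate samples from an exponential distribution and measure how lopsided the resulting sample means are. The following is a rough sketch, not part of the handout: the rate of 1, the 20,000 repetitions, the seed, and the helper names are all arbitrary choices made for illustration. Skewness is 0 for a symmetric bell curve; for the mean of n exponential values it is 2/√n in theory, so it shrinks slowly as n grows.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def simulated_sample_means(n, repetitions, rng):
    """Means of `repetitions` samples of size n from an exponential distribution (rate 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

rng = random.Random(146)  # fixed seed (arbitrary) so the run is repeatable
skew_by_n = {n: skewness(simulated_sample_means(n, 20000, rng)) for n in (1, 5, 30)}
for n, s in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample means is about {s:.2f}")
```

Even at n = 30 the simulated sample means are still slightly skewed, which is a reminder that n ≥ 30 is a rule of thumb for "close enough to normal" rather than a sharp cutoff.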

The standard deviation for the sample mean

In section 6.4 and Handout 21, it was shown that the mean of the sample means must equal the population mean: $\mu_{\bar{x}} = \mu$.

The standard deviation of the sample means does not equal the population standard deviation, because there is less variation in the sample means than in the original data itself. This is discussed on the first page of Handout 21. The standard deviation of the sample means is given by:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
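
For readers who want to see where the square root of n comes from, here is a short derivation sketch. It assumes the n values in a sample, written here as $x_1, \dots, x_n$ (notation introduced only for this sketch), are drawn independently from a population with standard deviation $\sigma$, so that their variances add.

```latex
\begin{align*}
\bar{x} &= \frac{x_1 + x_2 + \cdots + x_n}{n} \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^2}\bigl(\operatorname{Var}(x_1) + \cdots + \operatorname{Var}(x_n)\bigr)
  = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \\
\sigma_{\bar{x}} &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The independence assumption is what allows the variances to be added in the second line.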

What this equation says is that the standard deviation of the sample mean is found by taking the standard deviation of the population, then dividing by the square root of the sample size. The important thing to note is that $\sigma_{\bar{x}} < \sigma$ for any sample size larger than 1, so that the standard deviation of the sample means is smaller than the standard deviation of the data values. When you look at the two probability histograms shown on page 1 of this handout, hopefully you can see that the standard deviation of the sample means (right graph) is smaller than the standard deviation of the population (left graph). As you go from the first graph (population) to the second graph (sample means, n = 2) to the third graph (sample means, n = 4) to the fourth graph (sample means, n = 8), notice how the “weight” of the bars shifts away from 2.0 and 4.0 and starts to cluster around the mean of 3.3. It is not difficult to calculate the standard deviation for both distributions, either by using the 1-Var Stats L1, L2 command from Handout 14 or by using the actual formula for the standard deviation of a probability distribution. If you try it for the population and for the sample means with n = 2, you will get $\sigma = .781$ and $\sigma_{\bar{x}} = .552$, and then $.781 / \sqrt{2} = .552$ ....
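
The two standard deviations can also be checked without a calculator. Below is a small Python sketch that applies the formula for the standard deviation of a probability distribution to the grade data, then compares the shortcut σ/√n with a direct calculation from the n = 2 sample-mean probabilities given earlier in the handout. The function name `mean_and_sd` is an illustrative choice, not something from Handout 14.

```python
import math

def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete probability distribution."""
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, math.sqrt(variance)

# Population: the grades and their probabilities.
mu, sigma = mean_and_sd([2.0, 3.0, 4.0], [0.20, 0.30, 0.50])

# Shortcut: divide the population standard deviation by the square root of n.
n = 2
sigma_xbar_shortcut = sigma / math.sqrt(n)

# Direct calculation from the distribution of sample means for n = 2.
mu_xbar, sigma_xbar_direct = mean_and_sd(
    [2.0, 2.5, 3.0, 3.5, 4.0], [0.04, 0.12, 0.29, 0.30, 0.25]
)

print(f"population:   mean = {mu:.3f}, sd = {sigma:.3f}")
print(f"sample means: mean = {mu_xbar:.3f}, sd = {sigma_xbar_direct:.3f}")
print(f"sigma / sqrt(n) = {sigma_xbar_shortcut:.3f}")
```

Both routes give the same standard deviation for the sample means, and the mean of the sample means matches the population mean.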

