Title | Worksheet-12-Descriptive statistics |
---|---|
Author | Linderson Johns |
Course | Finite Mathematics |
Institution | Central Washington University |
Pages | 7 |
Weekly worksheet assignment...
Math 130
WORKSHEET 12
Descriptive Statistics
Answers are provided, so please show your work for credit.

CENTRAL LOCATION/TENDENCY:

1. Mean (Average)
$\bar{x} = \frac{\sum x_i}{n}$
($\sum$ is summation; n is the data set size.)
e.g. Find the mean of the data set: 75, 81, 88, 92.
$\bar{x} = \frac{75 + 81 + 88 + 92}{4} = \frac{336}{4} = 84$
Mean = 84
Ans. 84

2. Median (page 30)
The median is the middle value when the data are sorted in ascending order. When the data set has an even number of values, the median is the mean of the two middle values.
e.g.1 Find the median of the data set: 5, 3, 4, 7, 8, 8, 1.
Ascending order: 1, 3, 4, 5, 7, 8, 8
Counting three numbers in from each side, the median = 5
e.g.2 Find the median of the data set: 5, 3, 4, 7, 8, 9, 11, 2.
Ascending order: 2, 3, 4, 5, 7, 8, 9, 11
Counting three numbers in from each side leaves the two middle values, 5 and 7.
Median = $\frac{5 + 7}{2} = 6$
The median = 6
Ans. 5; 6

3. Mode
If a data set has a value that occurs more often than any of the others, then that value is called the mode.
e.g. Find the mode of the data set: 1, 2, 3, 4, 5, 5, 6, 2, 2
Two appears three times, five appears two times, and the other numbers appear once.
The mode = 2
Ans. 2
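The three central-tendency examples can be checked with Python's standard `statistics` module. This is only a minimal sketch for verifying hand calculations; the variable names are illustrative and not part of the worksheet.

```python
import statistics

# Data sets taken from the worked examples for mean, median, and mode
mean_data = [75, 81, 88, 92]
median_odd = [5, 3, 4, 7, 8, 8, 1]        # odd number of values
median_even = [5, 3, 4, 7, 8, 9, 11, 2]   # even number of values
mode_data = [1, 2, 3, 4, 5, 5, 6, 2, 2]

mean_value = statistics.mean(mean_data)              # sum of the values divided by n
median_odd_value = statistics.median(median_odd)     # middle value after sorting
median_even_value = statistics.median(median_even)   # mean of the two middle values
mode_value = statistics.mode(mode_data)              # most frequent value

print(mean_value, median_odd_value, median_even_value, mode_value)
```

`statistics.median` sorts the data internally, so the lists can be entered in the order given in the question.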
SPREAD (or VARIABILITY):

1. Range
Range is the difference between the largest and the smallest values.
Range = largest - smallest = Max - Min
e.g. Find the range of the data set: 4, 5.3, 6.1, 4, 3, 6, 12.
Range = Max - Min = 12 - 3 = 9
The range = 9
Ans. 9

2. Variance
Variance is a way to measure how dispersed the data are.
Variance $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
where n is the data set size.
e.g. Find the variance of the data set: 1, 3, 5, 7, 9, 11.
$\bar{x} = \frac{1 + 3 + 5 + 7 + 9 + 11}{6} = 6$
$s^2 = \frac{(1-6)^2 + (3-6)^2 + (5-6)^2 + (7-6)^2 + (9-6)^2 + (11-6)^2}{6 - 1} = \frac{70}{5} = 14$
The variance = 14
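As a check on the range and variance examples, the sketch below computes the range as Max - Min and the sample variance in two ways: from the squared deviations about the mean, and from the shortcut that uses the sum of squares and the square of the sum. Both divide by n - 1. The variable names are illustrative only.

```python
import statistics

range_data = [4, 5.3, 6.1, 4, 3, 6, 12]
variance_data = [1, 3, 5, 7, 9, 11]

# Range = Max - Min
data_range = max(range_data) - min(range_data)

n = len(variance_data)
x_bar = sum(variance_data) / n

# Definitional form: sum of squared deviations from the mean, divided by n - 1
var_definition = sum((x - x_bar) ** 2 for x in variance_data) / (n - 1)

# Shortcut form: (sum of x^2 - (sum of x)^2 / n), divided by n - 1
sum_of_squares = sum(x * x for x in variance_data)
var_shortcut = (sum_of_squares - sum(variance_data) ** 2 / n) / (n - 1)

# Library check: statistics.variance is the sample variance (divides by n - 1)
var_library = statistics.variance(variance_data)

print(data_range, var_definition, var_shortcut, var_library)
```

The two forms are algebraically equivalent; the shortcut is convenient by hand because it avoids subtracting the mean from every value.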
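Ans. 14

3. Standard Deviation
Simply the square root of the variance.
Standard Deviation $s = \sqrt{\text{Variance}} = \sqrt{s^2}$
e.g. Find the standard deviation of the data set: 11, 13, 15, 17, 19, 25.
$\bar{x} = \frac{11 + 13 + 15 + 17 + 19 + 25}{6} = 16.6667$
$s^2 = \frac{(11-16.6667)^2 + (13-16.6667)^2 + (15-16.6667)^2 + (17-16.6667)^2 + (19-16.6667)^2 + (25-16.6667)^2}{6 - 1} = \frac{123.3333}{5} = 24.6667$
$s = \sqrt{24.6667} = 4.9666$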
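The standard deviation computation for this data set can be reproduced as follows. This is a minimal sketch using Python's `statistics` module, which computes the sample variance with the same n - 1 divisor; the variable names are illustrative.

```python
import math
import statistics

data = [11, 13, 15, 17, 19, 25]

variance = statistics.variance(data)   # sample variance, divides by n - 1
std_dev = math.sqrt(variance)          # standard deviation = square root of the variance

# statistics.stdev performs both steps in a single call
std_dev_library = statistics.stdev(data)

print(round(variance, 4), round(std_dev, 4))
```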
The standard deviation = 4.9666
Ans. Variance 24.67; standard deviation 4.9666

RELATIVE POSITION:

1. Z-score
The z-score measures how far one data point is away from the central location of the data set. It indicates the number of standard deviations a data point is away from the mean.
$z = \frac{x - \bar{x}}{s}$
e.g. Given a data set: 51, 62, 73, 74, 87, 95, find (a) the z-score of the data point 87. (b) the z-score of 51.
$\bar{x} = \frac{51 + 62 + 73 + 74 + 87 + 95}{6} = 73.66667$
$s^2 = \frac{(51-73.6667)^2 + (62-73.6667)^2 + (73-73.6667)^2 + (74-73.6667)^2 + (87-73.6667)^2 + (95-73.6667)^2}{6 - 1} = \frac{1283.3333}{5} = 256.66667$
$s = \sqrt{256.66667} = 16.02082$
(a) the z-score of 87.
$z = \frac{x - \bar{x}}{s} = \frac{87 - 73.66667}{16.02082} = 0.83225$
The z-score of 87 = 0.83225
(b) the z-score of 51.
$z = \frac{x - \bar{x}}{s} = \frac{51 - 73.66667}{16.02082} = -1.41483$
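The two z-score calculations can be checked with a short script. The helper `z_score` below is a hypothetical name introduced for illustration; it simply applies z = (x - mean) / s with the sample standard deviation.

```python
import statistics

data = [51, 62, 73, 74, 87, 95]
x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

z_87 = z_score(87, x_bar, s)
z_51 = z_score(51, x_bar, s)

print(round(z_87, 5), round(z_51, 5))
```

A positive z-score means the data point lies above the mean, and a negative z-score means it lies below.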
The z-score of 51 = -1.41483
Ans. 0.83; -1.42 (mean 73.67, std. dev. 16.02)

2. Percentile
This measure is used only when the data set is large. The k-th percentile Pk is the value such that k% of the data in the data set are smaller than Pk. The 25th percentile P25 is called the first quartile Q1, the 50th percentile P50 is called the second quartile Q2, and the 75th percentile P75 is called the third quartile Q3.
e.g. What is the relationship between these three measures: median, Q2, P50?

Use the data set below to answer questions 3. and 4.
3. Tukey’s 5-Number Summary (page 199-200)
The 5-Number Summary includes Min, Q1, Q2, Q3, Max.
Ans. 14, 30, 35, 40, 50

4. Boxplot (Numerical and Graphical)
A boxplot displays a data set's 5-Number Summary.
To measure Spread (Variability) of a data set, we may also use the Interquartile Range IQR.
IQR = Q3 - Q1
Ans. 10

What is an outlier? Page 203

5. Are there any outliers in the following data set?
Ans. Lower fence 15, upper fence 55; 2 outliers 14, 14

Use TI-83 calculator or Minitab to analyze the following data

Data Display (original)
Test 3 score
40 39 45 38 29 44 29 33 41 14
30 33 40 14 34 35 43 31 40 32
31 50 40 30 30 25 15 36 40 43
37 49 28 30 32 38 49

Data Display (sorted)
Test 3 score
14 14 15 25 28 29 29 30 30 30
30 31 31 32 32 33 33 34 35 36
37 38 38 39 40 40 40 40 40 41
43 43 44 45 49 49 50
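The 5-number summary, IQR, and outlier check for the Test 3 scores can be sketched in Python as follows. Two assumptions are made here: quartiles are interpolated at position p(n + 1), which is the default "exclusive" method of `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR fence rule, which is consistent with the fences of 15 and 55 in the answer key. Other quartile conventions can give slightly different values on other data sets.

```python
import statistics

# Sorted Test 3 scores from the data display (n = 37)
scores = [14, 14, 15, 25, 28, 29, 29, 30, 30, 30,
          30, 31, 31, 32, 32, 33, 33, 34, 35, 36,
          37, 38, 38, 39, 40, 40, 40, 40, 40, 41,
          43, 43, 44, 45, 49, 49, 50]

# Quartiles Q1, Q2 (median), Q3 using the default "exclusive" method
q1, q2, q3 = statistics.quantiles(scores, n=4)
five_number = (min(scores), q1, q2, q3, max(scores))

# Interquartile range and the assumed 1.5 * IQR fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values strictly outside the fences are treated as outliers
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(five_number, iqr, (lower_fence, upper_fence), outliers)
```

Note that the score 15 sits exactly on the lower fence, so under this rule it is not counted as an outlier; only the two scores of 14 are.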
[Figure: Individual Value Plot of Test 3 (axis: Test 3, 10 to 50)]
[Figure 1: Histogram of Test 3 (x-axis: Test 3, 15 to 50; y-axis: Frequency, 0 to 12)]
[Figure: Boxplot of Test 3 (axis: Test 3, 10 to 50)]
Summary for Test 3

Anderson-Darling Normality Test
A-Squared: 0.61
P-Value: 0.103

Mean: 34.784
StDev: 8.798
Variance: 77.396
Skewness: -0.629842
Kurtosis: 0.627270
N: 37

Minimum: 14.000
1st Quartile: 30.000
Median: 35.000
3rd Quartile: 40.000
Maximum: 50.000

95% Confidence Interval for Mean: 31.851 to 37.717
95% Confidence Interval for Median: 31.101 to 39.899
95% Confidence Interval for StDev: 7.154 to 11.428

[Figure: 95% Confidence Intervals for Mean and Median]
Chebyshev’s inequality says that at least $1 - 1/K^2$ of the data from a sample must fall within K standard deviations of the mean, where K is any real number greater than one. The inequality can also be stated with “probability distribution” in place of “data from a sample”, because Chebyshev’s inequality is a result from probability that can then be applied to statistics.
To illustrate the inequality, we will look at it for a few values of K:
For K = 2 we have $1 - 1/K^2 = 1 - 1/4 = 3/4 = 75\%$. So Chebyshev’s inequality says that at least 75% of the data values of any distribution must be within two standard deviations of the mean.
For K = 3 we have $1 - 1/K^2 = 1 - 1/9 = 8/9 \approx 89\%$. So Chebyshev’s inequality says that at least 89% of the data values of any distribution must be within three standard deviations of the mean.
For K = 4 we have $1 - 1/K^2 = 1 - 1/16 = 15/16 = 93.75\%$. So Chebyshev’s inequality says that at least 93.75% of the data values of any distribution must be within four standard deviations of the mean.
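The three illustrations follow directly from the bound 1 - 1/K². The sketch below tabulates it with exact fractions; `chebyshev_bound` is a hypothetical helper name, and it is restricted to integer K only so that `Fraction` can show the exact values.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (integer k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - Fraction(1, k * k)

for k in (2, 3, 4):
    bound = chebyshev_bound(k)
    # Show the exact fraction and the same value as a percentage
    print(f"K = {k}: at least {bound} = {float(bound):.2%}")
```

The bound is a guaranteed minimum for any distribution, so real data sets usually have a larger share of values within K standard deviations.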
Applying Chebyshev’s Theorem (Inequality):
Q1. According to Chebyshev’s theorem, at least what percentage of the data fall within 2 standard deviations on either side of the mean?
From Chebyshev’s theorem, the minimum percentage of values within k standard deviations of the mean = $(1 - 1/k^2) \times 100 = (1 - 1/2^2) \times 100 = (3/4) \times 100 = 75.0\%$
Q2. What is the actual percentage of the data that fall within 2 standard deviations on either side of the mean?
The interval is 34.784 ± 2(8.798), that is, from 17.188 to 52.380. Only 14, 14, and 15 fall outside it, so 34/37 of the data are within two standard deviations = (34/37) * 100 = 0.91892 * 100 = 91.892%
Q3. According to Chebyshev’s theorem, at least what percentage of the data fall within 3 standard deviations on either side of the mean?
From Chebyshev’s theorem, the minimum percentage of values within k standard deviations of the mean = $(1 - 1/k^2) \times 100 = (1 - 1/3^2) \times 100 = (8/9) \times 100 = 0.888889 \times 100 = 88.89\%$
Q4. What is the actual percentage of the data that fall within 3 standard deviations on either side of the mean?
The interval is 34.784 ± 3(8.798), that is, from 8.390 to 61.178. All the data are within 3 standard deviations: (37/37) * 100% = 100%...