Title | Spring 2008 Final Exam Solutions |
---|---|
Course | Basic Statistical Meth |
Institution | Georgia Institute of Technology |
Pages | 8 |
2028B: Basic Statistical Methods (Spring 2008) (Version A) Final Exam-Solution
NAME:
GTid #:
HONOR CODE STATEMENT
I pledge on my honor that: I have completed all steps of the exam on my own, I have not used any unauthorized materials while completing this exam, and I have not given anyone else access to my exam.
Signature and date
This test is 170 minutes long. You are allowed THREE cheat sheets including the cheat sheet you used in the first two tests. Do not look at or start the test until you are told to do so. When we ask you to return the test, stop immediately, hand the test in, and do not utter a word to anyone. Please show your answers on this sheet. GOOD LUCK!
1. [12 points = 6×2 pts] Fill in the blanks:
(a) Analysis of ______ is abbreviated as ANOVA. [VARIANCE]
(b) The least-squares method ______ the sum of the squares of the residuals. [MINIMIZES]
(c) Three basic principles of a good design are (1) ______, (2) replication and (3) blocking. [RANDOMIZATION]
(d) Any linear function of the form $w = \sum_{i=1}^{k} c_i w_i$ where $\sum_{i=1}^{k} c_i = 0$ (for example, µ1 + µ2 − µ3 − µ4 = 0) is called a comparison or a ______ in the treatment means. [CONTRAST]
(e) In the One-Way ANOVA, 'Ho : µ1 = µ2 = · · · = µk vs HA : at least two of means are not equal' can be written as 'Ho : α1 = α2 = · · · = αk = 0 vs HA : ______'. [AT LEAST ONE OF THE αi's IS NOT ZERO]
(f) ______ R2 is defined as $1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$ in the multiple linear regression model with k independent variables. [ADJUSTED]
2. [12 points = 6×2 pts] True or False?
(a) r2 = R2 is always true for a multiple linear regression. [ ] [FALSE, it's true only for a simple linear regression]
(b) In a simple linear regression, $SSR/\sigma^2 \sim \chi^2(1)$. [ ] [TRUE]
(c) The quantity t2 with k degrees of freedom equals the quantity f with 1 and k degrees of freedom. [ ] [TRUE]
(d) R2 always decreases if we add more regressors to the model. [ ] [FALSE, IT DECREASES IF...]
(e) If we fail to reject 'Ho : β1 = β2 = · · · = βk = 0' in a multiple linear regression, then y should be constant. [ ] [TRUE]
(f) We can use the One-Way ANOVA method to compare means from k ≥ 2 populations. [ ] [TRUE]
3. [30 points] Mercury pollution is a serious problem in some waterways. Mercury levels often increase after a lake is flooded due to leaching of naturally occurring mercury by the higher levels of the water. Excessive consumption of mercury is well known to be deleterious to human health. It is difficult and time consuming to measure every person's mercury level; it would be useful to have a quick procedure that estimates the mercury level of a person from the average mercury level found in fish and estimates of the person's consumption of fish. Data were collected on the methyl mercury intake of 13 subjects, together with the actual mercury levels recorded in the blood stream, from a random sample of people around recently flooded lakes.
(a) [8 points: 1pt/each] An R session produced the following incomplete output. Based on this output, complete the ANOVA table below.
Call:
lm(formula = Level ~ Intake)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  50.44     47.77       1.06     0.314
Intake       0.4529    0.1154      3.92     0.002
Residual standard error: 84.4474 on 4 degrees of freedom
Multiple R-squared: 0.583, Adjusted R-squared: 0.546
Analysis of Variance
                Df  Sum Sq  Mean Sq  F value  Pr(>F)
Regression      __  109863  ______   ______   0.002
Residual Error  __  ______  ______
Total           __  ______
Solution:
                Df  Sum Sq                     Mean Sq               F value                     Pr(>F)
Regression      1   109863                     109863 = 109863/1     15.3789 = 109863/7143.73    0.002
Residual Error  11  78581 = 188444 − 109863    7143.73 = 78581/11
Total           12  188444 = 109863/0.583
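The arithmetic behind the completed table can be checked with a short Python sketch. It assumes only the quantities printed in the R output (n = 13 subjects, the regression sum of squares and Multiple R-squared); the variable names are illustrative.

```python
# Quantities read off the R output (n = 13 subjects, one predictor).
n = 13
ss_reg = 109863.0   # regression sum of squares
r_squared = 0.583   # Multiple R-squared

# Degrees of freedom for a simple linear regression.
df_reg = 1
df_res = n - 2
df_tot = n - 1

# R^2 = SSR / SST, so SST = SSR / R^2; the residual part is what remains.
ss_tot = ss_reg / r_squared
ss_res = ss_tot - ss_reg

# Mean squares and the F statistic.
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_value = ms_reg / ms_res

print(df_reg, df_res, df_tot)
print(round(ss_tot), round(ss_res), round(ms_res, 2), round(f_value, 2))
```

In a simple linear regression the F statistic equals the square of the slope's t value, which gives a second check against the coefficient table (3.92 squared is about 15.37).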
(b) [8 points] Do all tests in this problem using α = 0.05 based on the output above. Make comments.
Solution: This output provides p-values for two hypothesis tests: (1) Ho : β0 = 0 vs HA : β0 ≠ 0 and (2) Ho : β1 = 0 vs HA : β1 ≠ 0. Since the p-value for test (1) is 0.314 > 0.05, we fail to reject Ho. However, we reject Ho in (2) because the p-value is 0.002 < 0.05.
(c) [4 points] Test the hypothesis Ho : β1 = 0 vs HA : β1 ≠ 0. Use α = 0.10.
Solution: The test is two-sided, so the critical value is t(α/2, 11) = t(0.05, 11) = 1.796. Since the computed t-statistic is 3.92 > 1.796, we reject Ho.
(d) [4 points] Test the hypothesis Ho : β0 = 100 vs HA : β0 < 100. Use α = 0.05.
Solution: Since $t = \frac{b_0 - 100}{s.e.(b_0)} = \frac{50.44 - 100}{47.77} = -1.0375 > -t(0.05, 11) = -1.796$, we fail to reject Ho.
(e) [6 points] Assume that we want to predict the level for intake = 200. The following table has all the information. Instead of the supplied 95% confidence/prediction intervals, find both 99% confidence and prediction intervals (for the same value of intake).
Predicted Values
New Intake  Fit    StDev Fit  95.0% CI        95.0% PI
200         141.0  29.9       (75.3, 206.8)   (-56.1, 338.2)
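A t-based interval can be moved to another confidence level by recovering its standard error from the half-width and rescaling. The following Python sketch illustrates this under stated assumptions: the critical values 2.201 and 3.106 are taken from a t table with 11 degrees of freedom, and the function name `rescale_interval` is hypothetical.

```python
def rescale_interval(center, lower, upper, t_old, t_new):
    """Rescale a t-based interval around `center` to a new critical value.

    The half-width equals t_old * (standard error), so dividing by t_old
    recovers the standard error, which is then multiplied by t_new.
    """
    std_err = (upper - lower) / (2 * t_old)
    margin = t_new * std_err
    return center - margin, center + margin


# Critical values from a t table with 11 degrees of freedom (assumed inputs).
T_025 = 2.201  # upper 2.5% point, used for 95% intervals
T_005 = 3.106  # upper 0.5% point, used for 99% intervals

fit = 141.0
ci_99 = rescale_interval(fit, 75.3, 206.8, T_025, T_005)
pi_99 = rescale_interval(fit, -56.1, 338.2, T_025, T_005)
print(ci_99)
print(pi_99)
```

The reported fit is used as the center rather than the interval midpoint, because the printed endpoints are rounded to one decimal place.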
Solution: Note that the CI for $\hat{y}_0$ is $\hat{y}_0 \pm t_{\alpha/2}(n-2)\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$. Since the 95% CI is (75.3, 206.8), $2 \times t_{0.025}(11)\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} = 131.5$ and $t_{0.025}(11) = 2.201$, so $s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} = 29.8728$. Therefore, the 99% CI is 141.0 ± 3.106 × 29.8728 = 141.0 ± 92.7849 = (48.2151, 233.7849). The 99% PI is (−137.213, 419.213).
4. [6 points] Compute the correlation coefficient between x and y. Are x and y positively related?
n    xi   yi  xi^2   yi^2  xi yi
1    75   4   5625   16    300
2    83   7   6889   49    581
3    67   2   4489   4     134
4    89   9   7921   81    801
5    95   12  9025   144   1140
6    79   6   6241   36    474
7    81   6   6561   36    486
sum  569  46  46751  366   3916
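As a numerical check on the column sums in the table, the corrected sums of squares and the correlation coefficient can be computed directly from the seven (x, y) pairs. This is a minimal Python sketch using only the standard library; full-precision arithmetic may differ in the last digit from a hand calculation that rounds intermediate values.

```python
import math

# The seven observations from the table.
x = [75, 83, 67, 89, 95, 79, 81]
y = [4, 7, 2, 9, 12, 6, 6]
n = len(x)

# Corrected sums of squares and cross-products.
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2), round(r, 3))
```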
Solution: Since
$S_{xx} = \sum_{i=1}^{7} x_i^2 - \left(\sum_{i=1}^{7} x_i\right)^2 / 7 = 46751 - 569^2/7 = 499.43$,
$S_{yy} = \sum_{i=1}^{7} y_i^2 - \left(\sum_{i=1}^{7} y_i\right)^2 / 7 = 366 - 46^2/7 = 63.71$ and
$S_{xy} = \sum_{i=1}^{7} x_i y_i - \left(\sum_{i=1}^{7} x_i\right)\left(\sum_{i=1}^{7} y_i\right) / 7 = 3916 - (569)(46)/7 = 176.86$,
we get $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{176.86}{\sqrt{(499.43)(63.71)}} = 0.9915$. Since r = 0.9915 > 0, x and y are positively correlated.
5. [10 points] A set of experimental runs was made to determine a way of predicting cooking time y at various levels of oven width x1 and flue temperature x2. The coded data were recorded as follows:
y      x1    x2
6.40   1.32  1.15
15.50  2.69  3.40
18.75  3.56  4.10
30.25  4.41  8.75
44.85  5.35  14.82
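The equation $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$ can be written $y = X\beta + \epsilon$ where y, X, β and ε are matrices.
(a) [6 points: 4pts+2pts] Write the matrices X and β.
Solution: $X = \begin{pmatrix} 1 & 1.32 & 1.15 \\ 1 & 2.69 & 3.40 \\ 1 & 3.56 & 4.10 \\ 1 & 4.41 & 8.75 \\ 1 & 5.35 & 14.82 \end{pmatrix}$ and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}$.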
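To make the matrix form concrete, the following Python sketch assembles X from the coded data and evaluates the fitted values Xβ and the quadratic form (y − Xβ)ᵗ(y − Xβ) for a given β. The helper names `fitted` and `residual_sum_of_squares` and the trial coefficient vector are hypothetical, chosen only for illustration; no estimate of β is computed here.

```python
# Coded data: cooking time y, oven width x1, flue temperature x2.
y = [6.40, 15.50, 18.75, 30.25, 44.85]
x1 = [1.32, 2.69, 3.56, 4.41, 5.35]
x2 = [1.15, 3.40, 4.10, 8.75, 14.82]

# Each row of X is (1, x1_i, x2_i); the leading 1 multiplies beta_0.
X = [[1.0, a, b] for a, b in zip(x1, x2)]


def fitted(X, beta):
    """Return the product X @ beta as a plain list, one value per row."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


def residual_sum_of_squares(y, X, beta):
    """Evaluate (y - X beta)^t (y - X beta) for a given coefficient vector."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted(X, beta)))


# Hypothetical coefficient vector, used only to exercise the functions.
beta_trial = [0.0, 5.0, 1.0]
for row in X:
    print(row)
print(residual_sum_of_squares(y, X, beta_trial))
```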
(b) [4 points] Write the form of SSE in terms of the matrices above.
Solution: $SSE = (y - X\beta)^t (y - X\beta)$ or $SSE = \|y - X\beta\|_2^2$
6. [30 points] Suppose in an industrial experiment that an engineer is interested in how the mean absorption of moisture in concrete varies among 5 different concrete aggregates. The samples are exposed to moisture for 48 hours. It is decided that 6 samples are to be tested for each aggregate, requiring a total of 30 samples to be tested.
Aggregate  1    2    3    4    5
           551  595  639  417  563
           457  580  615  449  631
           450  508  511  517  522
           731  583  573  438  613
           499  633  648  415  656
           632  517  677  555  679
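(a) [6 points: 2pts/each] Write the definition of SST, SSA and SSE.
$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$
$SSA = \sum_{i=1}^{k} \sum_{j=1}^{n} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$
$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$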
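These three sums of squares can be evaluated on the aggregate data with a short Python sketch. It assumes each column of the data table is one aggregate (k = 5 groups, n = 6 samples each); the variable names are illustrative.

```python
# Moisture absorption data: one list of n = 6 samples per aggregate,
# read column by column from the data table.
groups = [
    [551, 457, 450, 731, 499, 632],  # aggregate 1
    [595, 580, 508, 583, 633, 517],  # aggregate 2
    [639, 615, 511, 573, 648, 677],  # aggregate 3
    [417, 449, 517, 438, 415, 555],  # aggregate 4
    [563, 631, 522, 613, 656, 679],  # aggregate 5
]

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Total, treatment (between-group) and error (within-group) sums of squares.
sst = sum((v - grand_mean) ** 2 for v in all_values)
ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
sse = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

print(round(sst), round(ssa), round(sse))
print(abs(sst - (ssa + sse)) < 1e-6)  # checks the decomposition SST = SSA + SSE
```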
(b) [7 points: 1pt/each] Complete the ANOVA table below.
Analysis of Variance Table
           Df   Sum Sq  Mean Sq  F value
Treatment  ___  ______  21339    _______
Residuals  ___  124020  _____
Total      ___  ______
Solution:
Analysis of Variance Table
           Df  Sum Sq             Mean Sq              F value
Treatment  4   85356 = 4*21339    21339                4.3015
Residuals  25  124020             4960.8 = 124020/25
Total      29  209376
(c) [6 points] Test the hypothesis for the means at the 0.05 level of significance. Write your decision and conclusion.
Solution:
i. Ho : µ1 = · · · = µ5 vs HA : at least two of the means are not equal.
ii. α = 0.05
iii. Test statistic: F ∼ F(4, 25)
iv. Computed f-statistic: 4.3015
v. Since f(0.05, 4, 25) = 2.76 < f = 4.3015, we reject Ho and conclude that at least two of the means are not equal.
(d) [7 points] Perform all possible (different) paired comparisons of the five treatments based on the box-plots below, and then verify your answers based on the R output.
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = agg.trt ~ trt)
$trt
     diff          lwr         upr        p adj
2-1    16.0000000  -103.42650  135.42650  0.9946026
3-1    57.1666667   -62.25984  176.59317  0.6297485
4-1   -88.1666667  -207.59317   31.25984  0.2243248
5-1    57.3333333   -62.09317  176.75984  0.6272414
3-2    41.1666667   -78.25984  160.59317  0.8472695
4-2  -104.1666667  -223.59317   15.25984  0.1088202
5-2    41.3333333   -78.09317  160.75984  0.8453941
4-3  -145.3333333  -264.75984  -25.90683  0.0116387
5-3     0.1666667  -119.25984  119.59317  1.0000000
5-4   145.5000000    26.07350  264.92650  0.0115253
Solution: There are 10 = 5(5 − 1)/2 paired comparisons in total.
paired comparisons                        comments
µ1 vs µ2, µ2 vs µ3, µ2 vs µ5              little different
µ1 vs µ3, µ1 vs µ4, µ1 vs µ5, µ2 vs µ4    different
µ3 vs µ4, µ4 vs µ5                        very different
µ3 vs µ5                                  same
From the R output on Tukey's multiple comparisons of means, we can say:
i. Since the p-value for the test Ho : µ3 = µ5 is 1.0000, we fail to reject Ho and conclude that those means are the same.
ii. Since the p-values for the tests Ho : µ4 = µi, i = 3 or 5, are less than 0.05, we reject Ho.
(e) [4 points] Do Tukey's test for aggregates 2 and 3 at α = 0.05. Write your work in detail. Do not use the R output. Note $\bar{y}_{2\cdot} = 569.33$ and $\bar{y}_{3\cdot} = 610.50$.
Solution: Since the p-value from the R output is 0.8473, we fail to reject Ho : µ2 = µ3.
Or: Since $|\bar{y}_{2\cdot} - \bar{y}_{3\cdot}| = 41.17$, q(0.05, 5, 25) ≈ 4.17, s = 70.43 and n = 6, the 95% CI is $(\bar{y}_{2\cdot} - \bar{y}_{3\cdot}) \pm q(\alpha, k, v)\, s \sqrt{1/n} = -41.17 \pm (4.17)(70.43)\sqrt{1/6} = -41.17 \pm 119.90 = (-161.07, 78.73)$. The CI contains zero and thus we fail to reject Ho.
Or: Since $|\bar{y}_{2\cdot} - \bar{y}_{3\cdot}| = 41.17 < q(\alpha, k, v)\, s \sqrt{1/n} = 119.90$, we fail to reject Ho and conclude that there is no significant difference between the two means.
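The hand calculation for aggregates 2 and 3 can be sketched in Python as follows. The studentized-range critical value q(0.05, 5, 25) ≈ 4.17 is taken from a table as in the solution and is not computed; the function name `tukey_interval` is hypothetical, and equal group sizes are assumed.

```python
import math


def tukey_interval(mean_a, mean_b, q_crit, mse, n_per_group):
    """Tukey interval for mean_a - mean_b with equal group sizes.

    q_crit is the studentized-range critical value q(alpha, k, v),
    looked up from a table; mse is the residual mean square (s squared).
    """
    diff = mean_a - mean_b
    margin = q_crit * math.sqrt(mse / n_per_group)
    return diff - margin, diff + margin, margin


# Group means, table value of q, residual mean square and group size.
lower, upper, margin = tukey_interval(569.33, 610.50, 4.17, 4960.8, 6)

# The two means differ significantly only if the interval excludes zero.
significant = not (lower <= 0 <= upper)
print(round(lower, 2), round(upper, 2), round(margin, 2), significant)
```

Because every group has n = 6 samples, the same margin applies to each of the ten pairwise comparisons, so any pair of means whose absolute difference is below the margin is not declared different.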