Spring 2008 Final Exam Solutions PDF

Title	Spring 2008 Final Exam Solutions
Course	Basic Statistical Meth
Institution	Georgia Institute of Technology

2028B: Basic Statistical Methods (Spring 2008) (Version A) Final Exam-Solution

NAME:

GTid #:

HONOR CODE STATEMENT

I pledge on my honor that: I have completed all steps of the exam on my own, I have not used any unauthorized materials while completing this exam, and I have not given anyone else access to my exam.

Signature and date

This test is 170 minutes long. You are allowed THREE cheat sheets including the cheat sheet you used in the first two tests. Do not look at or start the test until you are told to do so. When we ask you to return the test, stop immediately, hand the test in, and do not utter a word to anyone. Please show your answers on this sheet. GOOD LUCK!


1. [12 points = 6×2 pts] Fill in the blanks:

(a) Analysis of ______ is abbreviated as ANOVA. [VARIANCE]

(b) The least-squares method ______ the sum of the squares of the residuals. [MINIMIZES]

(c) Three basic principles of a good design are (1) ______, (2) replication and (3) blocking. [RANDOMIZATION]

(d) Any linear function of the form $\omega = \sum_{i=1}^{k} c_i \mu_i$ where $\sum_{i=1}^{k} c_i = 0$ (for example, $\mu_1 + \mu_2 - \mu_3 - \mu_4 = 0$) is called a comparison or a ______ in the treatment means. [CONTRAST]

(e) In the One-Way ANOVA, '$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs $H_A$: at least two of the means are not equal' can be written as '$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ vs $H_A$: ______'. [AT LEAST ONE OF THE $\alpha_i$'s IS NOT ZERO]

(f) ______ $R^2$ is defined as $1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$ in the multiple linear regression model with $k$ independent variables. [ADJUSTED]

2. [12 points = 6×2 pts] True or False?

(a) $r^2 = R^2$ is always true for a multiple linear regression. [ ] [FALSE, it's true only for a simple linear regression]

(b) In a simple linear regression, $SSR/\sigma^2 \sim \chi^2(1)$. [ ] [TRUE]

(c) The quantity $t^2$ with $k$ degrees of freedom equals the quantity $f$ with 1 and $k$ degrees of freedom. [ ] [TRUE]

(d) $R^2$ always decreases if we add more regressors to the model. [ ] [FALSE, IT DECREASES IF...]

(e) If we fail to reject '$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$' in a multiple linear regression, then $y$ should be constant. [ ] [TRUE]

(f) We can use the One-Way ANOVA method to compare means from $k \ge 2$ populations. [ ] [TRUE]

3. [30 points] Mercury pollution is a serious problem in some waterways. Mercury levels often increase after a lake is flooded due to leaching of naturally occurring mercury by the higher levels of the water. Excessive consumption of mercury is well known to be deleterious to human health. It is difficult and time consuming to measure every person's mercury level, so it would be useful to have a quick procedure for estimating the mercury level of a person based upon the average mercury level found in fish and estimates of the person's consumption of fish. Data were collected on the methyl mercury intake and the actual mercury levels recorded in the blood stream of 13 subjects, a random sample of people around recently flooded lakes.

(a) [8 points: 1pt/each] An R session produced the following incomplete output. Based on the following output, complete the ANOVA table below.

    Call:
    lm(formula = Level ~ Intake)

    Coefficients:
                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)  50.44     47.77       1.06     0.314
    Intake       0.4529    0.1154      3.92     0.002

    Residual standard error: 84.4474 on 4 degrees of freedom
    Multiple R-squared: 0.583, Adjusted R-squared: 0.546

    Analysis of Variance
                    Df  Sum Sq  Mean Sq  F value  Pr(>F)
    Regression      __  109863  ______   ______   0.002
    Residual Error  __  ______  ______
    Total           __  ______

Solution:

                    Df  Sum Sq                   Mean Sq             F value                   Pr(>F)
    Regression      1   109863                   109863 = 109863/1   15.3789 = 109863/7143.73  0.002
    Residual Error  11  78581 = 188444 - 109863  7143.73 = 78581/11
    Total           12  188444 = 109863/0.583
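The arithmetic behind the completed table in part (a) can be sketched in a few lines of Python. This is only an illustration: the function name `complete_regression_anova` and the dictionary keys are made up for this sketch, and it assumes a simple linear regression (one regressor) with n = 13 subjects, so that R² = SSR/SST and the error degrees of freedom are n − 2.

```python
def complete_regression_anova(ssr, r_squared, n):
    """Fill in a simple-linear-regression ANOVA table from SSR, R^2 and n.

    Hypothetical helper for illustration; not part of the exam's R session.
    """
    sst = ssr / r_squared      # because R^2 = SSR / SST
    sse = sst - ssr            # SST = SSR + SSE
    df_reg = 1                 # one regressor (Intake)
    df_err = n - 2             # intercept and slope are estimated
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "df_reg": df_reg, "df_err": df_err, "df_total": n - 1,
        "ssr": ssr, "sse": sse, "sst": sst,
        "msr": msr, "mse": mse, "f": msr / mse,
    }


table = complete_regression_anova(ssr=109863, r_squared=0.583, n=13)
for name, value in table.items():
    print(f"{name:8s} {value:12.4f}")
```

Because R² is reported to only three decimals, the sums of squares obtained this way agree with the table up to rounding.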

(b) [8 points] Do all tests in this problem using $\alpha = 0.05$ based on the output above. Make comments.

Solution: This output provides p-values for two hypothesis tests: (1) $H_0: \beta_0 = 0$ vs $H_A: \beta_0 \ne 0$ and (2) $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$. Since the p-value for test (1) is 0.314 > 0.05, we fail to reject $H_0$. However, we reject $H_0$ in (2) because the p-value is 0.002 < 0.05.

(c) [4 points] Test the hypothesis $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$. Use $\alpha = 0.10$.

Solution: The computed t-statistic is 3.92. The test is two-sided, so the critical value is $t_{\alpha/2}(11) = t_{0.05}(11) = 1.796$. Since $3.92 > 1.796$, we reject $H_0$.

(d) [4 points] Test the hypothesis $H_0: \beta_0 = 100$ vs $H_A: \beta_0 < 100$. Use $\alpha = 0.05$.

Solution: Since $t = \dfrac{b_0 - 100}{s.e.(b_0)} = \dfrac{50.44 - 100}{47.77} = -1.0375 > -t_{0.05}(11) = -1.796$, we fail to reject $H_0$.

(e) [6 points] Assume that we want to predict the level for intake = 200. The following table has all the information. Instead of the supplied 95% confidence/prediction intervals, find both the 99% confidence and prediction intervals (for the same value of intake).

    Predicted Values
    New Intake  Fit    StDev Fit  95.0% CI       95.0% PI
    200         141.0  29.9       (75.3, 206.8)  (-56.1, 338.2)

Solution: Note that the CI for $\hat{y}_0$ is $\hat{y}_0 \pm t_{\alpha/2}(n-2)\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$. Since the 95% CI is (75.3, 206.8), $2 \times t_{0.025}(11)\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}} = 131.5$, and with $t_{0.025}(11) = 2.201$ we get $s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}} = 29.8728$. Therefore, the 99% CI is $141.0 \pm 3.106 \times 29.8728 = 141.0 \pm 92.7849 = (48.2151, 233.7849)$.

In the same way, the 95% PI (−56.1, 338.2) has half-width 197.15, so $s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}} = 197.15 / 2.201 = 89.573$, and the 99% PI is $141.0 \pm 3.106 \times 89.573 = 141.0 \pm 278.213 = (-137.213, 419.213)$.

4. [6 points] Compute the correlation coefficient between $x$ and $y$. Are $x$ and $y$ positively related?

    n    x_i  y_i  x_i^2  y_i^2  x_i y_i
    1    75   4    5625   16     300
    2    83   7    6889   49     581
    3    67   2    4489   4      134
    4    89   9    7921   81     801
    5    95   12   9025   144    1140
    6    79   6    6241   36     474
    7    81   6    6561   36     486
    sum  569  46   46751  366    3916
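The "sum" row of the table is all that the correlation coefficient needs. As a rough sketch (the function name `correlation_from_sums` is invented for this illustration), the computation from those five totals with n = 7 observations looks like this in Python:

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson correlation from column totals (hypothetical helper)."""
    sxx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of x
    syy = sum_y2 - sum_y ** 2 / n        # corrected sum of squares of y
    sxy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
    return sxx, syy, sxy, sxy / math.sqrt(sxx * syy)


# Totals taken from the "sum" row of the table above.
sxx, syy, sxy, r = correlation_from_sums(7, 569, 46, 46751, 366, 3916)
print(f"Sxx = {sxx:.2f}, Syy = {syy:.2f}, Sxy = {sxy:.2f}, r = {r:.4f}")
```

Note that each correction term divides by n = 7, the number of observations, not by n − 1.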

Solution: Since

$S_{xx} = \sum_{i=1}^{7} x_i^2 - \Big(\sum_{i=1}^{7} x_i\Big)^2 / 7 = 46751 - (569^2)/7 = 499.43$,

$S_{yy} = \sum_{i=1}^{7} y_i^2 - \Big(\sum_{i=1}^{7} y_i\Big)^2 / 7 = 366 - (46^2)/7 = 63.71$ and

$S_{xy} = \sum_{i=1}^{7} x_i y_i - \Big(\sum_{i=1}^{7} x_i\Big)\Big(\sum_{i=1}^{7} y_i\Big) / 7 = 3916 - (569)(46)/7 = 176.86$,

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{176.86}{\sqrt{(499.43)(63.71)}} = 0.9915$. Since $r = 0.9915 > 0$, they're positively correlated.

5. [10 points] A set of experimental runs was made to determine a way of predicting cooking time $y$ at various levels of oven width $x_1$ and flue temperature $x_2$. The coded data were recorded as follows:

    y      x1    x2
    6.40   1.32  1.15
    15.50  2.69  3.40
    18.75  3.56  4.10
    30.25  4.41  8.75
    44.85  5.35  14.82

The equation $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$ can be written $y = X\beta + \epsilon$ where $y$, $X$, $\beta$ and $\epsilon$ are matrices.

(a) [6 points: 4pts+2pts] Write the matrices $X$ and $\beta$.

Solution:
$$X = \begin{pmatrix} 1 & 1.32 & 1.15 \\ 1 & 2.69 & 3.40 \\ 1 & 3.56 & 4.10 \\ 1 & 4.41 & 8.75 \\ 1 & 5.35 & 14.82 \end{pmatrix} \quad \text{and} \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$

(b) [4 points] Write the form of SSE in terms of the matrices above.

Solution: $SSE = (y - X\beta)^t (y - X\beta)$ or $SSE = \|y - X\beta\|_2^2$.

6. [30 points] Suppose in an industrial experiment that an engineer is interested in how the mean absorption of moisture in concrete varies among 5 different concrete aggregates. The samples are exposed to moisture for 48 hours. It is decided that 6 samples are to be tested for each aggregate, requiring a total of 30 samples to be tested.

    Aggregate  1    2    3    4    5
               551  595  639  417  563
               457  580  615  449  631
               450  508  511  517  522
               731  583  573  438  613
               499  633  648  415  656
               632  517  677  555  679

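(a) [6 points: 2pts/each] Write the definitions of SST, SSA and SSE.

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$

$$SSA = \sum_{i=1}^{k} \sum_{j=1}^{n} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$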
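As a check on these definitions, the three sums of squares can be computed directly from the aggregate data. The Python sketch below is illustrative only (the function name `sums_of_squares` is an assumption of this example); it follows the formulas above term by term and confirms the identity SST = SSA + SSE.

```python
# Moisture absorption data, one list per aggregate (columns of the table).
groups = [
    [551, 457, 450, 731, 499, 632],  # aggregate 1
    [595, 580, 508, 583, 633, 517],  # aggregate 2
    [639, 615, 511, 573, 648, 677],  # aggregate 3
    [417, 449, 517, 438, 415, 555],  # aggregate 4
    [563, 631, 522, 613, 656, 679],  # aggregate 5
]


def sums_of_squares(groups):
    """Return (SST, SSA, SSE) for a one-way layout (hypothetical helper)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    sst = sum((v - grand_mean) ** 2 for v in all_values)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return sst, ssa, sse


sst, ssa, sse = sums_of_squares(groups)
print(f"SST = {sst:.1f}, SSA = {ssa:.1f}, SSE = {sse:.1f}")
```

Writing SSA with `len(g)` as the weight keeps the sketch valid for unequal group sizes, although every aggregate here has n = 6 samples.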

(b) [7 points: 1pt/each] Complete the ANOVA table below.

    Analysis of Variance Table
               Df   Sum Sq  Mean Sq  F value
    Treatment  ___  ______  21339    _______
    Residuals  ___  124020  _____
    Total      ___  ______

Solution:

    Analysis of Variance Table
               Df  Sum Sq           Mean Sq             F value
    Treatment  4   85356 = 4*21339  21339               4.3015
    Residuals  25  124020           4960.8 = 124020/25
    Total      29  209376
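The blanks in this table follow mechanically from the two given entries and the design (k = 5 aggregates, n = 6 samples each). A minimal Python sketch, with the helper name `complete_oneway_anova` chosen just for this example:

```python
def complete_oneway_anova(ms_treatment, ss_error, k, n_per_group):
    """Fill in a balanced one-way ANOVA table (hypothetical helper)."""
    df_trt = k - 1                      # treatments minus one
    df_err = k * (n_per_group - 1)      # total observations minus k
    ss_trt = ms_treatment * df_trt      # Sum Sq = Mean Sq * Df
    ms_err = ss_error / df_err
    return {
        "df_trt": df_trt, "df_err": df_err, "df_total": df_trt + df_err,
        "ss_trt": ss_trt, "ss_err": ss_error, "ss_total": ss_trt + ss_error,
        "ms_trt": ms_treatment, "ms_err": ms_err,
        "f": ms_treatment / ms_err,
    }


anova = complete_oneway_anova(ms_treatment=21339, ss_error=124020, k=5, n_per_group=6)
for name, value in anova.items():
    print(f"{name:9s} {value:12.4f}")
```

The square root of the residual mean square, about 70.43, is the pooled standard deviation $s$ used later in Tukey's test.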

(c) [6 points] Test the hypothesis for the means at the 0.05 level of significance. Write your decision and conclusion.

Solution:
i. $H_0: \mu_1 = \cdots = \mu_5$ vs $H_A$: at least two of the means are not equal.
ii. $\alpha = 0.05$
iii. Test statistic: $F \sim F(4, 25)$ under $H_0$
iv. Computed f-statistic: 4.3015
v. Since $f(0.05, 4, 25) = 2.76 < f = 4.3015$, we reject $H_0$ and conclude that at least two of the means are not equal.

(d) [7 points] Perform all possible (different) paired comparisons of the five treatments based on the box-plots below. Then verify your answers based on the R output.

      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = agg.trt ~ trt)

    $trt
                 diff         lwr        upr      p adj
    2-1    16.0000000  -103.42650  135.42650  0.9946026
    3-1    57.1666667   -62.25984  176.59317  0.6297485
    4-1   -88.1666667  -207.59317   31.25984  0.2243248
    5-1    57.3333333   -62.09317  176.75984  0.6272414
    3-2    41.1666667   -78.25984  160.59317  0.8472695
    4-2  -104.1666667  -223.59317   15.25984  0.1088202
    5-2    41.3333333   -78.09317  160.75984  0.8453941
    4-3  -145.3333333  -264.75984  -25.90683  0.0116387
    5-3     0.1666667  -119.25984  119.59317  1.0000000
    5-4   145.5000000    26.07350  264.92650  0.0115253

Solution: There are in total $10 = \dfrac{5(5-1)}{2}$ paired comparisons.

    Paired comparisons                      Comments
    µ1 vs µ2, µ2 vs µ3, µ2 vs µ5            little different
    µ1 vs µ3, µ1 vs µ4, µ1 vs µ5, µ2 vs µ4  different
    µ3 vs µ4, µ4 vs µ5                      very different
    µ3 vs µ5                                same

From the R output on Tukey's Multiple Comparisons of means, we can say:
i. Since the p-value for the test $H_0: \mu_3 = \mu_5$ is 1.0000, we fail to reject $H_0$ and conclude that those means are the same.
ii. Since the p-values for the tests $H_0: \mu_4 = \mu_i$, $i = 3$ or $5$, are less than 0.05, we reject $H_0$.

(e) [4 points] Do Tukey's Test for aggregates 2 and 3 at $\alpha = 0.05$. Write your work in detail. Do not use the R output. Note $\bar{y}_{2\cdot} = 569.33$ and $\bar{y}_{3\cdot} = 610.50$.

Solution: Since the p-value from the R output is 0.8473, we fail to reject $H_0: \mu_2 = \mu_3$.

Or: Since $|\bar{y}_{2\cdot} - \bar{y}_{3\cdot}| = 41.17$, $q(0.05, 5, 25) \approx 4.17$, $s = 70.43$ and $n = 6$, the 95% CI is $(\bar{y}_{2\cdot} - \bar{y}_{3\cdot}) \pm q(\alpha, k, v)\, s \sqrt{1/n} = -41.17 \pm (4.17)(70.43)\sqrt{1/6} = -41.17 \pm 119.90 = (-161.07, 78.73)$. The CI contains zero and thus we fail to reject $H_0$.

Or: Since $|\bar{y}_{2\cdot} - \bar{y}_{3\cdot}| = 41.17 < q(\alpha, k, v)\, s \sqrt{1/n} = 119.90$, we fail to reject $H_0$ and conclude that there is no significant difference between the two means.
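The hand calculation in part (e) can be reproduced with a short Python sketch. The function name `tukey_interval` is hypothetical, and the critical value $q(0.05, 5, 25) \approx 4.17$ is taken from the solution above rather than computed, since the Python standard library has no studentized-range distribution.

```python
import math


def tukey_interval(mean_a, mean_b, q_crit, s, n):
    """Tukey interval for mean_a - mean_b with equal group sizes n (hypothetical helper)."""
    diff = mean_a - mean_b
    half_width = q_crit * s * math.sqrt(1 / n)
    return diff - half_width, diff + half_width, half_width


# Aggregates 2 and 3: group means, q(0.05, 5, 25) ~ 4.17, s = 70.43, n = 6.
lower, upper, half_width = tukey_interval(569.33, 610.50, q_crit=4.17, s=70.43, n=6)
significant = not (lower <= 0 <= upper)
print(f"interval = ({lower:.2f}, {upper:.2f}), half-width = {half_width:.2f}")
print("significant difference" if significant else "no significant difference")
```

The interval is slightly wider than the `3-2` row of the R output because 4.17 is a rounded table value for the critical point.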
