
STATISTICS 608 Linear Models - EXAM I, March 2, 2017

Student’s Name:
Student’s Email Address:

INSTRUCTIONS FOR STUDENTS:
1. There are 10 pages including this cover page.
2. You have exactly 75 minutes to complete the exam.
3. Complete the exam on this form.
4. Pen is preferred as it usually scans more clearly, though blue pen, black pen, or pencil are all allowed.
5. There may be more than one correct answer; choose the best answer.
6. You will not be penalized for providing too much detail in your answers, but you may be penalized for not providing enough detail.
7. You may use one 8.5” × 11” sheet of notes and a calculator. (Clear the memory on a TI-83/84 with 2nd +, 7, All, 1.)
8. At the end of the exam, leave your sheet of notes with your proctor along with the exam.
9. You may choose not to scan the appendix if you made no notes on it.
10. Do not discuss or provide any information to anyone concerning any of the questions on this exam or your solutions until I post the solutions next week.

I attest that I spent no more than 75 minutes to complete the exam. I used only the materials described above. I did not receive assistance from anyone during the taking of this exam.

Student’s Signature:

INSTRUCTIONS FOR PROCTOR: Immediately after the student completes the exam, scan it to a PDF file and have the student upload it to WebAssign.

I certify that:
1. The time at which the student started the exam was ________ and the time at which the student completed the exam was ________.
2. The student has followed the instructions listed above.
3. The exam was scanned into a PDF and uploaded to WebAssign in my presence.
4. The student has left the exam and sheet of notes with me, to be returned to the student no less than one week after the exam or shredded.

Proctor’s Signature:


Part I: Multiple choice

1. Which of the following is true about our usual least squares estimator for the slope of a straight-line relationship between x and y?
(a)
(b) It has smaller sampling variability than any other unbiased statistic.
(c) It has more bias than any other linear statistic.
(d) It has more bias than any other low-variance linear statistic.

2. A statistician is interested in predicting y = the number of likes on a Facebook page’s photo using an indicator variable iWeekend, which takes the value 1 if the post is posted on the weekend, and 0 otherwise. The model of interest is y_j = β0 + β1 iWeekend_j + e_j. What is the parameter β1?
(a) The average number of likes on the weekend.
(b) The average number of likes during the week (not on the weekend).
(c) The difference between the average number of likes on the weekend and the average number of likes during the week. (Hint: plug in a 0 for iWeekend, and then a 1, to get your µ’s. Then subtract to get β1.)
(d) The sum of the average number of likes on the weekend and during the week.

3. Two regression models ŷ = β̂0 + β̂1 x1 have the same SSModel (Regression SS) and the same values of xi, but Model 1 has a larger SSE than Model 2. Which of the following is true?
(a) The sample slope is larger for Model 1 than Model 2.
(b) The sample slope is larger for Model 2 than Model 1.
(c) The value of R² is larger for Model 1 than Model 2.
(d) The value of R² is larger for Model 2 than Model 1.

4. Model 2 in the previous Facebook question is a model with a square root transformation. R output for a model with 17 shares gives a confidence interval of (9.91, 10.53) for the mean square root likes for that many shares. What is the appropriate confidence interval for the number of likes, assuming that model is valid?
(a) (9.91², 10.53²)
(b)
(c) ((9.91 + MSE/2)², (10.53 + MSE/2)²)
(d) ((9.91 + 1/MSE)², (10.53 + 1/MSE)²)
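As quick arithmetic for reading the options: option (a), for example, simply squares the endpoints, giving 9.91² ≈ 98.21 and 10.53² ≈ 110.88, i.e. the interval (98.21, 110.88).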


Part II: Short answer

5. Define a linear regression model written in matrix notation as Y = Xβ + e, where Y and X are the (n × 1) random response vector and the (n × 2) design matrix, respectively. The first column of X is the 1 vector. Also define the (2 × 1) vector of unknown regression parameters as β = [β0 β1]′. Finally, the (n × 1) vector e is the vector of random errors. Assume that the errors are independent and identically distributed.

(a) A student has written that the least squares estimator for the parameters, β̂, is equal to:

β̂ = (X′X)−1X′y = X−1(X′)−1X′y = X−1y.

What is the student’s mistake here?

The student treated X as invertible, but X−1 does not exist; X is not even square. Note that (AB)−1 = B−1A−1 if those inverses exist, so that’s not the problem.
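A minimal R sketch (made-up numbers, not part of the exam) showing the heart of the mistake: solve() cannot invert a non-square matrix.

X <- cbind(1, c(4, 4, 5, 5))  # an n x 2 design matrix: intercept column plus one predictor
try(solve(X))                 # error: solve() requires a square matrix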

(b) Show that the variance of the least squares estimator β̂ = (X′X)−1X′y is σ²(X′X)−1.

Var(β̂|X) = Var((X′X)−1X′y | X)
          = (X′X)−1X′ Var(y|X) X(X′X)−1
          = (X′X)−1X′ Var(Xβ + e|X) X(X′X)−1
          = (X′X)−1X′ Var(e|X) X(X′X)−1
          = (X′X)−1X′ (σ²I) X(X′X)−1
          = σ²(X′X)−1X′X(X′X)−1
          = σ²(X′X)−1
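A numeric check of this formula (a sketch with simulated data; the variable names are ours): the matrix computations match R’s lm(), with σ² estimated by the MSE.

set.seed(608)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 1.5)           # simulated straight-line data
X <- cbind(1, x)                              # n x 2 design matrix
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'y
s2 <- sum((y - X %*% beta_hat)^2) / (n - 2)   # MSE, the usual estimate of sigma^2
vcov_hat <- s2 * solve(t(X) %*% X)            # estimated Var(beta_hat)
fit <- lm(y ~ x)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))  # TRUE
all.equal(as.numeric(vcov_hat), as.numeric(vcov(fit)))  # TRUE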


(c) In your proof above, you likely used the fact that Var(e|X) = σ²I. Why is this true? Explain, as if to a statistics student who hasn’t had our class yet.

The variance-covariance matrix of a vector contains the variance of each element on the diagonal of the matrix and the covariance between each pair of elements on the off-diagonals. Because our errors are independent, the off-diagonal entries of this covariance matrix are all 0, and because the errors are identically distributed, they all have the same variance. We’ll call this common variance σ², and that single number appears in every diagonal entry of the variance-covariance matrix. Since it’s the same number all down the diagonal, we can write the matrix as σ²I.
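For example, with n = 3 independent and identically distributed errors,

Var(e|X) = [σ² 0 0; 0 σ² 0; 0 0 σ²] = σ²I,

with σ² down the diagonal and 0 everywhere else.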

6. Show that the variance stabilizing transformation for a response variable Y with mean µ and variance e^µ is f(Y) = e^(−Y/2). The variance of the transformed Y is not equal to 1; does that matter? Why or why not?

First, differentiate the transformation:

f′(Y) = −(1/2) e^(−Y/2), so f′(µ) = −(1/2) e^(−µ/2).

By the Taylor series expansion (the delta method),

Var(f(Y)) ≈ [f′(µ)]² Var(Y)
          = [−(1/2) e^(−µ/2)]² e^µ
          = (1/4) e^(−µ) e^µ
          = 1/4.

No, it doesn’t matter that the variance is not 1; it only matters that the variance is now constant with respect to the mean. Thus when we model the mean of Y using regression methods, we should have constant variance even though the mean of Y will change as our explanatory variable changes.
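A numeric check (our own, not part of the exam): R’s D() confirms the derivative, and a small simulation illustrates the delta-method result. One assumption to note: the simulation takes Y to be normal and scales its variance down to k·e^µ for a small k, since the first-order Taylor approximation is only accurate when Var(Y) is small; the stabilized variance is then k/4 for every µ.

D(expression(exp(-y / 2)), "y")   # -(exp(-y/2) * (1/2)), matching f'(Y) above

set.seed(608)
k <- 0.001
for (mu in c(1, 2, 4)) {
  y <- rnorm(1e6, mean = mu, sd = sqrt(k * exp(mu)))
  cat("mu =", mu, " Var(f(Y)) =", var(exp(-y / 2)), " target k/4 =", k / 4, "\n")
}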


7. An arborist is interested in the relationship between tree circumference and tree height. The model of interest is the no-intercept model yi = βxi + ei, where the errors are assumed to be independent but have variance proportional to the tree circumference; that is, Var(ei|xi) = σ²xi. The four trees measured had circumferences of 4, 4, 5, and 5. Set up a weighted least squares estimator as follows:

(a) First, describe the parameter β in words, as if to someone with no experience in statistics.

As the tree circumference increases by 1 unit, the height of the tree is expected to increase by β units, on average.

(b) Define the design matrix for this model.

X = [4 4 5 5]′, a 4 × 1 column vector (there is no intercept column).

(c) Define the weight matrix W for this model.

Since Var(ei|xi) = σ²xi, each observation is weighted by the reciprocal of its circumference:

W = [1/4  0    0    0
     0    1/4  0    0
     0    0    1/5  0
     0    0    0    1/5]

(d) Finally, calculate the least squares estimator for the parameter β.

X′W = [4 4 5 5] W = [1 1 1 1]

β̂ = (X′WX)−1(X′Wy)
  = ([1 1 1 1][4 4 5 5]′)−1 [1 1 1 1][y1 y2 y3 y4]′
  = (1/18) Σ yi
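A short R sketch of the same computation (the heights y are hypothetical; the exam leaves them symbolic):

x <- c(4, 4, 5, 5)                 # tree circumferences
y <- c(10, 11, 13, 12)             # made-up heights, for illustration only
X <- matrix(x, ncol = 1)           # 4 x 1 design matrix, no intercept
W <- diag(1 / x)                   # weight matrix
beta_hat <- solve(t(X) %*% W %*% X) %*% (t(X) %*% W %*% y)
beta_hat                           # equals sum(y) / 18
coef(lm(y ~ 0 + x, weights = 1 / x))   # same answer via lm() with weights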


Part III: Long answer 8. A statistician working for a cosmetics company is interested in modeling the relationship between the number of likes and the number of shares on their Facebook page. Plots for the following three models can be found in the appendix. A (terrible) preliminary model for the relationship between likes and shares is the simple linear regression model below.

Model 1: likes_i = β0 + β1 shares_i + e_i

(a) One method for choosing a transformation for the variables is to use a power transformation estimated from the data (a Box-Cox/Yeo-Johnson analysis); what transformation would you suggest for transforming the number of shares based on the output below? Explain.

yjPower Transformation to Normality
       Est.Power  Std.Err.  Wald Lower Bound  Wald Upper Bound
share     0.1555    0.0312            0.0942            0.2167

Likelihood ratio tests about transformation parameters
                             LRT  df          pval
LR test, lambda = (0)   24.19038   1  8.726714e-07
LR test, lambda = (1)  878.74071   1  0.000000e+00

Values for λ between 0.0942 and 0.2167 are all within the confidence interval, so a power transformation with λ in that range (near the estimated power of 0.16) is the suggestion here. Some people argued for a log transformation or square root transformation since they were “close” to the endpoints of the confidence interval, but do notice that the log transformation (λ = 0) has a very small p-value (0.00000087), meaning we should definitely reject the log transformation, and the square root transformation (λ = 0.5) is even further away from the upper bound than 0 is from the lower bound.
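A sketch of how output like this is produced (assuming the car package and a data frame fb with the Facebook variable share; both names are ours):

library(car)
pt <- powerTransform(fb$share, family = "yjPower")  # Yeo-Johnson power estimate
summary(pt)   # estimated lambda, Wald interval, and LR tests vs lambda = 0 and 1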

(b) Regardless of the transformation chosen above, the statistician has decided to consider a square root transformation and a log transformation, as shown below. The estimate for β1 in Model 3 is 1.04. Interpret this number in the context of this problem. (Hint: no, the fact that we added one before taking the log doesn’t matter.)

Model 2: √(likes_i) = β0 + β1 √(shares_i) + e_i
Model 3: log(likes_i + 1) = β0 + β1 log(shares_i + 1) + e_i

If the number of shares for a Facebook post increases by 1%, our model predicts that the number of likes will increase by about 1.04%.
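Equivalently, the log-log form says that multiplying the shares by a factor k multiplies the predicted likes by k^1.04 (our own restatement, using the hint that the +1 is negligible). For example, doubling the shares multiplies the predicted likes by 2^1.04 ≈ 2.06.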

(c) Plots of the two models below are shown in the appendix. First consider Cook’s distance. Which labeled point has the highest Cook’s distance? Explain.

Point B is the point that has the highest leverage of the three points: it has an x-value very far from the mean of the x-values. However, Cook’s D is even higher for point A than for point B, because Cook’s D reflects both leverage and the size of the standardized residual. Point A (the point farthest to the right on the two plots with fitted values and on the leverage plot) has a high Cook’s D because it has both high leverage and a high standardized residual: its x-value is equal to the min (in case you’re curious, the original number of shares was 0; log(0 + 1) = 0), and its number of likes was more in line with the number of likes for average x-values. Why did that point have so many likes yet so few shares?
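A sketch of how these diagnostics are computed in R (the fit object and data frame names are ours): Cook’s D combines the leverage and the standardized residual.

fit2 <- lm(sqrt(like) ~ sqrt(share), data = fb)  # Model 2; fb is hypothetical
h <- hatvalues(fit2)                             # leverages
r <- rstandard(fit2)                             # standardized residuals
d <- cooks.distance(fit2)                        # Cook's distances
all.equal(d, r^2 * h / (2 * (1 - h)))            # TRUE; 2 = number of coefficients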

(d) Now, if you had to ignore Cook’s distance for a moment, which model is preferable? Explain.

Model 3, the log-log model:

• It looks like the relationship in the square root model should have more curvature to it (especially if we ignore the outlier).
• The variance increases substantially across the fitted values in Model 2 (except at the outlier).
• The residuals and the fitted values are orthogonal (covariance = 0, and thus independent if normal), so we shouldn’t see a pattern in their plot; yet the plot for Model 2 seems to have increasing variability in ŷ.
• While we’re less concerned about normality due to the large sample size, it is true that you need a larger sample size to overcome stronger skew via the CLT. (Also note that normality only matters for inference; our estimates for the parameters are BLUE no matter what the distribution of the errors is!)
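For reference, diagnostic panels like the ones in the appendix come from R’s plot method for lm objects (a sketch; fit3 and fb are our hypothetical names):

fit3 <- lm(log(like + 1) ~ log(share + 1), data = fb)
par(mfrow = c(2, 3))
plot(fit3, which = 1:6)  # Residuals vs Fitted, Q-Q, Scale-Location,
                         # Cook's distance, Residuals vs Leverage, Cook's vs Leverage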


Appendix

Model 1: likes_i = β0 + β1 shares_i + e_i
[Diagnostic plots for Model 1: a scatterplot of Likes vs. Shares, plus Residuals vs. Fitted, Normal Q-Q, Scale-Location, Residuals vs. Leverage (with Cook’s distance contours), and Cook’s Distance panels; observations 380, 324, 102, 350, and 245 are flagged.]

Model 2: √(likes_i) = β0 + β1 √(shares_i) + e_i
[Diagnostic plots for Model 2: a scatterplot of sqrt(Likes) vs. sqrt(Shares) with points A and B labeled, plus Residuals vs. Fitted, Normal Q-Q, Scale-Location, Residuals vs. Leverage, and Cook’s Distance panels; observations 324, 380, 18, 102, and 350 are flagged.]

Model 3: log(likes_i + 1) = β0 + β1 log(shares_i + 1) + e_i
[Diagnostic plots for Model 3: a scatterplot of log(Likes + 1) vs. log(Shares + 1) with point C labeled, plus Residuals vs. Fitted, Normal Q-Q, Scale-Location, Residuals vs. Leverage, and Cook’s Distance panels; observations 437, 279, 123, 77, and 22 are flagged.]

