Facit - Övning 3 (Solutions for Exercise Session 3)
Regressionsanalys (Regression Analysis), Kungliga Tekniska Högskolan
SF2930 Regression Analysis
Solutions for Exercise Session 3 - Ch. 4: Model adequacy, Ch. 5: Transformations and weighting to correct model inadequacies, Ch. 15.4: Re-sampling (bootstrap)

Major assumptions for the regression model
Since we will be looking at model inadequacies and how to fix them, it is useful to start by going over the model's major assumptions.
1. The relationship between the response y and the regressor is linear, at least approximately.
2. The error ε has mean zero.
3. The error ε has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed. Note that this is not needed to fit the model with least squares, but it is required for the hypothesis testing.
Problem 1 is about how we can fix problems related to assumption 3. Problem 2 is about assumptions 3 and 4, when the errors are correlated and have variance σ²V with V known. Problems 3 and 4 are about checking the model and the assumptions by using plots of the residuals.

QQ plots
QQ plots can be interpreted to yield various properties of a distribution. Note that randomness can make the plots very unclear. See Figure 1 and Figure 2. Compared to a standard normal distribution, the main patterns tell us that:
• light tailed: too little mass in the tails
• heavy tailed: too much mass in the tails
• left/negative skew: mass towards the right, long tail towards the left
• right/positive skew: mass towards the left, long tail towards the right
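To make these patterns concrete, here is a small illustrative R sketch (not part of the original solutions) that draws normal QQ plots for simulated samples with light tails, heavy tails and right skew; the particular distributions are just convenient examples.

```r
## Normal QQ plots for samples with different tail/skew behaviour.
set.seed(1)
n <- 200
samples <- list(
  "normal"                = rnorm(n),
  "light tails (uniform)" = runif(n, -1, 1),
  "heavy tails (t, df=2)" = rt(n, df = 2),
  "right skew (exp)"      = rexp(n)
)
op <- par(mfrow = c(2, 2))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)  # sample quantiles vs standard normal quantiles
  qqline(samples[[nm]])             # reference line through the quartiles
}
par(op)
```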


Figure 1: QQ plot interpretation

Figure 2: QQ plot interpretation


Other plots
There are lots of possible plots one can make to check the model assumptions. Here are some of them; more variants can be found in the course literature.
• eᵢ vs eᵢ₋₁: check correlation between residuals
• residuals vs a potential new variable to include: if it looks linear, including the variable might improve the model
• residuals vs time of data point or other "meta" information: can reveal bias
• y vs time of data point or other "meta" information: can reveal bias
Plots of the residuals (either standardized or studentized) against:
• ŷ: used to check the constant-variance assumption and overall model adequacy
• x: similar to ŷ, but can also show the need for a polynomial term
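As an illustration of a few of the plots listed above, here is a minimal R sketch; it assumes a hypothetical data frame dat with a response y, a regressor x1 already in the model and a candidate new regressor x2, none of which come from the exercise data.

```r
## A few of the residual plots listed above, from an ordinary lm() fit.
fit <- lm(y ~ x1, data = dat)
e   <- resid(fit)

op <- par(mfrow = c(2, 2))
plot(head(e, -1), tail(e, -1),
     xlab = expression(e[i - 1]), ylab = expression(e[i]),
     main = "Successive residuals")        # correlation between residuals
plot(fitted(fit), e,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")         # non-constant variance / curvature
plot(dat$x2, e,
     xlab = "Candidate regressor x2", ylab = "Residuals",
     main = "Residuals vs new variable")   # a linear trend suggests adding x2
plot(seq_along(e), e,
     xlab = "Observation order", ylab = "Residuals",
     main = "Residuals vs order")          # time trends can reveal bias
par(op)
```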

Problem 1
The three models as given in the problem are:
1. y = β0 + β1(1/x) + ε
2. 1/y = β0 + β1 x + ε
3. y = x/(β0 + β1 x) + ε
All three models are intrinsically/transformably linear, i.e., we can use a transformation to linearize them:
1. y = β0 + β1 x′ with x′ = 1/x (already linear in the transformed regressor)
2. y′ = β0 + β1 x with y′ = 1/y
3. y′ = β1 + β0 x′ with y′ = 1/y and x′ = 1/x
See Figures 3 to 5 for the plots. They are made with x = 1:100, β0 = 4, β1 = 2, and ε = 0.
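A minimal R sketch of the curves behind Figures 3 to 5, using the same values x = 1:100, β0 = 4, β1 = 2 and ε = 0 stated above:

```r
## The three model forms evaluated with beta0 = 4, beta1 = 2 and epsilon = 0.
x  <- 1:100
b0 <- 4
b1 <- 2

y1 <- b0 + b1 * (1 / x)   # model 1: y = beta0 + beta1*(1/x)
y2 <- 1 / (b0 + b1 * x)   # model 2: 1/y = beta0 + beta1*x
y3 <- x / (b0 + b1 * x)   # model 3: y = x/(beta0 + beta1*x)

op <- par(mfrow = c(1, 3))
plot(x, y1, type = "l", main = "Model 1")
plot(x, y2, type = "l", main = "Model 2")
plot(x, y3, type = "l", main = "Model 3")
par(op)
```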


Figure 3: Model 1

Figure 4: Model 2

Figure 5: Model 3


Problem 2
This problem is related to generalized least squares, Montgomery et al. [2012] pages 188-190. It is also connected with the general linear hypothesis, which we used in Problem 4 in the previous exercise session. The notation here is slightly different because of the generalized least squares: the K below plays the role of the T used for the general linear hypothesis before, and m plays the role of c. We use some results from Rawlings et al. [2001] pages 116-121. Note that the K used in generalized least squares, with KK = V, is not the same K as here; see equation (5) and Rawlings et al. [2001] pages 116-121.

In this problem we do not have constant, uncorrelated errors, since

V[ε] = σ²V   (1)

However, V is known. Let p2 be the number of parameters in β2. The hypothesis is

H0: β2 = 0
H1: β2 ≠ 0   (2)

From generalized least squares, Montgomery et al. [2012] pages 188-189, we have

β̂ = (X′V⁻¹X)⁻¹ X′V⁻¹ y   (3)

Because we are interested in only a part of the coefficient vector, namely β2, we can use the general linear hypothesis, see Rawlings et al. [2001] pages 116-121. Partition the coefficient vector as

β = (β1′, β2′)′   (4)

and set

K = (0′, I′)′   (5)

where the size of the identity block I is such that rank(K) = p2, and

m = 0   (6)

This gives

K′β̂ = β̂2   (7)

which means that our hypothesis can be written as

H0: K′β = m
H1: K′β ≠ m   (8)

Then, from the general linear hypothesis, Rawlings et al. [2001] pages 116, 119-121, the test statistic is

F0 = (K′β̂ − m)′ [K′(X′V⁻¹X)⁻¹K]⁻¹ (K′β̂ − m) / (p2 MSE)   (9)

where MSE is the residual mean square from the generalized least-squares fit. Under H0, F0 has a central F distribution with numerator degrees of freedom equal to the rank of K, which by our choice of K is p2. Under H1, F0 has a noncentral F distribution with noncentrality parameter

λ = (K′β − m)′ [K′(X′V⁻¹X)⁻¹K]⁻¹ (K′β − m) / (2σ²)   (10)
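The course R code for this session is provided separately; the following is only an illustrative sketch of how the GLS estimate in (3) and the test statistic in (9) could be computed directly with matrix algebra, assuming the design matrix X, the response y, the known V and the tested block size p2 are given. The MSE used is the residual mean square of the generalized least-squares fit, as above.

```r
## Sketch: GLS fit and F test of H0: beta2 = 0 for the last p2 coefficients.
gls_subset_test <- function(X, y, V, p2) {
  n  <- nrow(X); p <- ncol(X)
  Vi <- solve(V)

  ## GLS estimate: beta_hat = (X'V^-1 X)^-1 X'V^-1 y   (eq. 3)
  XtViX <- t(X) %*% Vi %*% X
  beta  <- solve(XtViX, t(X) %*% Vi %*% y)

  ## Generalized residual mean square, used as the estimate of sigma^2
  r   <- y - X %*% beta
  mse <- as.numeric(t(r) %*% Vi %*% r) / (n - p)

  ## K selects the last p2 coefficients, m = 0   (eqs. 4-6)
  K <- rbind(matrix(0, p - p2, p2), diag(p2))
  m <- rep(0, p2)

  ## F statistic (eq. 9); F0 ~ F(p2, n - p) under H0
  d  <- t(K) %*% beta - m
  F0 <- as.numeric(t(d) %*% solve(t(K) %*% solve(XtViX) %*% K, d)) / (p2 * mse)
  c(F0 = F0, p_value = pf(F0, p2, n - p, lower.tail = FALSE))
}
```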

Problem 3
See the R code, provided separately.

Problem 4
This problem uses standardized and studentized residuals; see Montgomery et al. [2012] pages 74, 133-155. See the R code, provided separately.
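As a hedged illustration only (the actual Problem 4 code is in the separate R file), both kinds of residuals can be obtained directly from a fitted lm object; dat, y and x are placeholder names.

```r
## Standardized and externally studentized residuals from an lm fit.
fit <- lm(y ~ x, data = dat)

r_std  <- rstandard(fit)   # standardized residuals: e_i / sqrt(MSE * (1 - h_ii))
r_stud <- rstudent(fit)    # externally studentized (R-student) residuals

op <- par(mfrow = c(1, 2))
plot(fitted(fit), r_std, xlab = "Fitted values", ylab = "Standardized residuals")
qqnorm(r_stud); qqline(r_stud)   # normality check on studentized residuals
par(op)
```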

Problem 5
See Montgomery et al. [2012] page 517 about the bootstrap. See the R code, provided separately.
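As an illustrative sketch only (again, the real code is provided separately), one of the resampling schemes discussed in Montgomery et al. [2012] section 15.4, bootstrapping the residuals, could look like the following; dat with columns y and x is a placeholder data set and B = 1000 replicates is an arbitrary choice.

```r
## Bootstrap of the regression coefficients by resampling residuals.
fit  <- lm(y ~ x, data = dat)
e    <- resid(fit)
yhat <- fitted(fit)
B    <- 1000

boot_coefs <- replicate(B, {
  y_star <- yhat + sample(e, replace = TRUE)   # new response: fit plus resampled residuals
  coef(lm(y_star ~ x, data = dat))             # refit and keep the coefficients
})

## Simple percentile confidence intervals for the intercept and slope
apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975))
```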

References
D.C. Montgomery, E.A. Peck, and G.G. Vining. Introduction to Linear Regression Analysis. Wiley Series in Probability and Statistics. Wiley, 2012. ISBN 9780470542811.
J.O. Rawlings, S.G. Pantula, and D.A. Dickey. Applied Regression Analysis: A Research Tool. Springer Texts in Statistics. Springer New York, 2001. ISBN 9780387984544.
