
Title: Chapter 5 Testing General Linear Hypothesis Part 1 notes
Author: Bixuan Yang
Course: Regression Analysis
Institution: National University of Singapore

1

In Chapter 2, we study the test of the hypothesis that β_1 = 0, β_2 = 0, ⋯, β_p = 0. In Chapter 3, we study the test of the hypothesis that β_j = 0 for a single j. We would now like to test a more general type of hypothesis, namely c_{i0}β_0 + c_{i1}β_1 + ⋯ + c_{ip}β_p = 0 for i = 1, ⋯, a. The system of equations in H0 can be expressed as Cβ = 0.

Full model: E(y) = Xβ, or E(y) = β_0 + β_1x_1 + β_2x_2 + ⋯ + β_px_p unless otherwise stated. It includes all the predictors that we are interested in. Making use of the information (the equations) given in H0, we can replace some of the parameters by the other parameters and hence reduce the number of β's in the model.

Reduced model: E(y) = Zα, or E(y) = α_0 + α_1z_1 + α_2z_2 + ⋯ + α_kz_k.

SSE = error sum of squares for the full model. SSE_H = error sum of squares for the reduced model. SSE_H − SSE = sum of squares due to the null hypothesis (the difference in the unexplained variation between the reduced model and the full model). If SSE_H − SSE is big, then there is a significant difference in unexplained variation between the reduced model and the full model. Hence, we reject H0 and prefer the full model.

Test statistic: F = [(SSE_H − SSE)/q] / [SSE/(n − r)], where q is the difference between the degrees of freedom of SSE_H and SSE.
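The whole procedure can be sketched in a few lines of code. The following Python snippet is not part of the original notes; the helper names sse and general_linear_test are my own, and X, Z, y are assumed to be NumPy arrays supplied by the user. It fits the full and reduced models by least squares and forms the F statistic described above.

```python
import numpy as np
from scipy import stats

def sse(design, y):
    """Error sum of squares from an ordinary least squares fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def general_linear_test(X, Z, y, q, alpha=0.05):
    """F test of H0: C beta = 0, comparing the full model (design X) with the reduced model (design Z).

    q is the number of linearly independent equations in H0; r is the number of parameters
    in the full model, so the test has q and n - r degrees of freedom.
    """
    n, r = X.shape
    sse_full = sse(X, y)       # SSE   (full model,    n - r d.f.)
    sse_reduced = sse(Z, y)    # SSE_H (reduced model, n - (r - q) d.f.)
    F = ((sse_reduced - sse_full) / q) / (sse_full / (n - r))
    p_value = stats.f.sf(F, q, n - r)
    return F, p_value, p_value < alpha   # reject H0 when the p-value is below alpha
```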

2

The null hypothesis β_1 − β_2 = 0 is a linear hypothesis as it can be expressed in the form c_1β_1 + c_2β_2 = 0, where c_1 = 1 and c_2 = −1.

Full model: a model with 2 predictors, x_1 and x_2, and an intercept term β_0, i.e. E(y) = β_0 + β_1x_1 + β_2x_2. Number of parameters (β's) = 3.

Making use of the equation under H0, the number of parameters is reduced to 2. Which 2 parameters? Reduced model: E(y) = β_0 + β_1(x_1 + x_2), or equivalently E(y) = β_0 + β_2(x_1 + x_2).
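As an illustration of this example, here is a hedged sketch with made-up data showing that the full and reduced design matrices differ only in that x_1 and x_2 are replaced by their sum; it reuses the general_linear_test helper sketched after slide 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=n)   # simulated data satisfying beta1 = beta2

X = np.column_stack([np.ones(n), x1, x2])      # full model: 3 parameters (beta0, beta1, beta2)
Z = np.column_stack([np.ones(n), x1 + x2])     # reduced model under H0: 2 parameters

F, p_value, reject = general_linear_test(X, Z, y, q=1)   # q = 3 - 2 = 1
print(F, p_value, reject)
```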

3

A linear function in the β's means a function of the form Σ_j c_jβ_j. The example gives a system of 3 equations involving 3 linear functions of the β's. Take note that some of these coefficients c_j can be 0.

4

Linearly independent equations mean that no equation can be expressed as a linear combination of the other equations. If the equations are not linearly independent, then we can eliminate some of them, and hence the number of equations in the system can be reduced. In Example 1, can any of the equations be expressed as a linear combination of the rest of the equations? For example, can β_1 = 0 be expressed as a linear combination of the equations β_2 = 0, β_3 = 0, ⋯, β_p = 0? Why are linearly independent equations important? It is because a linearly dependent set of equations contains redundant equations, which can be removed.

5

By making use of the equations under the null hypothesis H0, we replace β_1 by 0, β_2 by 0, and so on, up to β_p by 0. Reduced model: E(y) = β_0 + 0x_1 + 0x_2 + ⋯ + 0x_p = β_0. The number of β's in the full model = p + 1 (i.e. β_0, β_1, ⋯, β_p). The number of β's in the reduced model = 1 (i.e. β_0).

6

There are p − 1 equations under H0. Are these p − 1 equations linearly independent? Yes, because none of the equations can be expressed as a linear combination of the rest of the equations. For example, β_1 − β_2 = 0 cannot be expressed as a linear combination of the remaining p − 2 equations. The p − 1 equations are equivalent to β_1 = β_2 = β_3 = ⋯ = β_p.

Reduced model: E(y) = β_0 + β_1x_1 + β_1x_2 + β_1x_3 + ⋯ + β_1x_p, or E(y) = β_0 + β_1(x_1 + x_2 + ⋯ + x_p).

The number of parameters in the full model = p + 1 (i.e. β_0, β_1, ⋯, β_p). The number of parameters in the reduced model = 2 (i.e. β_0, β_1).

Example of linearly dependent equations:
Equation 1: β_1 − β_2 = 0
Equation 2: β_2 − β_3 = 0
Equation 3: β_1 − β_3 = 0
Note: Eq(1) + Eq(2) − Eq(3) = 0. In other words, Equation 3 can be expressed as the sum of Equations (1) and (2). Hence, we can remove Equation 3, so the number of equations is reduced to 2, and Equations (1) and (2) are linearly independent as we cannot reduce further.

7

There are m equations in H0. We can use the matrix form to represent the system of m equations.

8

The m equations in the p + 1 β's may be linearly dependent; in other words, some of the m equations may be expressible as linear combinations of the other equations. Consider a set of 3 equations in 5 β's, namely β_0, β_1, β_2, β_3 and β_4:
Equation 1: β_0 + 3β_1 − 3β_2 = 0
Equation 2: β_0 + β_1 = 0
Equation 3: β_1 − 1.5β_2 = 0
Note that β_3 and β_4 do not show up in the above system of equations because the corresponding coefficients c_{i3} and c_{i4} are 0 in all 3 equations. As β_3 and β_4 are not included in the equations in H0, we cannot replace these 2 parameters by other parameters; hence the terms β_3x_3 and β_4x_4 are still left in the reduced model.

9

Note that Eq(3) = ½(Eq(1) − Eq(2)). After eliminating the third equation, the remaining 2 equations are linearly independent. In this example, the number of parameters in the full model is r = p + 1 = 5, the number of equations given in H0 is m = 3, and the number of linearly independent equations is q = 2. In other words, we only need any 2 of the 3 equations; the remaining equation is redundant and gives no additional information about the β's. Notice that the number of linearly independent equations, q, must be less than or equal to the number of equations, m. The 2 linearly independent equations can be written as β_0 = −β_1 and β_1 = 1.5β_2. In Example 1, m = p and q = p. In Example 2, m = p − 1 and q = p − 1.
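As an aside (not from the notes), q can be found mechanically: stack the coefficients of the m equations as the rows of the matrix C and compute its rank, since q equals the rank of C. For the 3 equations above, with columns ordered (β_0, β_1, β_2, β_3, β_4):

```python
import numpy as np

C = np.array([
    [1.0, 3.0, -3.0, 0.0, 0.0],   # Equation 1: b0 + 3*b1 - 3*b2 = 0
    [1.0, 1.0,  0.0, 0.0, 0.0],   # Equation 2: b0 + b1 = 0
    [0.0, 1.0, -1.5, 0.0, 0.0],   # Equation 3: b1 - 1.5*b2 = 0
])

m = C.shape[0]                    # number of equations: m = 3
q = np.linalg.matrix_rank(C)      # number of linearly independent equations: q = 2
print(m, q)
```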

10

Suppose the m equations in the r β's are linearly dependent. Without loss of generality, we assume that the last m − q of the equations depend upon the first q linearly independent equations. Hence we can throw away the last m − q equations. Of course, throwing away equations is easy; the hard part is to determine q, the number of linearly independent equations under H0. If we are not sure what q is, we just remove one linearly dependent equation at a time until we cannot remove any of the remaining equations.

11

Full model: E(y) = Xβ, where y: n × 1, X: n × r, β: r × 1. If there is a constant term β_0 in the model with p predictors x_1, ⋯, x_p, then r = p + 1. The SSE for the full model is given by SSE = y′y − β̂′X′y with n − r d.f.
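A quick numerical check of this formula (a sketch with simulated data, not from the notes): the SSE obtained from the residuals agrees with y′y − β̂′X′y, where β̂ solves the normal equations X′Xβ̂ = X′y.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])   # intercept plus p = r - 1 predictors
y = X @ np.array([2.0, 1.0, -0.5, 0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                # solve the normal equations
sse_from_residuals = float((y - X @ beta_hat) @ (y - X @ beta_hat))
sse_from_formula = float(y @ y - beta_hat @ (X.T @ y))      # y'y - beta_hat' X'y
print(np.isclose(sse_from_residuals, sse_from_formula), n - r)   # True, with n - r d.f.
```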

12

4 parameters: β_1, β_2, β_3 and β_4 (i.e. r = 4). Full model: E(y) = β_1x_1 + β_2x_2 + β_3x_3 + β_4x_4. H0: β_1 + β_2 + β_3 + β_4 = 0 against H1: H0 is false (i.e. q = 1). Using the equation in H0, we have β_4 = −(β_1 + β_2 + β_3). Substituting β_4 = −(β_1 + β_2 + β_3) into the full model, we have

Reduced model: E(y) = β_1x_1 + β_2x_2 + β_3x_3 − (β_1 + β_2 + β_3)x_4,
or E(y) = β_1(x_1 − x_4) + β_2(x_2 − x_4) + β_3(x_3 − x_4),
or E(y) = α_1z_1 + α_2z_2 + α_3z_3, where α_1 = β_1, α_2 = β_2, α_3 = β_3, z_1 = x_1 − x_4, z_2 = x_2 − x_4 and z_3 = x_3 − x_4.

There are 3 (= r − q = 4 − 1) parameters in the reduced model.
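A hedged sketch (made-up data) of how the reduced design matrix Z for this example could be built directly from the columns of X:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 4))                              # columns x1, x2, x3, x4 (no intercept)
true_beta = np.array([1.0, -0.4, 0.2, -0.8])             # coefficients sum to 0, so H0 holds
y = X @ true_beta + rng.normal(scale=0.3, size=n)

x1, x2, x3, x4 = X.T
Z = np.column_stack([x1 - x4, x2 - x4, x3 - x4])         # z_j = x_j - x_4, giving r - q = 3 columns

# SSE comes from regressing y on X (r = 4), SSE_H from regressing y on Z, and the F test uses q = 1.
```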

13

Let us look at Example 3a again. There are 5 parameters β_0, β_1, β_2, β_3 and β_4 in the full model, and there are 2 linearly independent equations under H0. We eliminate Equation 1 since it is the sum of Equation 2 and 2 times Equation 3. From Equation 3, we have β_1 = 1.5β_2. From Equation 2, we have β_0 = −β_1; hence β_0 = −1.5β_2. So we can express β_0 and β_1 in terms of β_2. Note that, as a result, the reduced model does not have a constant term.

14

Full model: E(y) = β_0 + β_1x_1 + β_2x_2 + β_3x_3 + β_4x_4. By substituting −1.5β_2 for β_0 and 1.5β_2 for β_1, the full model reduces to
E(y) = −1.5β_2 + 1.5β_2x_1 + β_2x_2 + β_3x_3 + β_4x_4,
or E(y) = β_2(−1.5 + 1.5x_1 + x_2) + β_3x_3 + β_4x_4,
with 3 parameters β_2, β_3 and β_4. Note that the reduced model does not have a constant term.
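Similarly, a minimal sketch (again with made-up data) of the reduced design matrix for Example 3a; its three columns carry the remaining parameters β_2, β_3 and β_4, and there is no column of ones:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1, x2, x3, x4 = rng.normal(size=(4, n))
y = rng.normal(size=n)                                  # placeholder response for illustration only

X = np.column_stack([np.ones(n), x1, x2, x3, x4])       # full model: r = 5 parameters

# Reduced design under beta0 = -1.5*beta2 and beta1 = 1.5*beta2 (q = 2):
Z = np.column_stack([-1.5 + 1.5 * x1 + x2, x3, x4])     # r - q = 3 columns, no intercept term
```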

15

Reduced model: E(y) = Zα, where Z is an n × (r − q) matrix and α is an (r − q) × 1 vector. The sum of squares error for the reduced model is given by SSE_H = y′y − α̂′Z′y with n − (r − q) d.f., where q is the number of parameters eliminated.

In the diagram on the slide, the largest rectangle represents the total sum of squares (SST). The blue rectangle, with part of it covered by the purple rectangle, represents the error sum of squares for the reduced model. The purple rectangle represents the error sum of squares for the full model. Notice that SSE_H ≥ SSE; hence the purple rectangle lies inside the blue rectangle. The difference between SSE_H and SSE is represented by the shaded blue region. The difference SSE_H − SSE is called the sum of squares due to the hypothesis Cβ = 0. Notice that if the difference is small, then SSE_H is not very different from SSE. In that case, there is not much difference between the full model and the reduced model. Hence we may consider using the reduced model because it is a simpler model.

16

Note that SSE_H ≥ SSE since the number of parameters in the reduced model is less than that in the original (full) model. In general, we expect a model with more parameters to explain a larger part of the total sum of squares than a model with fewer parameters. The degrees of freedom for SSE_H is n − (r − q) and the degrees of freedom for SSE is n − r. Therefore the degrees of freedom for the sum of squares due to the hypothesis Cβ = 0 equals [n − (r − q)] − (n − r) = q. If SSE_H is not much different from SSE, then there is not much difference between the reduced model and the full model in terms of explaining the variation in y. In that case, we may prefer the reduced model, as it is a simpler model and is not much worse than the full model in explaining the variation in y. The question is: how large does the sum of squares due to the hypothesis Cβ = 0 have to be to be considered large? Hence we need to derive a test statistic that will be large when the null hypothesis H0 is false, and we also have to derive the distribution of this test statistic.

17

We use the test statistic F, which is the ratio of 2 quantities. The numerator of the test statistic F is the sum of squares due to the hypothesis (i.e. SSE_H − SSE) divided by the corresponding degrees of freedom q, where q is the number of linearly independent equations under the null hypothesis H0. The denominator of the test statistic is the error sum of squares under the full model (i.e. SSE) divided by the corresponding degrees of freedom n − r, where r is the number of parameters in the full model. It can be shown that, under the null hypothesis H0, the test statistic F follows an F distribution with q and n − r degrees of freedom. We reject the null hypothesis at the α level of significance if the observed F value is larger than the critical value F_α(q, n − r).
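For the rejection rule, a short sketch of how the critical value and p-value could be computed with SciPy; the numbers q = 2, r = 5, n = 60 and the observed F value below are illustrative, not taken from the notes.

```python
from scipy import stats

q, r, n = 2, 5, 60            # d.f. of the hypothesis, parameters in the full model, sample size
alpha = 0.05
F_observed = 4.1              # illustrative value of [(SSE_H - SSE)/q] / [SSE/(n - r)]

critical_value = stats.f.ppf(1 - alpha, q, n - r)    # F_alpha(q, n - r)
p_value = stats.f.sf(F_observed, q, n - r)           # P(F >= F_observed) under H0

print(critical_value, p_value, F_observed > critical_value)   # reject H0 if F_observed > critical_value
```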

18...

