QEIV Revision Booklet

Author: Jake Walker
Course: Quantitative Economics IV
Institution: University of Nottingham

Topic 1: Multiple Regression Analysis with Quantitative Information

F Test

F = [(SSR_r − SSR_ur) / q] / [SSR_ur / (n − k − 1)]

where n = number of observations, k = number of independent variables in the unrestricted model, and q = number of restrictions. For the critical value, the degrees of freedom are df1 = q (the number of restrictions) and df2 = n − k − 1.

Reject H0 if critical value < F statistic
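As a quick illustration of this rejection rule, here is a minimal Python sketch (not from the booklet) that computes the F statistic from hypothetical SSR values and compares it with the 5% critical value from scipy:

```python
from scipy import stats

# Hypothetical numbers for illustration: q = 3 restrictions, n = 100
# observations, k = 5 regressors in the unrestricted model.
q, n, k = 3, 100, 5
ssr_r, ssr_ur = 250.0, 220.0  # restricted and unrestricted SSRs (made up)

f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
crit = stats.f.ppf(0.95, dfn=q, dfd=n - k - 1)  # 5% critical value

print(f"F = {f_stat:.2f}, critical value = {crit:.2f}")
print("Reject H0" if f_stat > crit else "Fail to reject H0")
```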

T-Test

t statistic:

t = (β̂1 − β1,H0) / se(β̂1)

where H0: β1 = x.

Reject H0 if |t statistic| > critical value. The degrees of freedom are n − k − 1, where k is the number of independent variables. The smaller the p-value, the stronger the evidence against H0.
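The same rule can be checked numerically; a minimal sketch with hypothetical coefficient and standard-error values, using scipy for the critical value and p-value:

```python
from scipy import stats

# Hypothetical values: estimated beta1, its standard error, and H0: beta1 = 0.
beta1_hat, se_beta1, beta1_h0 = 0.52, 0.21, 0.0
n, k = 100, 5  # observations and independent variables (made up)

t_stat = (beta1_hat - beta1_h0) / se_beta1
crit = stats.t.ppf(0.975, df=n - k - 1)  # two-sided 5% critical value
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - k - 1))

print(f"t = {t_stat:.2f}, critical value = {crit:.2f}, p = {p_value:.3f}")
```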

Gauss-Markov Assumptions

MLR1: Linear in parameters. In the population model, y is related to x by y = β0 + β1x + u, where (β0, β1) are population parameters and u is the disturbance. This does not mean the variables themselves must enter linearly: terms such as β1x² are allowed. If the model is not linear in parameters, we have model misspecification.

MLR2: Random sampling. (xi, yi), i = 1, 2, ..., n with n > 2 is a random sample drawn from the population model. If the sample is not random, the OLS estimator is biased and inconsistent.

MLR3: No perfect collinearity (sample variation). None of the independent variables is constant, and there are no exact linear relationships among the independent variables, i.e. x1 ≠ δ0 + δ1x2. A violation would be, e.g., 9x1 = x2.

MLR4: Zero conditional mean. The disturbance u satisfies E(u | x1, x2, ..., xk) = 0 for any given values of the x's. For the random sample, E(ui | xi) = 0 for i = 1, 2, ..., n. This says that knowing xi does not help predict ui. If this does not hold, we have endogeneity.

MLR5: Homoscedasticity. The error u has the same variance given any values of the explanatory variables: Var(u | x1, x2, ..., xk) = σ². The variance does not change as x changes. Heteroskedasticity, the opposite of this, says Var(ui | xi) = f(xi), i.e. the variance of the errors is some function of x.

MLR6: No correlation of errors. Knowing one error does not help you predict another, so the covariance between any two errors is zero: Cov(ui, uj) = 0 for all i ≠ j. If this does not hold, then in a large sample the Law of Large Numbers means OLS is consistent and the Central Limit Theorem means OLS is asymptotically normally distributed.

Under MLR1–MLR4, the estimation is unbiased and consistent.
Under MLR1–MLR5, the least squares estimator is BLUE (the Best Linear Unbiased Estimator).
Under MLR1–MLR6, it is normally distributed.

If MLR5 (homoscedasticity) does not hold and we have heteroskedasticity, then there is a better linear unbiased estimator, with lower variance and thus greater efficiency.

What test would you use to check whether the regression differs across subgroups, e.g. for male and female? An F test or a Chow test.

How would you use the F test? Include a dummy variable and all interaction terms between the dummy and the other explanatory variables, then use an F test:

Original model: Rent = β0 + β1 Income + u
New model for subgroups (male/female): Rent = β0 + β1 Income + δ0 female + δ1 female × Income + u

F test: H0: δ0 = δ1 = 0

What is a Chow Test? The Chow test is a statistical and econometric test of whether the coefficients in two linear regressions on different data sets are equal.

What is the equation for a Chow test?

F = [SSR_P − (SSR_1 + SSR_2)] / (SSR_1 + SSR_2) × [n − 2(k + 1)] / (k + 1)

where SSR_P is from the initial (pooled) regression with both male and female, and SSR_1 and SSR_2 are from the separate regressions each including only one gender. Either conduct an F test of the dummy plus interactions being simultaneously equal to zero, or estimate separate models for the two groups and conduct the F test using the SSRs.

H0: The equation is the same for both subgroups
HA: The equation is not the same for both subgroups
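To make the SSR route concrete, here is a minimal Python sketch of the Chow statistic using statsmodels on simulated Rent/Income data; the data and numbers are illustrative, not the booklet's:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data standing in for the Rent/Income example.
n = 200
income = rng.uniform(10, 60, n)
female = rng.integers(0, 2, n)
rent = 5 + 0.3 * income + 2 * female + 0.1 * female * income + rng.normal(0, 2, n)

# Pooled regression and separate regressions for each group.
ssr_p = sm.OLS(rent, sm.add_constant(income)).fit().ssr
ssr_1 = sm.OLS(rent[female == 0], sm.add_constant(income[female == 0])).fit().ssr
ssr_2 = sm.OLS(rent[female == 1], sm.add_constant(income[female == 1])).fit().ssr

k = 1  # number of slope regressors
f_chow = ((ssr_p - (ssr_1 + ssr_2)) / (k + 1)) / ((ssr_1 + ssr_2) / (n - 2 * (k + 1)))
print(f"Chow F = {f_chow:.2f}")
```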

Topic 2: Heteroskedasticity

Under heteroskedasticity:
- OLS estimates are still unbiased and consistent.
- OLS standard errors are biased.
- OLS is no longer BLUE.

Example of heteroskedasticity: consider a regression of housing expenditures on income, Rent = β0 + β1 Income + u. Consumers with low income have little scope for varying their rent expenditures, so Var(u) is low. Wealthy consumers can choose to spend a lot of money on rent, or to spend less, depending on tastes, so Var(u) is high.

White Test (as opposed to the Alternate White Test)

Allows for a nonlinear relationship between the squared error and the independent variables. Taking nonlinearities into account (assuming k = 3):

û² = δ0 + δ1x1 + δ2x2 + δ3x3 + δ4x1² + δ5x2² + δ6x3² + δ7x1x2 + δ8x1x3 + δ9x2x3 + v

H0: δ1 = δ2 = ... = δ9 = 0

The null hypothesis is E(u² | x) = E(u²) = σ². Is the expected value of the squared error term related to one of the independent variables? The null hypothesis is homoscedastic errors.

Alternate White Test

û² = δ0 + δ1ŷ + δ2ŷ² + v

H0: δ1 = δ2 = 0

Breusch-Pagan Test

Step 1: Estimate the model and compute û².
Step 2: Regress û² on all the explanatory variables:

û² = δ0 + δ1x1 + δ2x2 + v

Step 3: Test H0: δ1 = δ2 = ... = δk = 0 (homoscedasticity) with an F test or LM test.
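The same three steps are wrapped by statsmodels; a minimal sketch on simulated data where the variance depends on x1 (data and seed are made up):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)

# Simulated data: error sd depends on x1, so H0 should be rejected.
n = 300
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)
y = 1 + 0.8 * x1 + 0.4 * x2 + rng.normal(0, 1 + x1, n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f"BP LM = {lm_stat:.2f} (p = {lm_pval:.4f}), F = {f_stat:.2f} (p = {f_pval:.4f})")
```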

What problems may arise with the White test or the BP test? They do not state the functional form of the variance, and thus do not say how to correct for it.

How do you correct for heteroskedasticity? Use robust standard errors or feasible Generalised Least Squares (FGLS).

Weighted Least Squares

WLS determines the line based on what the error is and how the errors vary with x. However, the problem with this method is that you need to know the functional form of the variance, i.e. Var(ui | xi) = σ²h(xi) where h(xi) is known. "Weighting" each variable in the equation by 1/√h(x) gives a transformed equation.

This means the variance of the weighted error is Var(ui/√h(xi) | xi) = Var(ui | xi)/h(xi) = σ²h(xi)/h(xi) = σ², so using the variance rules we get back to homoscedasticity.
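A minimal WLS sketch, assuming h(x) = x is known, which holds here only because the data are simulated that way:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulated data with Var(u | x) = sigma^2 * h(x), h(x) = x.
n = 200
x = rng.uniform(1, 10, n)
y = 3 + 0.7 * x + rng.normal(0, np.sqrt(x))

X = sm.add_constant(x)
wls_res = sm.WLS(y, X, weights=1 / x).fit()  # weights are 1/h(x)
print(wls_res.params)
```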

Feasible Generalised Least Squares (FGLS)

However, it is very frequently the case that you do not know the form of h(xi), so you have to use a feasible version of WLS, the FGLS estimator.

Step 1: Estimate the model and compute log(û²).
Step 2: Regress log(û²) on all the explanatory variables and save the fitted values, ĝi.
Step 3: Set ĥi = exp(ĝi) and estimate the weighted model.

However, it needs to be said that FGLS is biased. It is consistent in large samples, and as N tends to infinity it is a better (more efficient) estimator than OLS. The associated t-stat and F-stat have the usual t and F distributions for large n.
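The three FGLS steps translate directly into code; a sketch on simulated data where the estimator does not know the true variance function:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Simulated data with multiplicative heteroskedasticity in x1.
n = 300
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1 + 0.5 * x1 + 0.2 * x2 + rng.normal(0, np.exp(0.2 * x1))

X = sm.add_constant(np.column_stack([x1, x2]))

# Step 1: OLS residuals.
u_hat = sm.OLS(y, X).fit().resid

# Step 2: regress log(u_hat^2) on the regressors, save fitted values g_hat.
g_hat = sm.OLS(np.log(u_hat ** 2), X).fit().fittedvalues

# Step 3: h_hat = exp(g_hat); estimate WLS with weights 1/h_hat.
h_hat = np.exp(g_hat)
fgls_res = sm.WLS(y, X, weights=1 / h_hat).fit()
print(fgls_res.params)
```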

Topic 3: The Linear Probability Model

Show the LPM suffers from heteroskedasticity

Var(u | x) = Var(y | x) = P(y = 1 | x)[1 − P(y = 1 | x)]

Using the variance formula, where p(x) = P(y = 1 | x):

Var(y | x) = E[(y − E(y | x))² | x]
= E[(y − p(x))² | x]
= p(x)[1 − p(x)]² + [1 − p(x)][0 − p(x)]²
= p(x) − 2p(x)² + p(x)³ + p(x)² − p(x)³
= p(x) − p(x)²
= p(x)[1 − p(x)]

Var(y | x) = p(x)[1 − p(x)]

Thus the LPM suffers from heteroskedasticity: the error variance depends on x.

How to correct for the heteroskedasticity

We know the functional form of the variance: Var(y | x) = p(x)[1 − p(x)]. So set

ĥi = ŷi(1 − ŷi)

and use weighted least squares with weights 1/ĥi. However, there is a problem if ŷi < 0 or ŷi > 1, because then ĥi is negative. Therefore set ŷ = 0.001 or ŷ = 0.999.
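A sketch of the weighted LPM on simulated binary data, including the clipping of fitted values described above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Simulated binary outcome for an LPM.
n = 500
x = rng.uniform(-2, 2, n)
y = (0.2 + 0.3 * x + rng.normal(0, 0.3, n) > 0).astype(float)

X = sm.add_constant(x)
y_hat = sm.OLS(y, X).fit().fittedvalues

# Clip fitted values so h_hat = y_hat(1 - y_hat) stays positive.
y_hat = np.clip(y_hat, 0.001, 0.999)
h_hat = y_hat * (1 - y_hat)

wls_res = sm.WLS(y, X, weights=1 / h_hat).fit()
print(wls_res.params)
```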

Topic 4: Misspecification, Proxy Variables and Some Data Issues

Testing for Misspecification

Misspecification is when the model is not linear in parameters. This does not mean the variables themselves must enter linearly. If the model is not linear in parameters, the first Gauss-Markov assumption is violated.

Consequences of misspecification: the parameters, including β0, β1 and β2, are all biased and inconsistent when the regression is misspecified. The size and direction of the bias depend on the coefficient of the left-out variable, β3, and the correlation between the left-out variable and the other variables.

Regression Specification Error Test (RESET)

Step 1: Estimate the model and compute the fitted values ŷ, ŷ² and ŷ³.
Step 2: Estimate the model regressing y on the original regressors plus ŷ² and ŷ³:

y = β0 + β1x1 + ... + βkxk + δ1ŷ² + δ2ŷ³ + u

Step 3: Test H0: δ1 = δ2 = 0 (no misspecification) with an F test or LM test.

What are the issues with the RESET test? The RESET test does not provide direction on how to proceed if there is evidence of functional form misspecification.
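A manual sketch of the three RESET steps on simulated data whose true relationship is quadratic, so the test should reject:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

# Simulated data: the true model is quadratic, but we fit a linear one.
n = 300
x = rng.uniform(1, 5, n)
y = 1 + 2 * x + 0.5 * x ** 2 + rng.normal(0, 1, n)

X = sm.add_constant(x)
y_hat = sm.OLS(y, X).fit().fittedvalues

# Augment the original regressors with y_hat^2 and y_hat^3.
X_aug = np.column_stack([X, y_hat ** 2, y_hat ** 3])
res_aug = sm.OLS(y, X_aug).fit()

# F test of H0: delta1 = delta2 = 0 (the last two coefficients).
r_matrix = np.zeros((2, X_aug.shape[1]))
r_matrix[0, -2] = 1
r_matrix[1, -1] = 1
print(res_aug.f_test(r_matrix))
```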

Mizon-Richard Approach

If you are unsure whether x needs to be in a different form, you propose two models, with x in the current form and in the different form:

1. y = β0 + β1x1 + β2x2 + u1
2. y = β0 + β1 ln(x1) + β2 ln(x2) + u2

Putting these two models together gives the comprehensive (Mizon-Richard) model:

3. y = β0 + β1x1 + β2x2 + β3 ln(x1) + β4 ln(x2) + u3

If H0: β1 = β2 = 0 cannot be rejected, this is evidence for model 2.
If H0: β3 = β4 = 0 cannot be rejected, this is evidence for model 1.

Davidson-MacKinnon Test

Having two models:

1. y = β0 + β1x1 + β2x2 + u1
2. y = β0 + β1 ln(x1) + β2 ln(x2) + u2

take the fitted values of model 1, ŷ, and add them to model 2:

3. y = β0 + β1 ln(x1) + β2 ln(x2) + β3ŷ + u3

This puts model 1 into model 2. If H0: β3 = 0 cannot be rejected, this is evidence for model 2, and vice versa.
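A sketch of this test on simulated data where the log model is the true one, so β3 on the model-1 fitted values should be insignificant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Simulated data generated by the log model (model 2).
n = 300
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1 + 0.4 * np.log(x1) + 0.6 * np.log(x2) + rng.normal(0, 0.2, n)

# Model 1 (levels) and model 2 (logs).
X1 = sm.add_constant(np.column_stack([x1, x2]))
X2 = sm.add_constant(np.column_stack([np.log(x1), np.log(x2)]))

# Fitted values from model 1, added as a regressor to model 2.
y_hat1 = sm.OLS(y, X1).fit().fittedvalues
res = sm.OLS(y, np.column_stack([X2, y_hat1])).fit()

# t test on the last coefficient (beta3): failing to reject H0: beta3 = 0
# is evidence for model 2.
print(res.tvalues[-1], res.pvalues[-1])
```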

What are the problems with non-nested methods? It is possible that there is no clear winner: both models can be rejected, or neither model may be rejected.

Omitted Variable Bias

There is omitted variable bias if β3 ≠ 0 and there is a correlation between the omitted variable and another explanatory variable, e.g. Corr(x1, x3*) ≠ 0.

A proxy, x3, which plays the role of the omitted variable x3*, is characterised by

x3* = δ0 + δ1x3 + v3

Two conditions must be true.

1. The ZCM assumption holds for the unobserved variable, the proxy and all other explanatory variables: E(u | x1, x2, x3*) = 0. Also, once x3* has been included in the model, E(u | x3) = 0. This just says that the proxy is irrelevant in the model once the other variables are included.

2. If the proxy is controlled for, the conditional mean of the unobserved variable does not depend on the other explanatory variables:

E(v3 | x1, x2, x3) = 0

Thus Corr(v3, x3) = 0, and

E(x3* | x1, x2, x3) = E(x3* | x3) = δ0 + δ1x3

Thus Corr(x1, v3) = Corr(x2, v3) = 0.

If a good proxy can't be found, what is another solution? Sometimes we have no clear idea of how to obtain a proxy for an unobservable factor. Using a lagged dependent variable accounts for historical factors that cause current differences in the dependent variable. An example:

crime = β0 + β1 unem + β2 expend + u

Cities with a high historical crime rate might spend more on law enforcement, i.e. Corr(expend, crime−1) ≠ 0. As we do not account for crime−1, it is included in u. It follows that MLR4, i.e. E(ui | xi) = 0, is violated.
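The mechanics of adding a lagged dependent variable; a sketch with made-up city data, where the variable names follow the example but the numbers are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)

# Simulated panel in the spirit of the crime example.
n = 200
df = pd.DataFrame({
    "crime": rng.normal(50, 10, n),
    "unem": rng.uniform(2, 12, n),
    "expend": rng.uniform(100, 500, n),
})
df["crime_lag"] = df["crime"].shift(1)  # last period's crime rate
df = df.dropna()

X = sm.add_constant(df[["unem", "expend", "crime_lag"]])
res = sm.OLS(df["crime"], X).fit()
print(res.params)
```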

Missing Data

Missing data are not a problem if the data are missing at random. Missing data are a problem if they result in a nonrandom sample. For example, data on IQ are easier to collect for people with high IQ than for people with low IQ, so the sample is not representative of the population.

Nonrandom sampling:
- Exogenous sample selection: sampling based on an independent variable. No bias.
- Endogenous sample selection: sampling based on the dependent variable. Bias.

Outliers

An outlier is a figure that is substantially different from the other figures for a dependent variable. It can arise from mistaken data input, such as adding an extra 0 or putting the decimal point in the wrong place.

Why is this a problem? An outlying observation has a correspondingly large squared error, so it gets a lot of weight in the minimisation problem and influences the estimates a lot.

Why might an outlier not have a large effect? Because the sample size is large, or because the dependent variable is in logarithmic form.

Topic 5: Basic Regression Analysis with Time Series Data

Time Series Assumptions

TS1: Linearity in parameters
TS2: No perfect collinearity
TS3: Strict zero conditional mean (sZCM)
TS4: Homoscedasticity
TS5: No serial correlation
TS6: Normality

TS5 (no serial correlation) means that, conditional on X, the errors in two different time periods are uncorrelated: Corr(ut, us | X) = 0 for all t ≠ s.

TS6 (normality) means the errors ut are independent of X and are independently and identically distributed as Normal(0, σ²).

Under TS1–TS3, the OLS estimators are unbiased conditional on X, and hence unconditionally as well.
Under TS1–TS5, the OLS estimators are BLUE (the Best Linear Unbiased Estimators) conditional on X, and σ² is estimated without bias.
Under TS1–TS6, the OLS estimators are normally distributed conditional on X; the t-stat and F-stat have the t-distribution and F-distribution respectively under the null, so inference can be made in a small sample.

A weaker assumption than TS5 is E(ut us | xt, xs) = 0 for all t ≠ s.
A weaker assumption than TS4 is Var(ut | xt1, ..., xtk) = σ², t = 1, 2, ..., n.

The correlation between ut and us is known as serial correlation or autocorrelation. In a cross-section, because of random sampling, sZCM is equivalent to E(ut | xt1, ..., xtk) = 0, t = 1, 2, ..., n.

Temporary vs. Permanent Effects

A temporary effect is the effect of the x variable in one year. For example, β3 might be the effect two years after the shock: "for a one-unit temporary shock to x at time t, β3 is the change in y after two periods."

The permanent effect is determined by the long-run propensity (LRP), which is found by adding up all the coefficients: "a one-unit permanent shock to x increases y by β1 + β2 + β3 in the long run."
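A sketch estimating a distributed-lag model on simulated data and computing the long-run propensity as the sum of the lag coefficients (variable names x0, x1, x2 are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)

# Simulated distributed-lag data: effects 0.5, 0.3, 0.2 at lags 0, 1, 2.
T = 200
df = pd.DataFrame({"x0": rng.normal(0, 1, T)})
df["x1"] = df["x0"].shift(1)
df["x2"] = df["x0"].shift(2)
df["y"] = 1 + 0.5 * df["x0"] + 0.3 * df["x1"] + 0.2 * df["x2"] + rng.normal(0, 0.5, T)
df = df.dropna()

res = smf.ols("y ~ x0 + x1 + x2", data=df).fit()

# Long-run propensity: the sum of the lag coefficients.
lrp = res.params["x0"] + res.params["x1"] + res.params["x2"]
print(f"LRP = {lrp:.3f}")
print(res.t_test("x0 + x1 + x2 = 0"))  # inference on the LRP
```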

What needs to hold for the BP test or White test when testing for heteroskedasticity in a time series regression? The error ut should not be serially correlated, and the error in the auxiliary regression, et, should be homoskedastic and serially uncorrelated.

Topic 6: Serial Correlation in Time Series Regressions

What is serial correlation? Corr(ut, us | X) ≠ 0 for some t ≠ s.

Consequences of serial correlation:
- OLS is still unbiased.
- OLS is no longer BLUE.
- OLS standard errors are biased, so the usual t statistic and F statistic are invalid.

What are the three important concepts in time series data?
- Stationarity
- An AR(1) process
- A random walk process

Test for autocorrelation: regress the OLS residuals on their own lag,

ût = ρût−1 + et

and t-test H0: ρ = 0. What assumption is crucial in the test? The regressors need to be strictly exogenous.

What is the problem if you regress price on wage, and why? Both variables are trending, which leads to a spurious regression result. A solution would be to use the growth rates of the variables instead of their levels (generate these series and plot them over time), or to include a time trend in the regression.

What is the Newey-West correction? It corrects the standard errors, making them heteroskedasticity and autocorrelation consistent (HAC).
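A sketch of the residual-on-lagged-residual test and of Newey-West (HAC) standard errors via statsmodels, on data simulated with AR(1) errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)

# Simulated regression with AR(1) errors (rho = 0.6 by construction).
T = 300
x = rng.normal(0, 1, T)
e = rng.normal(0, 1, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + e[t]
y = 1 + 0.5 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# AR(1) test: regress residuals on their own lag and t-test rho = 0.
u_hat = res.resid
ar1 = sm.OLS(u_hat[1:], u_hat[:-1]).fit()
print(f"rho_hat = {ar1.params[0]:.3f}, t = {ar1.tvalues[0]:.2f}")

# Newey-West (HAC) standard errors for the original regression.
res_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res_hac.bse)
```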

