Review for ECON 321

Author: akankshita chopra
Course: Econometrics 2
Institution: University of Waterloo

Summary

These notes provide a review for ECON 321...


Description

ECON 321/322: Review (these notes are based on Wooldridge, 2006)

Y_i = α + βX_i + ε_i
Y_i = α + βX_i + γZ_i + ε_i
Y_i = α + βX_i + γX_i² + ε_i

In the last case, one cannot infer that a one-unit change in X will affect Y by β: if there is a one-unit change in X, X² will also change. By "linear model", we mean that the model is linear in the parameters α, β, γ, etc. In the quadratic model, the effect of a one-unit change in X on Y will be β + 2γX (i.e., ∂Y/∂X). There will always be factors for which we cannot control; these will be included in ε. Overall, the model will include k variables and one constant, which means that we will need to estimate k+1 parameters.

We assume that E(ε|X, Z) = 0. The Ordinary Least Squares estimates are obtained by minimizing the sum of squared residuals, i.e., by minimizing Σ(y_i − α − βx_i − γz_i)² with respect to α, β and γ. The FOCs will be

Σ(y_i − α − βx_i − γz_i) = 0
Σ x_i (y_i − α − βx_i − γz_i) = 0
Σ z_i (y_i − α − βx_i − γz_i) = 0

and in general you will have k+1 FOCs. We will be able to estimate all the parameters of the model, as long as the regressors are not perfectly correlated (or multiples of one another).
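As an illustration (my own sketch, not from the notes; the data-generating coefficients 1.0, 2.0, −1.5 are made up), the k+1 normal equations can be solved directly with NumPy:

```python
import numpy as np

# Simulated data for y = alpha + beta*x + gamma*z + eps (coefficient values are made up)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)          # correlated with x, but not perfectly
y = 1.0 + 2.0 * x - 1.5 * z + rng.normal(size=n)

# Stack the constant and the k regressors, then solve the k+1 FOCs (X'X)b = X'y
X = np.column_stack([np.ones(n), x, z])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                   # roughly [1.0, 2.0, -1.5]
```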


You can obtain predicted values for each observation: ŷ_i = α̂ + β̂x_i + γ̂z_i. If one were to estimate the model Y_i = α + βX_i + ε_i instead of Y_i = α + βX_i + γZ_i + ε_i, the two estimates of β would be different. There are two exceptions to this rule: 1) the true value of γ is zero; 2) X and Z are completely independent of one another (uncorrelated).


Goodness of fit

We use the R² to determine how well the model explains the dependent variable. You should keep in mind that adding regressors to the model will ALWAYS make the R² increase or remain the same, whether they are relevant to the model or not. One should also be careful to use a sample that contains at least as many observations as parameters to be estimated (n ≥ k+1).
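A quick illustration (my own, not from the notes) that adding an irrelevant regressor never lowers the R²:

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS fit of y on X (X must include a constant column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
noise = rng.normal(size=n)                    # pure noise, irrelevant to y
y = 1 + 2 * x + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, noise])  # same model plus the irrelevant regressor
print(r_squared(y, X1) <= r_squared(y, X2))   # True: R^2 can only stay the same or rise
```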


Omitted variable bias

Excluding relevant regressors from the model will bias the estimates of the other parameters of the model: your model is underspecified. Imagine the true model is Y_i = α + βX_i + γZ_i + ε_i, from which one would obtain β̂, but that you choose to estimate Y_i = α + βX_i + ν_i and obtain β̃, where ν_i = γZ_i + ε_i.


Let δ̃ be the slope of the regression of Z on X, which means that β̃ = β̂ + δ̃γ̂. Then E(β̃) = E(β̂ + δ̃γ̂) = E(β̂) + E(γ̂)δ̃ = β + γδ̃. Therefore the bias is E(β̃) − β = γδ̃, which is called the "omitted variable bias".
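A small simulation (my own sketch, with made-up coefficients) confirms that the relation β̃ = β̂ + δ̃γ̂ holds exactly within a given sample:

```python
import numpy as np

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)       # Z correlated with X, so delta is not zero
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

c = np.ones(n)
a_hat, b_hat, g_hat = ols(y, np.column_stack([c, x, z]))   # full model
_, b_tilde = ols(y, np.column_stack([c, x]))               # underspecified model (Z omitted)
_, delta = ols(z, np.column_stack([c, x]))                 # slope of the regression of Z on X

print(b_tilde, b_hat + delta * g_hat)   # the two numbers coincide (up to rounding)
```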


The omitted variable bias will be null only if δ̃ is 0 (which would mean that X and Z are completely independent of one another). If this is not the case, δ̃ will have the same sign as the correlation between X and Z. The bias, whether it is positive or negative, depends on both the sign of δ̃ and the sign of γ.


In the case where the true value of γ is zero, one would prefer β̃, because it has a lower variance than β̂ (more on this in a few slides). Including irrelevant regressors in the model will not affect the estimates of the other parameters of the model, but will affect their variance.


Variance of the OLS Estimator

Var(β̂_j) = σ² / (SST_j (1 − R_j²)), where SST_j = Σ(x_ij − x̄_j)² and R_j² is the R² of the regression of x_j on all other independent variables and an intercept. What should be concluded from this is that the higher the variance of the errors (σ²), the higher will be the variance of the β̂'s. The only way to reduce the variance of the residuals would be to add more regressors to the model. Conversely, the higher the variance of the x_j's, the lower the variance of β̂_j.


Lastly, as R_j², the proportion of the variation in x_j explained by the other independent variables, goes up, the variance of β̂_j also increases. In other words, the more collinear x_j is with the other explanatory variables, the less precise is the estimate of β̂_j (the higher its variance). If R_j² were equal to 1, this would violate the assumption that the regressors are not perfectly collinear; when R_j² is very close to 1, the problem is called multicollinearity of the regressors.


Variance of the OLS estimator when omitting relevant regressors

Var(β̃) = σ² / Σ(x − x̄)² (from your chapter 2 notes), whereas Var(β̂_j) = σ² / (Σ(x − x̄)² (1 − R_j²)) if the model is correctly specified. If the true value of γ is zero, these will be equal if X and Z are uncorrelated (R_j² = 0). Otherwise, Var(β̃) will always be smaller. If γ is 0 but X and Z are not uncorrelated, β̃ will be preferred to β̂_j.


Estimating σ² with s²

Our estimate of σ² is s² = Σe_i² / (n − k − 1), which is the general case of what we saw in chapter 2 (k = 1 in chapter 2). Based on this, the standard deviation of the estimated parameters will be sd(β̂_j) = σ / √(Σ(x − x̄)² (1 − R_j²)), again a logical modification of what we saw in chapter 2.
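A short sketch (my own, continuing the simulated-data style of the earlier snippets) of computing s² and the standard error of β̂_x from these formulas:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, z])
k = X.shape[1] - 1                                    # number of regressors (constant excluded)
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = (e @ e) / (n - k - 1)                            # s^2 = sum(e_i^2) / (n - k - 1)

# se(beta_x) = s / sqrt(SST_x * (1 - R_x^2)), with R_x^2 from regressing x on the other regressors
others = np.column_stack([np.ones(n), z])
r_x = x - others @ np.linalg.lstsq(others, x, rcond=None)[0]
sst_x = np.sum((x - x.mean()) ** 2)
r2_x = 1 - (r_x @ r_x) / sst_x
print(np.sqrt(s2 / (sst_x * (1 - r2_x))))
```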


Sampling Distributions of the OLS estimator

We need to assume that the errors, which we cannot observe, follow a normal distribution in the population. We will assume that they follow a N(0, σ²) and that they are not correlated with the explanatory variables. We now know, from this assumption and the usual Gauss-Markov assumptions, that y will follow a normal distribution conditional on the x's. Of course, we will not always be able to assume that the dependent variable follows a normal distribution (think of hourly wages). Sometimes, taking the log of the dependent variable may help. These assumptions lead us to believe that the β̂'s will also follow a normal distribution with mean β and variance Var(β̂), which can be standardized into a N(0,1).


Testing Hypotheses for one parameter: the t-test

We assume for now that our test statistic follows a t distribution (remember that we are working with a sample, not the entire population). This t distribution has n−k−1 degrees of freedom. The hypothesis usually tested (the null) is that the parameter is 0 (the explanatory variable has no impact on the dependent variable, once the effect of the other covariates has been controlled for). In reality, the point estimate of β will never be exactly 0. You must choose an alternative hypothesis: a one-sided alternative would take the form β > 0 (or β < 0), while a two-sided alternative would be β ≠ 0.


If one is performing a one-tailed test with alternative β > 0, you want to reject H0 if t > t_c. t_c is determined by 1) the level of significance of the test (usually 10%, 5% or 1%) and 2) the number of degrees of freedom (n−k−1). If the alternative is of the form β < 0, you reject H0 if t < −t_c. For a two-sided alternative (β ≠ 0), you reject if |t| > t_c. The p-value of a two-sided test is the probability of observing a statistic at least as extreme as the one obtained if the null is true: p = P(|T| > |t|) = 2·P(T > |t|).
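A minimal sketch (my own, using SciPy's t distribution) of the two-sided decision rule and p-value at the 5% level; the estimate, standard error and sample sizes are made-up numbers:

```python
from scipy import stats

beta_hat, se, n, k = 0.42, 0.15, 120, 3     # hypothetical estimate, SE, sample size, regressors
df = n - k - 1

t_stat = (beta_hat - 0.0) / se              # test H0: beta = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df)      # two-sided 5% critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)   # p = 2 * P(T > |t|)

print(t_stat, t_crit, p_value)
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```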


Economic significance vs Statistical significance A variable can have a statistically significant impact but not an economic one: economic significance depends on the size of the parameter. This is important when analyzing the output from a regression for policy analysis.


Confidence Intervals

A confidence interval can be obtained by adding to and subtracting from the estimated parameter a constant times the value of the estimated standard deviation of the parameter. This constant depends on the confidence level of the interval: roughly 1 for 68%, 2 for 95% and 3 for 99% (the last two are the most commonly used; remember that these constants are only approximations). The correct way to obtain the constant is to refer to the t distribution with n−k−1 DoF for the level of statistical confidence chosen.
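For instance (my own snippet, reusing the made-up numbers above), the exact 95% constant with 116 degrees of freedom is close to, but not exactly, 2:

```python
from scipy import stats

beta_hat, se, df = 0.42, 0.15, 116
c = stats.t.ppf(0.975, df)                          # exact 95% constant (about 1.98)
print(c, (beta_hat - c * se, beta_hat + c * se))    # the 95% confidence interval
```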


Hypothesis tests with linear combinations of the parameters

One could wish to test β = γ against β < γ. One can do this by defining the parameter φ = β − γ and rewriting the model in terms of φ. By running a standard t-test on φ, one can test β = γ.
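To make the rewriting concrete (my own illustration): substituting β = φ + γ into y = α + βx + γz + ε gives y = α + φx + γ(x + z) + ε, so regressing y on x and (x + z) delivers φ̂ and its standard error directly. A quick check with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x, z = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)   # true phi = beta - gamma = 0.5

X = np.column_stack([np.ones(n), x, x + z])        # regressors: constant, x, (x + z)
a_hat, phi_hat, g_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(phi_hat)                                     # estimate of phi, roughly 0.5
```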


Testing multiple restrictions: F-tests

One could choose to simultaneously test β = 0, γ = 0 and δ = 0 against the alternative that at least one of them is false. We call this type of test a joint test. To test this type of hypothesis, we must first run two regressions: one for the restricted model (the model in which we impose β = 0, γ = 0 and δ = 0) and one for the unrestricted model. These two models would be, respectively, y = α + ε and y = α + βx + γz + δv + ε.


Second, one can compute the F statistic using two different but equivalent methods:

F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n−k−1)], which follows an F(q, n−k−1) distribution;

F = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n−k−1)], which also follows an F(q, n−k−1) distribution,

where q = DoF_r − DoF_ur (the number of restrictions) and n−k−1 = DoF_ur (usually q < n−k−1). One rejects the null if F > F_c and concludes that the regressors are jointly significant. Otherwise they are jointly insignificant and can usually be dropped from the model. Sometimes, variables that do not have statistically significant effects individually will have a jointly statistically significant impact.
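A sketch (my own, with simulated data) of the SSR version of the F statistic for the joint null β = γ = δ = 0:

```python
import numpy as np
from scipy import stats

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(5)
n = 250
x, z, v = rng.normal(size=(3, n))
y = 1.0 + 0.3 * x + 0.2 * z + rng.normal(size=n)       # v is irrelevant in the true model

c = np.ones(n)
ssr_r = ssr(y, c.reshape(-1, 1))                        # restricted model: y = alpha + eps
ssr_ur = ssr(y, np.column_stack([c, x, z, v]))          # unrestricted model
q, k = 3, 3

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F, stats.f.sf(F, q, n - k - 1))                   # F statistic and its p-value
```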


Note that one should use a t-test when testing only one restriction, even though the F statistic would then be equal to the square of the t statistic. Violating the homoskedasticity assumption renders t and F tests useless. One can test the equality to zero of all the parameters in the model (see the preceding example) or only some of them (e.g. γ = 0 and δ = 0 only); βx would then be included in both the restricted and the unrestricted model. Testing that all the slope parameters are jointly equal to zero is called testing the overall significance of the regression. P-values can still be used in this context: they give the probability of observing the F-statistic we obtained given that the null hypothesis is true.


Binary variables

Dummy (binary) variables are used to control for observable qualitative characteristics. Gender is probably the best-known example. One can also dichotomize a continuous variable (e.g. income can be transformed into categories of income and each of these categories can be made into a dummy). Controlling for a binary variable means that we allow the intercept of the model to shift between categories (e.g., the intercept will differ depending on whether the individual is a man or a woman). Some people interpret this as the fixed effect that comes from belonging to one category vs another.


Choosing the base (benchmark) group is therefore very important. One can either drop the intercept from the model and control for all categories (controlling simultaneously for a dummy which is equal to 1 for men and another dummy which is equal to 1 for women) or keep the intercept in the model and control for only one of the two dummies (the interpretation of the coefficient on the dummy then becomes the marginal effect on y of being a woman).


One can control for a variety of categorical variables in a model. We exclude one category to avoid perfect collinearity (otherwise known as the dummy variable trap). The coefficients of the dummy variables represent the differential effects on the intercept of belonging to one group vs another. The best way to control for an ordinal variable is to dichotomize it into a number of dummies controlling for each category. This allows the effect of moving from one category to another to differ depending on where on the scale you start from. If the ordinal variable takes too many values, one can create dummies such that some values are regrouped together.


Interactions between dummies

One may wish to estimate the effect of being a married woman, rather than just being married or just being a woman. In this case, dummy variables would be multiplied together to find the additional effect of being a married woman on the intercept (note that one could also simultaneously control for being a woman and being married separately). The total differential effect of being a married woman would then be the sum of the coefficients on all three binary variables. One can also allow the slope of a relationship to differ according to these characteristics by controlling for an interaction term between a dummy and a variable of interest.


Differences in regressions across groups One could also want to evaluate whether two groups should be included in the same sample or whether it would be preferable to allow the model to take different values of the parameters for the slopes and intercepts for both groups. One could run a restricted model (restricting the values of the parameters to be the same for both groups) and an unrestricted one (running separate models for both groups or creating all possible interaction terms in the model and controlling for them) and use an F-test to determine which method is preferable.


Running separate models may be preferable in complex models. The F-test (also known as the Chow test in this context) would take the form F = [(SSR_r − (SSR_1 + SSR_2)) / (SSR_1 + SSR_2)] × [(n − 2(k+1)) / (k+1)], where SSR_ur = SSR_1 + SSR_2.
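A sketch (my own, with two simulated groups) of the Chow test as written above; SSR_r comes from the pooled regression and SSR_1, SSR_2 from fitting each group separately:

```python
import numpy as np
from scipy import stats

def ssr(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(6)
n1, n2, k = 150, 150, 1
x1, x2 = rng.normal(size=n1), rng.normal(size=n2)
y1 = 1.0 + 2.0 * x1 + rng.normal(size=n1)               # group 1
y2 = 0.5 + 2.5 * x2 + rng.normal(size=n2)               # group 2: different intercept and slope

n = n1 + n2
ssr_r = ssr(np.concatenate([y1, y2]), np.concatenate([x1, x2]))   # pooled (restricted) model
ssr_1, ssr_2 = ssr(y1, x1), ssr(y2, x2)

F = ((ssr_r - (ssr_1 + ssr_2)) / (ssr_1 + ssr_2)) * ((n - 2 * (k + 1)) / (k + 1))
print(F, stats.f.sf(F, k + 1, n - 2 * (k + 1)))
```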


Linear probability Models A linear probability model is used when the dependent variable is a binary variable. In this case, the coefficients of the explanatory variables represent the change in the probability of observing Y=1 given a one-unit change in an explanatory variable. Instead of the R2 , one could want to use the percentage of correctly predicted y’s as a measure of goodness of fit in this model.
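A minimal linear probability model sketch (my own example): fit OLS to a 0/1 outcome and compute the share of correctly predicted y's, classifying ŷ ≥ 0.5 as a predicted 1:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = (0.2 + 0.8 * x + rng.normal(size=n) > 0).astype(float)   # binary dependent variable

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]       # slope = change in P(y=1) for a one-unit change in x
p_hat = X @ b                                  # fitted "probabilities"
correct = np.mean((p_hat >= 0.5) == (y == 1))
print(b, correct, p_hat.min(), p_hat.max())    # min/max can fall below 0 or above 1
```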


This model has its drawbacks as it is entirely possible to obtain negative predicted probabilities with this model. We will also have heteroskedasticity present in this type of model. A solution to these problems is to use a logit or probit model, which uses the maximum likelihood technique. You will likely see this in 421 or more advanced econometrics courses.


Heteroskedasticity

Homoskedasticity fails when the variance of the errors is not constant across the entire population. Usually, the variance varies with one explanatory variable. As homoskedasticity of the errors is a necessary condition to use F and t tests, we must use alternative methods to do hypothesis tests. Note that the estimates of the coefficients are not biased by heteroskedasticity; the R² and adjusted R² will not be affected either. β̂_OLS is, however, no longer BLUE if heteroskedasticity is present.


If heteroskedasticity is present, then Var(β̂) = Σ(x_i − x̄)²σ_i² / SST_x², which does not reduce to σ²/SST_x because of the heteroskedasticity. Therefore, we use Σ(x_i − x̄)²e_i² / SST_x² to estimate Var(β̂_OLS) when we suspect that heteroskedasticity is present. This is what we call White's estimator. When using multiple regressors, Var(β̂_j) = Σ r̂_ij² e_i² / SSR_j², where r̂_ij is the ith residual from regressing x_j on all other independent variables and SSR_j is the SSR from this same regression.

We can use these variances to obtain heteroskedasticity-robust standard errors which can be used for t-tests. Again, these will only follow a t distribution given a big enough sample size (this was also true when we had homoskedastic errors).
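A sketch (my own) implementing the multiple-regressor formula above, Var(β̂_j) = Σ r̂_ij² e_i² / SSR_j², for the coefficient on x:

```python
import numpy as np

def resid(y, X):
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(8)
n = 400
x, z = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n) * (1 + np.abs(x))              # error variance depends on x
y = 1.0 + 2.0 * x + 0.5 * z + eps

c = np.ones(n)
e = resid(y, np.column_stack([c, x, z]))                # residuals from the full regression
r_x = resid(x, np.column_stack([c, z]))                 # residuals from regressing x on the others
ssr_x = r_x @ r_x

var_bx = np.sum(r_x ** 2 * e ** 2) / ssr_x ** 2         # White's estimator for Var(beta_x)
print(np.sqrt(var_bx))                                  # heteroskedasticity-robust standard error
```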


LM statistic under homoskedasticity

LM tests can be used for hypothesis testing (instead of F-tests). For the LM test one needs only to estimate the restricted model: one would first estimate the restricted model and then regress its residuals on all the explanatory variables, including the omitted ones. The R² from this regression would be used to compute LM = nR². The LM test follows a χ² distribution.
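A sketch (my own) of this LM procedure for testing whether two omitted regressors, z and v, belong in the model; the χ² degrees of freedom equal the number of exclusion restrictions (two here):

```python
import numpy as np
from scipy import stats

def resid(y, X):
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 300
x, z, v = rng.normal(size=(3, n))
y = 1.0 + 1.0 * x + 0.4 * z + rng.normal(size=n)

c = np.ones(n)
e = resid(y, np.column_stack([c, x]))                # residuals of the restricted model (z, v excluded)
u = resid(e, np.column_stack([c, x, z, v]))          # regress residuals on ALL explanatory variables
r2 = 1 - (u @ u) / np.sum((e - e.mean()) ** 2)

LM = n * r2
print(LM, stats.chi2.sf(LM, df=2))                   # compare with a chi-squared(2)
```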


One could also wish to compute a heteroskedasticity-robust LM statistic (in lieu of an F-stat). One would first estimate the restricted model and obtain the residuals from this regression (e). One would then perform multiple regressions in which each of the explanatory variables that were OMITTED from the restricted model is regressed on the explanatory variables INCLUDED in the restricted model. We would obtain one set of residuals per regression (r_j).


One would then calculate the products between these sets of residuals and the residuals of the original (restricted) regression (products of e and r_j). One would then regress 1 on these products (without allowing for an intercept). LM = n − SSR (where SSR is from this final regression). Here again, the LM follows a χ².


Tests for heteroskedasticity

The whole point of testing for heteroskedasticity is to check whether the residuals are related to one of the explanatory variables. Once we have obtained the e_i², we can regress them on a constant and all explanatory variables. The null hypothesis (homoskedasticity) is that the coefficients are all null. One would use an F-test = (R²/k) / ((1 − R²)/(n − k − 1)) or an LM test = nR² to do this. This LM test is referred to as the Breusch-Pagan test for heteroskedasticity. If we suspect heteroskedasticity to be a function of a few specific explanatory variables, one could modify the test and include only these in the regression.
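A sketch (my own) of the LM version of the Breusch-Pagan test: regress the squared OLS residuals on the regressors and compute nR²:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n = 400
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))   # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

e2 = e ** 2                                                 # regress e^2 on a constant and the x's
u = e2 - X @ np.linalg.lstsq(X, e2, rcond=None)[0]
r2 = 1 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)

LM = n * r2
print(LM, stats.chi2.sf(LM, df=1))                          # df = number of regressors (here 1)
```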


One could also regress the square of the residuals on a constant, the explanatory variables, their squares and their cross-products. The LM test = nR² is here called the White test for heteroskedasticity. We lose many degrees of freedom with the White test because so many regressors are included. One could therefore modify the White test by using the OLS fitted values and their squares as regressors instead of the explanatory variables.


Weighted Least Squares Estimation

Let's first assume that Var(ε_i|x_i) = σ²h(x_i), where h(x) is a positive function of the explanatory variables on which the heteroskedasticity depends. Given the usual regression model y_i = α + βx_i + γz_i + ε_i, we want to transform this into a model that will have homoskedastic errors. Var(ε_i|x_i) = E(ε_i²|x_i) = σ²h(x_i) = σ²h_i, which means that Var(ε_i/√h_i) = σ². Therefore, by dividing our original model through by √h_i we will obtain a model with homoskedastic errors (N.B.: the intercept is also divided by √h_i).


We refer to these new estimators of α, β and γ as the generalized least squares (GLS) or weighted least squares (WLS) estimators (less weight is given to observations that are less precise). Note that the R² is not as useful as before, because it now measures the degree of variability of the transformed dependent variable that is explained by the transformed regressors. Weighted least squares will only be more efficient than OLS when we have correctly identified the form/source of the heteroskedasticity. Using the wrong form of heteroskedasticity is a second best, but still preferable to ignoring its existence.


As we often do not know the true h_i, we will use an estimate ĥ_i and call this estimator FGLS (Feasible GLS). One would first regress y on the explanatory variables to obtain residuals e. One would then regress the logged squared residuals (log e²) on the same explanatory variables and take the fitted values ĝ_i from this regression; ĥ_i = exp(ĝ_i), which would then be used to perform (feasible) GLS. Since we use the same data to run the FGLS regression and to find ĥ, the estimators of the parameters are no longer unbiased (nor BLUE). One could also use Ŷ and Ŷ² (instead of the explanatory variables) to find ĥ. We must be careful to use the same weights when estimating the restricted and unrestricted models if we want to use F-tests.
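A sketch (my own) of this feasible GLS procedure, using the log(e²) regression to build the weights:

```python
import numpy as np

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.exp(0.5 * x)   # error variance grows with x

X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals, then regress log(e^2) on the same regressors to estimate h_i
e = y - X @ ols(y, X)
g = X @ ols(np.log(e ** 2), X)        # fitted values from the log(e^2) regression
h_hat = np.exp(g)

# Step 2: divide the whole model (including the constant) by sqrt(h_hat) and rerun OLS
w = 1.0 / np.sqrt(h_hat)
b_fgls = ols(y * w, X * w[:, None])
print(b_fgls)                         # FGLS estimates of alpha and beta
```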


Specification & Data Problems

Misspecification of the functional form. This occurs when the functional form we have chosen does not reflect the relationship between the dependent variable and the explanatory variables (e.g. it is not linear, or we omit functions of the independent variables). This can bias the other parameters or yield inconsistent estimates of the partial effects. If one omits variables, this can be tested by using F-tests for joint hypotheses.


RESET test These are used to detect functional form specification problems (RESET stands for REgression Specification Error Test). One would add to the usual regressors the square of the p...

