Econometrics Revision Notes

Course: Econometrics 1
Institution: Queen Mary University of London

Linear Regression Model
- The LRM can be written as: Yi = B1 + B2X2i + B3X3i + … + BkXki + Ui (population) (1.1)
- Y = dependent variable, X = regressors, U = error term, i = ith observation.
- In short, Yi = BX (conditional mean) + Ui, i.e. Yi equals the mean value of the population of which it is a member, plus or minus a random term. (1.2)
- E.g. Y = family expenditure on food, X = family income.

Estimation of the LRM
- The method of ordinary least squares (OLS): rewrite (1.1) as Ui = Yi - (B1 + B2X2i + B3X3i + … + BkXki) = Yi - BX (1.3), so the error term is the difference between the actual Y value and the Y value obtained from the regression model.
- To obtain the B coefficients, make the error terms Ui as small as possible, ideally zero. OLS does not minimise the sum of the error terms themselves, but the sum of the squared error terms, as follows:

Σ Ui^2 = Σ (Yi - B1 - B2X2i - B3X3i - … - BkXki)^2   (1.4)

= the error sum of squares (ESS), i.e. the sum of the squared residuals.
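As a concrete illustration of (1.1)-(1.4), here is a minimal sketch of fitting an LRM by OLS and recovering the sum of squared residuals. It assumes simulated data and the statsmodels library; the variable names (income, food_exp) are hypothetical, not from the notes.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: food expenditure (Y) explained by family income (X2)
rng = np.random.default_rng(0)
income = rng.uniform(20, 100, size=200)               # X2
food_exp = 5 + 0.3 * income + rng.normal(0, 4, 200)   # Y = B1 + B2*X2 + U

X = sm.add_constant(income)        # adds the intercept column (B1)
model = sm.OLS(food_exp, X).fit()  # chooses the Bs that minimise the squared errors, as in (1.4)

print(model.params)                # estimated B1, B2
print(np.sum(model.resid ** 2))    # error (residual) sum of squares, as in (1.4)
```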

The classical linear regression model (CLRM)
The CLRM makes the following assumptions:
- A1: The model is linear in the parameters, as in (1.1); it may or may not be linear in the variables Y and the Xs.
- A2: The regressors are fixed across different samples.
- A3: Given any value of X, the expected value (mean) of the error term is zero.
- A4: Homoscedasticity (equal, or constant, variance): if the residuals have constant variance, every X is equally good at explaining Y.
- A5: There is no correlation between two error terms; that is, there is no autocorrelation.
- A6: There are no perfect linear relationships among the X variables (the assumption of no multicollinearity).
- A7: The regression model is correctly specified; there is no specification bias or error in the model used in the empirical analysis.
- A8: The error terms are normally distributed.

Under these assumptions the OLS estimators are BLUE (best linear unbiased estimators); this is the Gauss-Markov theorem: 1) the estimators are unbiased; 2) they have minimum variance, i.e. they are the most efficient; 3) they are linear functions of the dependent variable.

Variances and standard errors of OLS estimators
- For the LRM, an estimate of the variance of the error term σ^2 is obtained as σ^2 hat = Σ ei^2 / (n - k) = RSS / (n - k): the residual sum of squares (RSS) divided by the degrees of freedom (n - k), where n is the sample size and k is the number of regression parameters estimated.
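As a quick check of the formula above, a short continuation of the earlier OLS sketch (the variable `model` is the hypothetical fit from that sketch):

```python
import numpy as np

# sigma^2 hat = RSS / (n - k), the estimate of the error variance
n, k = int(model.nobs), len(model.params)
sigma2_hat = np.sum(model.resid ** 2) / (n - k)

print(sigma2_hat, model.scale)   # model.scale is statsmodels' own RSS / (n - k)
print(model.bse)                 # standard errors of the OLS coefficient estimates
```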

R^2: a measure of the goodness of fit of the estimated regression
- R^2 measures the proportion of the total variation in the dependent variable that is explained by the regressors.
- Adjusted R^2 takes the degrees of freedom into account.
- Slope coefficient: the change in the dependent variable associated with a one-unit increase in an explanatory variable, holding the other explanatory variables constant.
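For reference, the standard formulas behind these two measures (a general restatement, not specific to these notes), with ESS the explained sum of squares, RSS the residual sum of squares, n the sample size and k the number of estimated parameters:

```latex
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k}
```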

- TSS = ESS + RSS, where TSS is the total sum of squares, ESS the explained sum of squares and RSS the residual sum of squares.
- Thus the coefficient of determination is simply the proportion (percentage) of the total variation in Y explained by the regression model.
- R^2 therefore lies between 0 and 1: the closer it is to 1, the better the fit; the closer it is to 0, the worse.
- R^2 can be used to compare the fits of regressions with the same dependent variable and different numbers of independent variables. We cannot compare two models that have different dependent variables.
- Standard error of the regression: measures the average size (the typical "mistake") of the OLS residuals.

F test: tests whether the coefficients are jointly zero, i.e. the overall significance of the regression.
- H0: R^2 = 0
- H1: R^2 ≠ 0

Hypothesis testing:
- Null hypothesis (H0): the outcome that the researcher does not expect.
- Alternative hypothesis (HA): the outcome that the researcher does expect. E.g. H0: B ≤ 0 (not expected), HA: B > 0 (expected).
- Confidence interval: a range that contains the true value of the parameter a specified percentage of the time (in repeated sampling). If H0: Bk = 0 and zero lies in this interval, we cannot reject H0.

T test: 1) compute the t statistic; 2) compute the degrees of freedom, n - k, to find the critical value; 3) look up the critical value in the t table; 4) if the t statistic exceeds the critical value (in absolute value, for a two-sided test), reject the null hypothesis. For the joint F test, the null hypothesis is that the slope coefficients on all the regressors are simultaneously zero; see the sketch below.
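A small sketch of the t-test and F-test mechanics, continuing the hypothetical OLS fit above; the 5% two-sided critical value comes from scipy:

```python
from scipy import stats

# t statistic = coefficient / standard error (statsmodels also reports model.tvalues)
t_stats = model.params / model.bse
df = int(model.nobs) - len(model.params)   # degrees of freedom n - k
crit = stats.t.ppf(1 - 0.025, df)          # 5% two-sided critical value

# Reject H0: Bk = 0 whenever |t| exceeds the critical value
print(t_stats, crit)
print(model.fvalue, model.f_pvalue)        # F test of overall significance
```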

Multicollinearity:
- If there are one or more exact (or near-exact) linear relationships among the regressors, we call it multicollinearity.
- E.g. X2i + 3X3i = 1 is perfect collinearity, so if we were to include both X2i and X3i in the same regression model we would have perfect collinearity. If X2 changes, X3 changes in exact proportion, so you cannot separate the individual effects of the two regressors on Y (the dependent variable).
- On the other hand, if we have X2i + 3X3i + Vi = 1, where V is a random error term, we have imperfect collinearity: the presence of the error term dilutes the exact relationship between these variables.
- F statistic: tests whether all the variables included in the model are jointly significant.

Consequences of imperfect collinearity:
1: OLS estimators are still BLUE, but they have large variances and covariances, making precise estimation difficult.
2: As a result, the confidence intervals tend to be wider. Therefore, we may not reject the "zero null hypothesis", i.e. that the true population coefficient is zero.
3: Because of (1), the t ratios of one or more coefficients tend to be statistically insignificant.
4: Even though some regression coefficients are statistically insignificant, the R^2 value may be very high.
5: The OLS estimators and their standard errors can be sensitive to small changes in the data.
6: Adding a collinear variable to the chosen regression model can alter the coefficient values of the other variables in the model.
- When regressors are collinear, statistical inference becomes shaky, especially so if there is near collinearity. This is not surprising: if two variables are highly collinear, it is very difficult to isolate the impact of each variable separately on the regressand.

Detection of multicollinearity:
1: High correlation among the independent variables.
2: Low t statistics.
3: A high and significant F statistic.
4: Significant changes in the coefficients of variables across models.
5: A high R^2 but few significant t ratios.
6: Examine the correlation between two regressors, e.g. COR(age, age^2) or COR(mother's and father's years of education).
7: Variance inflation factors (VIF): treat one of the regressors as the dependent variable, regress it on the remaining regressors, and compute VIF = 1 / (1 - R^2) from that auxiliary regression; if the VIF is higher than 10, multicollinearity is a serious concern (see the sketch below).
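A minimal sketch of the VIF check in point 7, using the variance_inflation_factor helper from statsmodels; the regressors (age, age squared, income) are hypothetical and chosen so that two of them are collinear by construction:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressors: age and age^2 are highly collinear by construction
rng = np.random.default_rng(1)
age = rng.uniform(20, 60, size=200)
other_income = rng.normal(30, 10, size=200)
exog = sm.add_constant(np.column_stack([age, age ** 2, other_income]))

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing regressor j on the others
for j, name in enumerate(["const", "age", "age^2", "other_income"]):
    print(name, variance_inflation_factor(exog, j))
# A VIF above 10 for age or age^2 signals strong collinearity between them.
```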

Heteroscedasticity (unequal variance):
- The CLRM assumes that the error term Ui in the regression model is homoscedastic (has equal variance). E.g. in studying consumption expenditure in relation to income, this assumption implies that low-income and high-income households have the same disturbance variance, even though their average levels of consumption expenditure differ.
- If the assumption of homoscedasticity is not satisfied, we have the problem of heteroscedasticity: compared to low-income households, high-income households have not only a higher average level of consumption expenditure but also greater variability in their consumption expenditure. As a result, a regression of consumption expenditure on household income exhibits heteroscedasticity.
- t = coefficient / standard error. If t > the critical value you reject the null hypothesis; with heteroscedasticity the t ratio is inflated, so we reject more often than we should (Type I errors).

Consequences:
1: Heteroscedasticity does not alter the unbiasedness and consistency properties of the OLS estimators.
2: But the OLS estimators are no longer of minimum variance, i.e. no longer efficient. They are not BLUE; they are simply linear unbiased estimators (LUE).
3: As a result, the t and F tests based on the standard CLRM assumptions may not be reliable, leading to inaccurate conclusions about the statistical significance of the estimated regression coefficients.
4: In the presence of heteroscedasticity, the BLUE estimators are provided by the method of weighted least squares (WLS).

Detection of heteroscedasticity:
Breusch-Pagan (BP) test: estimate the OLS regression and obtain the squared OLS residuals from this regression.

- Regress the squared residuals on the k regressors included in the model.
- The null hypothesis here is that the error variance is homoscedastic, that is, that all the slope coefficients in this auxiliary regression are simultaneously equal to zero.
- Use the F statistic from this regression, with (k - 1) numerator and (n - k) denominator degrees of freedom, to test this hypothesis.
- If the computed F statistic is significant, we can reject the hypothesis of homoscedasticity. If it is not, we may not reject the null hypothesis.
- The idea is that there is a population process in which the variance of the population error depends on some linear combination of the independent variables. The BP test does not observe the population error; instead it estimates it using the residuals, i.e. the differences between the observed values and the least-squares fitted values.
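A minimal sketch of the BP test using statsmodels' het_breuschpagan, on hypothetical data simulated so that the error variance grows with income:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data with error spread that grows with income (heteroscedastic)
rng = np.random.default_rng(2)
income = rng.uniform(20, 100, size=300)
cons = 5 + 0.6 * income + rng.normal(0, 0.1 * income)

X = sm.add_constant(income)
res = sm.OLS(cons, X).fit()

# Regresses the squared residuals on X; reports LM and F versions of the test
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f_stat, f_pval)   # a small p-value rejects homoscedasticity
```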

- If we square the residuals and regress them on the regressors, and any of the slope coefficients are statistically different from zero, we have some form of heteroscedasticity.

Graphical method:
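The original notes refer to a plot at this point. A common version of the graphical check (an assumption on my part, since the figure is not reproduced) is to plot the residuals against the fitted values and look for a spreading "funnel" pattern; this continues the BP sketch above, where `res` was fitted:

```python
import matplotlib.pyplot as plt

# A funnel shape (spread growing with the fitted values) suggests heteroscedasticity
plt.scatter(res.fittedvalues, res.resid)
plt.xlabel("Fitted values")
plt.ylabel("OLS residuals")
plt.show()
```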

White test:
- Proceeds in the spirit of the BP test: regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-product terms of the regressors (in the worked example in these notes, seven regressors, for a total of 33 coefficients).
- Obtain the R^2 value from this auxiliary regression and multiply it by the number of observations; under the null hypothesis of homoscedasticity, n·R^2 follows a chi-squared distribution.
- If the p-value of this statistic is below 5%, we reject the null hypothesis of homoscedasticity.

Heteroscedasticity solutions:
- To remove heteroscedasticity, divide the model through by a weight Wi related to the error variance, e.g. Yi / Wi = B0 / Wi + B1 X1i / Wi + Ui / Wi; this is the idea behind weighted least squares (see the sketch below).
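A sketch of the White test, plus a WLS re-fit as mentioned under the solutions above, continuing the heteroscedastic example from the BP sketch (`res`, `X`, `cons` and `income` come from there); the 1/income^2 weights are an illustrative assumption based on how that example was simulated:

```python
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# White test: auxiliary regression of squared residuals on the regressors,
# their squares and cross-products; n*R^2 is compared with a chi-squared distribution
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(lm_stat, lm_pval)   # p-value < 0.05 rejects homoscedasticity

# WLS: divide the model through by a weight; the error spread was proportional to
# income in the simulation, so weights of 1/income^2 are a natural (assumed) choice
wls_res = sm.WLS(cons, X, weights=1.0 / income ** 2).fit()
print(wls_res.params)
```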

Autocorrelation (correlation among the error terms Ut):
- One of the assumptions of the CLRM is that the covariance between Ui, the error term for observation i, and Uj, the error term for observation j (i ≠ j), is zero.
- Autocorrelation is most common in time-series data.
- Spatial correlation: the ordering of the data must have some logic or economic interest.
- Positive correlation: large positive errors follow large positive errors, and large negative errors follow large negative errors. THINK THE THUMB DANCE.
- Negative correlation: large positive errors follow large negative errors.

Causes:
- Specification bias: a regressor that is very important has been left out of the model.
- Cobweb phenomenon: the dependent variable is explained by a past value of a regressor rather than its current value.
- Lags: the dependent variable is explained by the dependent variable from the period before.

Detecting autocorrelation:
- E.g. with 13 observations, go to the row for n = 13 in the Durbin-Watson table and find the upper and lower limits (dU and dL).

Tests of autocorrelation (graphical method):
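The notes refer to a plot at this point. A common version of the graphical check (assumed, since the figure is not reproduced) is to plot each residual against the previous one; a clear upward-sloping cloud suggests positive autocorrelation, a downward-sloping one negative autocorrelation:

```python
import matplotlib.pyplot as plt

# resid: residuals from a fitted (ideally time-series) regression, e.g. model.resid above
resid = model.resid
plt.scatter(resid[:-1], resid[1:])   # residual at t-1 on the x-axis, residual at t on the y-axis
plt.xlabel("Residual at t-1")
plt.ylabel("Residual at t")
plt.show()
```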

Durbin-Watson d test (assumptions):
- The regression model includes an intercept term.
- The regressors are fixed in repeated sampling.
- The error term follows the first-order autoregressive, AR(1), scheme, i.e. it depends on the error one period before.
- The regressors do not include lagged value(s) of the dependent variable, e.g. they do not include Yt-1, Yt-2 and other lagged terms of Y.
- dL = lower limit, dU = upper limit (from the Durbin-Watson tables).

- d lies between 0 and 4: the closer to 0, the greater the evidence of positive correlation; the closer to 4, the greater the evidence of negative correlation.

Breusch-Godfrey (BG) general test of autocorrelation:
- It allows for lagged values of the dependent variable to be included among the regressors.
- It allows for higher-order autoregressive schemes, such as AR(p).
- It allows for AR and moving-average terms of the error term, such as Ut-1, Ut-2 and so on.
- E.g. Ut = P1 Ut-1 + P2 Ut-2 + … + Pp Ut-p + Vt, where Vt is an error term that follows the usual classical assumptions. This is an autoregressive (AR) structure in which the current error term depends on the previous error terms up to p lags; choosing the precise value of p is often a trial-and-error process.
- The null hypothesis H0 is: P1 = P2 = … = Pp = 0, that is, there is no serial correlation of any order.
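A minimal, self-contained sketch of both tests with statsmodels, on hypothetical data whose errors are simulated to follow an AR(1) scheme:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Hypothetical time series with AR(1) errors: u_t = 0.7 * u_{t-1} + v_t
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# Durbin-Watson: values near 0 suggest positive, near 4 negative autocorrelation
print(durbin_watson(res.resid))

# Breusch-Godfrey: auxiliary regression of the residuals on the regressors and p lagged residuals
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_pval)   # a small p-value rejects H0 of no serial correlation up to order p
```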

