Applied econometrics - Lecture notes 1-8

Course: Applied Econometrics
Institution: University of East London

Summary

first half of econometrics...


Description

Intro to quantitative economics ec114

So far we have assumed that the relationship between our two variables, X and Y, is linear. In this case the population regression equation is of the form E(Y) = α + βX rather than, for example, E(Y) = α + βX². As a consequence the sample regression equation we fitted to a scatter of data points was always a straight line: Ŷ = a + bX rather than, for example, Ŷ = a + bX².



However, many economic relationships are non-linear to a lesser or greater extent, and we need to be able to take this into account. A very simple way of fitting a curve to a scatter of points is to apply the so-called transformation technique. Consider the equation Y* = α + βX*, where X* = X*(X) and Y* = Y*(Y) are simple functions (or transformations) of the variables X and Y, respectively. The choice of transformation depends on the nature of the data under consideration and can partly be determined by a scatter diagram plotting Y against X.



Note that the equation Y* = α + βX* is linear in the transformed variables Y* and X* (but not in Y and X). Hence we can apply regression techniques to the following population regression equation: E(Y*) = α + βX*. The corresponding sample regression equation is therefore Ŷ* = a + bX*, where a and b are the OLS estimators of α and β. The formulae determining a and b (and also R²) remain unchanged but are now expressed in terms of Y* and X* rather than Y and X.



The relevant formulae are therefore: b = Σxi*yi* / Σxi*², a = Ȳ* − bX̄*, R² = b²Σxi*² / Σyi*². In the above: xi* = Xi* − X̄*, X̄* = ΣXi*/n, yi* = Yi* − Ȳ*, Ȳ* = ΣYi*/n.



The formulae for computing sums involving deviations from means remain valid for the transformed variables: Σxi*² = Σ(Xi* − X̄*)² = ΣXi*² − (ΣXi*)²/n, and Σxi*yi* = Σ(Xi* − X̄*)(Yi* − Ȳ*) = ΣXi*Yi* − (ΣXi*)(ΣYi*)/n. In view of this another common expression for b is: b = [ΣXi*Yi* − (ΣXi*)(ΣYi*)/n] / [ΣXi*² − (ΣXi*)²/n].
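
As a quick illustration, the sketch below applies these deviation-from-means formulae to a small set of made-up (X, Y) values, using ln(X) as the transformation X* and leaving Y untransformed; all numbers and variable names are purely illustrative, not taken from the notes.

```python
# Sketch: OLS on transformed variables using the deviation-from-means formulae above.
# The data are purely illustrative; any simple transformation could play the role of X* and Y*.
import numpy as np

X = np.array([10.0, 20.0, 35.0, 50.0, 70.0, 90.0])   # hypothetical regressor values
Y = np.array([80.0, 62.0, 55.0, 48.0, 41.0, 37.0])   # hypothetical dependent variable

X_star = np.log(X)   # X* = X*(X), here a log transformation
Y_star = Y           # Y* = Y*(Y), here the identity

x_star = X_star - X_star.mean()          # xi* = Xi* - X̄*
y_star = Y_star - Y_star.mean()          # yi* = Yi* - Ȳ*

b = np.sum(x_star * y_star) / np.sum(x_star**2)      # b = Σx*y* / Σx*²
a = Y_star.mean() - b * X_star.mean()                # a = Ȳ* - bX̄*
R2 = b**2 * np.sum(x_star**2) / np.sum(y_star**2)    # R² = b²Σx*² / Σy*²

print(f"a = {a:.4f}, b = {b:.4f}, R² = {R2:.4f}")
```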



The demand for carrots Y (in kilograms) and the price of carrots X (in pence) are observed in a supermarket over a period of 30 weeks. Suppose we wish to estimate the price elasticity of the demand for carrots and to predict the demand for carrots when the price is 90p per kg. The latter task is an out-of-sample prediction of the type covered previously. However, suppose we run a linear regression of Y on X; we obtain Ŷ = 92.9 − 0.881X, R² = 0.61. The regression line slopes downwards (the slope is −0.881) and 61% of the variation in Y (demand) can be attributed to X (price). The out-of-sample prediction, obtained when X = 90, is Ŷ = 92.9 − 0.881(90) = 13.6. Is this a realistic prediction? The fitted line is on the next diagram:



Turning to the price elasticity, this is given by η = (dY/dX)(X/Y) (note that Thomas inserts a minus sign in front of the right-hand side). We estimate dY/dX by b and calculate the elasticity evaluated at the sample means of X and Y, X̄ = 46.9 and Ȳ = 51.6. The estimated elasticity is thus η = (−0.881)(46.9/51.6) = −0.801, meaning that a 1% rise in price leads to a fall of 0.8% in the demand for carrots (so carrots are price inelastic).
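
The arithmetic above can be checked directly; the sketch below simply plugs the figures quoted in the text (intercept 92.9, slope −0.881, X̄ = 46.9, Ȳ = 51.6) into the prediction and elasticity formulae.

```python
# Sketch: the prediction and elasticity arithmetic for the fitted line Ŷ = 92.9 − 0.881X.
a, b = 92.9, -0.881
X_bar, Y_bar = 46.9, 51.6

Y_hat_90 = a + b * 90          # out-of-sample prediction at X = 90
eta = b * (X_bar / Y_bar)      # elasticity evaluated at the sample means

print(f"Prediction at X = 90: {Y_hat_90:.1f} kg")   # ≈ 13.6
print(f"Estimated elasticity: {eta:.3f}")           # ≈ -0.801
```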



Neither the out-of-sample prediction nor the elasticity estimate is likely to be very accurate, given that the linear model is a poor representation of the data. It is likely that better estimates can be obtained from a non-linear model that provides a better fit to the data. There are many relationships that give rise to curves of varying degrees of non-linearity, and we will consider some important examples.



Consider the nonlinear function Y = AX^β, A > 0, with A and β constant. The diagram on the next slide presents the possible shapes of this function for different values of the parameter β, these being: (a) β > 1; (b) 0 < β < 1; and (c) β < 0.



If we are going to use a function of the form Y = AX^β then how do we estimate A and β? Is it possible to use OLS? Obviously OLS cannot be applied directly to this function because it is not linear in the variables. However, the function has the convenient property that it can be made linear by taking (natural) logarithms. We shall need the following rules: ln(uv) = ln(u) + ln(v) and ln(p^q) = q ln(p), which we apply with u = A, v = X^β, p = X and q = β. Taking logs of both sides gives ln Y = ln A + β ln X, so setting Y* = ln Y, X* = ln X and α = ln A puts the model in the linear form Y* = α + βX*.
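
The sketch below estimates A and β from simulated data by regressing ln Y on ln X, as just described. The 'true' parameter values and the multiplicative disturbance are assumptions made purely for illustration.

```python
# Sketch: estimating Y = A·X^β by regressing ln(Y) on ln(X).
# ln(Y) = ln(A) + β·ln(X), so the OLS intercept estimates ln(A) and the slope estimates β.
# The data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
A_true, beta_true = 500.0, -0.8
X = rng.uniform(20, 100, size=200)                            # hypothetical prices
Y = A_true * X**beta_true * np.exp(rng.normal(0, 0.1, 200))   # multiplicative disturbance

lnX, lnY = np.log(X), np.log(Y)
x, y = lnX - lnX.mean(), lnY - lnY.mean()
b = np.sum(x * y) / np.sum(x**2)       # estimate of β
a = lnY.mean() - b * lnX.mean()        # estimate of ln(A)

print(f"β̂ = {b:.3f}  (true {beta_true})")
print(f"Â = {np.exp(a):.1f}  (true {A_true})")
```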



As it stands, the equation Y* = α + βX* is deterministic, i.e. non-random, but we can use it to define the population regression equation in the form E(Y*) = α + βX*. Furthermore, introducing a random disturbance ε*, Y* then satisfies Y* = α + βX* + ε*. We can apply OLS to this equation and estimate α and β by a and b, giving the sample regression equation Ŷ* = a + bX*.



To provide some background, consider a population of values for a random variable X. The variable X will have a probability distribution, which may be known or unknown (typically the latter). Suppose this distribution can be characterised by an unknown parameter θ. The parameter θ could represent the mean (µ) or variance (σ²) of the distribution, or it could represent the regression slope parameter (β). Whatever it represents, the parameter θ needs to be estimated using sample information.



Whatever the estimator Q, it will have a sampling distribution. This is because the value of Q changes with each different possible sample, and the sampling distribution represents the distribution of Q across all possible samples. We can therefore talk about quantities such as the mean and variance of the estimator Q, i.e. E(Q) and E[Q − E(Q)]². In the regression model we have two estimators, a and b, of α and β, the intercept and slope parameters. Both a and b will have their own sampling distributions, as well as a joint sampling distribution.
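
The idea of a sampling distribution for a and b can be made concrete with a small simulation: draw many samples from the same regression model (with the X values held fixed), recompute a and b each time, and look at how they vary. All parameter values below are illustrative assumptions.

```python
# Sketch: the sampling distributions of the OLS estimators a and b, obtained by
# drawing many samples with the X values held fixed.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 2.0, 0.5, 1.0
X = np.linspace(1, 30, 30)            # fixed regressor values
x = X - X.mean()

a_draws, b_draws = [], []
for _ in range(10_000):
    eps = rng.normal(0, sigma, size=X.size)
    Y = alpha + beta * X + eps
    b = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
    a = Y.mean() - b * X.mean()
    a_draws.append(a)
    b_draws.append(b)

print(f"mean of a over samples: {np.mean(a_draws):.3f} (α = {alpha})")
print(f"mean of b over samples: {np.mean(b_draws):.3f} (β = {beta})")
print(f"variance of b over samples: {np.var(b_draws):.5f}")
```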



The small sample properties of estimators are determined by the sampling distribution for estimators obtained using a given sample size n. Such properties hold even when n may be small. This contrasts with large sample, or asymptotic, properties which are obtained as the sample size n gets larger and larger, i.e. as n → ∞. You have already seen the Central Limit Theorem, which is an example of an asymptotic property for the sample mean. It is often written in the form Zn = √n(X̄ − µ)/σ → N(0, 1) in distribution as n → ∞. We shall not be concerned with asymptotic properties here, only small sample properties.
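
Although asymptotics are set aside here, the CLT statement above is easy to check by simulation; the sketch below uses a skewed exponential population (an illustrative choice, not from the notes) and shows that the standardised sample mean Zn behaves approximately like N(0, 1) for a reasonably large n.

```python
# Sketch: checking the CLT statement Zn = √n(X̄ − µ)/σ ≈ N(0, 1) by simulation
# for a skewed Exponential(1) population, for which µ = σ = 1.
import numpy as np

rng = np.random.default_rng(6)
mu = sigma = 1.0
n = 200

Zn = np.array([np.sqrt(n) * (rng.exponential(1.0, n).mean() - mu) / sigma
               for _ in range(20_000)])

print(f"mean ≈ {Zn.mean():.3f}, variance ≈ {Zn.var():.3f}")   # close to 0 and 1
print(f"P(|Zn| ≤ 1.96) ≈ {(np.abs(Zn) <= 1.96).mean():.3f}")  # close to 0.95
```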



A desirable small sample property for an estimator to possess is unbiasedness: Definition An estimator Q is said to be an unbiased estimator of θ if, and only if, E(Q) = θ. Put another way, the mean of the sampling distribution of the estimator Q is equal to the true population parameter θ. Or, if we were to take many samples, the average of all Q’s obtained would be equal to θ. This is illustrated on the next slide:



We have already seen that the sample mean is an unbiased estimator of the population mean: E(X̄) = µ. Note that the property of unbiasedness does not depend on the sample size; it is therefore a small sample property. If an estimator does not satisfy the unbiasedness property it is said to be biased and so there is a systematic tendency to error in estimating θ. The bias of an estimator Q is defined as bias(Q) = E(Q) − θ. If Q tends to over-estimate θ then E(Q) > θ and bias(Q) > 0. Alternatively, if Q tends to under-estimate θ then E(Q) < θ and bias(Q) < 0.



In practice we are faced with using a single sample to determine our estimator Q. Although unbiasedness is a good property for Q to possess, we need to consider other aspects of the sampling distribution as well, such as the variance. Consider two unbiased estimators, Q1 and Q2, so that E(Q1) = θ and E(Q2) = θ. Suppose, however, that the variance of Q1 is larger than the variance of Q2, so that V(Q1) > V(Q2). Which estimator would we prefer? The diagram on the next slide will help answer this question. . .



We would therefore prefer to use Q2 whose distribution is more condensed around θ than the distribution of Q1. Another desirable property for an estimator to possess is efficiency: Definition An estimator Q is said to be an efficient estimator of θ if, and only if: (i) it is unbiased, so that E(Q) = θ; and (ii) no other unbiased estimator of θ has a smaller variance. The two important properties for efficiency, therefore, are unbiasedness and smallest (or minimum) variance. Efficient estimators are sometimes also called best unbiased estimators.
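
As an illustration (not taken from the notes), the sample mean and the sample median are both unbiased estimators of the mean of a normal population, but the mean has the smaller variance; a short simulation makes the comparison concrete.

```python
# Sketch: two unbiased estimators of a normal mean, the sample mean and the sample
# median, compared by simulation. Both are centred on µ, but the mean has the
# smaller variance and so is the more efficient choice.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 10.0, 3.0, 25

means = np.array([rng.normal(mu, sigma, n).mean() for _ in range(20_000)])
medians = np.array([np.median(rng.normal(mu, sigma, n)) for _ in range(20_000)])

print(f"E(mean)   ≈ {means.mean():.3f},  V(mean)   ≈ {means.var():.3f}")    # V ≈ σ²/n = 0.36
print(f"E(median) ≈ {medians.mean():.3f},  V(median) ≈ {medians.var():.3f}")  # larger: roughly πσ²/(2n)
```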



The concept of efficiency is useful when comparing unbiased estimators, because we always prefer the one with the smaller variance. But suppose we are faced with the following problem. There are two estimators, Q1 and Q2, of a parameter θ. Q2 is unbiased and efficient, while Q1 has a small (positive) bias but also a smaller variance than Q2. We have E(Q1) > θ, E(Q2) = θ, V(Q1) < V(Q2). Can we compare these estimators in a meaningful way?



We shall consider the error of the estimator, given by Q − θ. The mean square error (MSE) of Q is given by MSE(Q) = E[(Q − θ)²]. It can be shown that MSE(Q) = V(Q) + [bias(Q)]², i.e. the MSE of Q is equal to the variance plus the square of the bias. One way of choosing the preferred estimator would be to choose the one that has the smallest MSE, which places equal weight on variance and squared bias.
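
The decomposition MSE(Q) = V(Q) + [bias(Q)]² can be verified numerically; the sketch below uses a deliberately biased estimator of a normal mean, Q = X̄ + 1, with parameter values that are illustrative assumptions.

```python
# Sketch: checking MSE(Q) = V(Q) + bias(Q)² by simulation for the biased
# estimator Q = X̄ + 1 of a normal mean θ.
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 5.0, 2.0, 40

Q = np.array([rng.normal(theta, sigma, n).mean() + 1.0 for _ in range(50_000)])

mse = np.mean((Q - theta) ** 2)          # E[(Q − θ)²]
var = Q.var()                            # V(Q)
bias_sq = (Q.mean() - theta) ** 2        # [bias(Q)]²

print(f"MSE       ≈ {mse:.4f}")
print(f"V + bias² ≈ {var + bias_sq:.4f}")   # the two agree
```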



It is usually a good idea, when estimating a parameter, to use all the sample information that is available. In general, the more information used, the more efficient the estimator will be. For example, it wouldn’t make sense to ignore 50% of a sample when estimating the population mean – the more information (observations) the better the estimator. An estimator that uses all the sample information is said to be sufficient. However, if an estimator uses all the observations inappropriately, it will not be efficient despite being sufficient. The point is that an estimator cannot be efficient unless it is sufficient



Example (Thomas, p. 327). A variable X is normally distributed with mean µ and variance σ². Three estimators of µ are proposed: m̂ = X̄ − 10, m̃ = X̄ + 5/n, m* = ((n − 1)/(n − 2))X̄, where X̄ is the sample mean and n the sample size. (a) Explain why all three estimators will have sampling distributions that are normal in shape. (b) Recalling that E(X̄) = µ, use Theorem 1.1 to show that all the proposed estimators are biased and hence determine the bias in each case. If n = 10 and µ = 8, which estimator has the smallest absolute bias? (c) Recalling that V(X̄) = σ²/n, use Theorem 1.1 to find the variance of the sampling distribution for each estimator. Hence determine which estimator has the largest variance.
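
A simulation can be used to check the answers to parts (b) and (c); the value of σ below is an arbitrary assumption (the example does not specify it), and the biases noted in the comments follow directly from E(X̄) = µ.

```python
# Sketch: simulating the three proposed estimators of µ with n = 10 and µ = 8,
# using an arbitrary σ = 4, to check their biases and variances.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 8.0, 4.0, 10
reps = 50_000

Xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
m_hat = Xbar - 10                    # m̂ = X̄ − 10          (bias −10)
m_tilde = Xbar + 5 / n               # m̃ = X̄ + 5/n         (bias 5/n = 0.5)
m_star = (n - 1) / (n - 2) * Xbar    # m* = ((n−1)/(n−2))X̄  (bias µ/(n−2) = 1)

for name, est in [("m̂", m_hat), ("m̃", m_tilde), ("m*", m_star)]:
    print(f"{name}: bias ≈ {est.mean() - mu:+.3f}, variance ≈ {est.var():.3f}")
```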



We have dealt with the problem of estimating the unknown population parameters, α and β, in the linear regression model Y = α + βX + ε, where Y is the dependent variable, X is the regressor, and ε is a random disturbance that causes Y to deviate from its expected value, E(Y) = α + βX. In order to estimate α and β we assume we have a sample of n observations on Y and X, which satisfy Yi = α + βXi + εi, i = 1, . . . , n, i.e. the model holds at each point in the sample.



The Classical Linear Regression Model (CLRM) consists of a regression equation and a set of assumptions concerning the properties of the regressor X and the disturbance ε. With two variables, Y and X, the regression equation is Yi = α + βXi + εi, i = 1, . . . , n, where n denotes the number of observations (sample size). We shall focus on finite sample (or small sample) properties and will not be concerned with large sample (or asymptotic) properties that hold when n gets bigger and bigger (i.e. n → ∞). Hence n is finite (n < ∞) but needs to be greater than 2 (the number of unknown parameters).



But Assumption IA is hard to justify in economics, where the X values are not chosen by the researcher in an experiment but are observed in the real world. It is, however, a useful starting point for our analysis of the OLS estimators. It is interesting to note that the origins of IA lie in the Classical model having been developed for the physical sciences, in which the researcher can choose the X values in an experiment and then see what the resulting values of Y are.



Assumption IB (fixed X) implies that if we could obtain more than one sample then the same values of X would be found in each sample. Again, this is something that may be possible in experiments in the physical sciences, but in Economics we typically only have one sample. Hence, given the fixed, non-random nature of X, the only source of randomness determining Y is ε, about which we shall make the following assumptions:



Assumption IIC (zero covariance) ensures that there is no systematic tendency for εi to be related to εj (j ≠ i). This is often a strong assumption for time series where there can be a high degree of correlation from one period to the next. Recall that ε represents the deviation of Y from its average value E(Y), and can be regarded as the ‘unexpected’ component of Y. For example, a sequence of unexpectedly cold months might lead to an increased demand for gas for heating purposes. This unexpected rise in demand would be reflected in a sequence of consecutive positive shocks (the ε’s), implying a positive covariance structure, which is ruled out by IIC.



Assumption IID (normality) builds on the previous assumptions by specifying that the distribution of each εi is normal. Combining IIA (zero mean), IIB (constant variance) and IID (normality) gives εi ∼ N(0, σ²), i = 1, . . . , n. Note that Yi − E(Yi) = Yi − α − βXi = εi; this implies that V(Yi) = E[(Yi − E(Yi))²] = E(εi²) = V(εi) = σ², which in turn implies that Yi ∼ N(α + βXi, σ²), i = 1, . . . , n. We can illustrate these ideas in a diagram:



Under IA (non-random X), IB (fixed X) and IIA (zero mean), the OLS estimators are unbiased: E(a) = α and E(b) = β. This means that the means of the sampling distributions of a and b coincide with the population parameters α and β, respectively. Put another way, if we had repeated samples then a and b would, on average, be equal to the population parameters. In practice, of course, we only have a single sample, but it is a useful property for the distributions of the estimators to be centred at the population parameter values



It is also possible to show that the OLS estimators are best out of all possible linear unbiased estimators, i.e. the OLS estimators are BLUE (best linear unbiased estimators). For this we need IA, IB, IIA, IIB and IIC. The proof of this result – also known as the Gauss-Markov Theorem – is a bit complicated! The variances of a and b – which are the smallest out of all LUEs – are given by V(a) = σa² = σ²ΣXi² / (nΣxi²) and V(b) = σb² = σ² / Σxi². These are the variances of the sampling distributions of a and b; their square roots (σa and σb) are the standard errors.



If we now add Assumption IID (normality) it can be shown that the OLS estimators are efficient among all unbiased estimators (not just linear ones). The normality assumption, IID, also ensures that a and b are normally distributed: a ∼ N(α, σa²), b ∼ N(β, σb²). These results concerning the distributions of a and b enable us to conduct hypothesis tests – more on this in Lecture 16. However, there is one parameter that remains unknown, which is σ². We therefore need to estimate σ², but how? Recall that σ² is the variance of each εi. Our usual sample variance estimator, for a sample ε1, . . . , εn, is calculated as the sum of squared deviations of observations from their mean, divided by n − 1: s² = Σ(εi − ε̄)²/(n − 1). Problem: we don’t observe the εi so we can’t compute such an estimator. However, we can use the residuals ei instead, the obvious estimator being ŝ² = Σ(ei − ē)²/(n − 1).









We have seen that the Classical Linear Regression Model (CLRM) consists of a (population) regression equation, Yi = α + βXi + εi, i = 1, . . . , n, and a set of assumptions concerning X and ε. The assumptions are: IA (non-random X): X is non-stochastic (non-random); IB (fixed X): the values of X are fixed in repeated samples; IIA (zero mean): E(εi) = 0 for all i; IIB (constant variance): V(εi) = σ² = constant for all i; IIC (zero covariance): Cov(εi, εj) = 0 for all i ≠ j; IID (normality): each εi is normally distributed.
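
For concreteness, the sketch below generates one sample that satisfies these assumptions: fixed X values, and disturbances that are independent draws from N(0, σ²). The parameter values are illustrative only.

```python
# Sketch: generating a sample that satisfies the CLRM assumptions listed above.
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, sigma, n = 1.0, 0.2, 0.5, 30

X = np.arange(1, n + 1, dtype=float)     # IA/IB: non-random X, fixed in repeated samples
eps = rng.normal(0.0, sigma, size=n)     # IIA-IID: zero mean, constant variance,
                                         # uncorrelated, normally distributed disturbances
Y = alpha + beta * X + eps               # Yi = α + βXi + εi

print(f"first three observations: {list(zip(X[:3], Y[:3].round(3)))}")
```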



Inference in the CLRM

Under Assumptions IA, IB, IIA, IIB, IIC, and IID, the OLS estimators are BLUE (best linear unbiased estimators) as well as normally distributed. We have seen that the sampling distributions of a and b are given by a ∼ N(α, σa²), b ∼ N(β, σb²), where V(a) = σa² = σ²ΣXi² / (nΣxi²) and V(b) = σb² = σ² / Σxi². These distributions provide a basis for making inferences about α and β.

Standardising, we obtain (a − α)/σa ∼ N(0, 1) and (b − β)/σb ∼ N(0, 1), suggesting that the N(0, 1) distribution can be used for inference. However, the problem is that we don’t know σ² and, hence, we can’t compute σa² and σb². We therefore need to estimate σ². An unbiased estimator of σ² is s² = Σei²/(n − 2), i.e. E(s²) = σ².



The corresponding standardised versions of a and b are (a − α)/sa ∼ t(n−2) and (b − β)/sb ∼ t(n−2). These distributions are Student’s t because we have had to estimate σ² using s². The distributions have n − 2 degrees of freedom because we have ‘lost’ two degrees of freedom through estimating α and β. The standardised variables above are used to construct confidence intervals and to test hypotheses concerning α and β using the t(n−2) distribution.



95% confidence intervals (CIs) for α and β can be constructed as follows: for α, the 95% CI is a ± t0.025 sa; for β, the 95% CI is b ± t0.025 sb, where t0.025 is the value from the t(n−2) distribution that puts 2.5% of the distribution into each tail. The interpretation is that we are 95% confident that α lies in the interval [a − t0.025 sa, a + t0.025 sa], while we are 95% confident that β lies in the interval [b − t0.025 sb, b + t0.025 sb].



We first need to compute s²; for this we need Σei² = Σyi² − bΣxiyi = 26.403 − (0.17485 × 116.60) = 6.0155. It follows that s² = Σei²/(n − 2) = 6.0155/28 = 0.2148. We then obtain sa² = s²ΣXi² / (nΣxi²) = (0.2148 × 1274.66)/(30 × 666.86) = 0.0136 and sb² = s²/Σxi² = 0.2148/666.86 = 0.0003221.

The resulting standard errors are sa = √0.0136 = 0.1170 and sb = √0.0003221 = 0.01795. We can use these to form the confidence intervals – the value t0.025 for the t(28) distribution is 2.048 – hence for α the 95% CI is a ± t0.025 sa = 0.0212 ± (2.048 × 0.1170) = 0.0212 ± 0.2396, so the interval is [−0.2184, 0.2608], while for β it is b ± t0.025 sb = 0.17485 ± (2.048 × 0.01795) = 0.17485 ± 0.03676, yielding the interval [0.1381, 0.2116].
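
The sketch below reproduces this confidence-interval arithmetic from the quantities quoted in the text; nothing here is new beyond the rounding.

```python
# Sketch: reproducing the confidence-interval arithmetic above from the figures
# given in the text (a = 0.0212, b = 0.17485, and the various sums).
import numpy as np

n = 30
sum_e2 = 26.403 - 0.17485 * 116.60        # Σe² = Σy² − bΣxy ≈ 6.0155
s2 = sum_e2 / (n - 2)                     # s² ≈ 0.2148

s2_a = s2 * 1274.66 / (n * 666.86)        # sa² = s²ΣX² / (nΣx²)
s2_b = s2 / 666.86                        # sb² = s² / Σx²
sa, sb = np.sqrt(s2_a), np.sqrt(s2_b)

t025 = 2.048                              # t(28) critical value for a 95% CI (from the text)
a, b = 0.0212, 0.17485
print(f"95% CI for α: [{a - t025*sa:.4f}, {a + t025*sa:.4f}]")   # ≈ [−0.2184, 0.2608]
print(f"95% CI for β: [{b - t025*sb:.4f}, {b + t025*sb:.4f}]")   # ≈ [0.1381, 0.2116]
```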



As a result the CI for β is more informative than that for α. Another way of interpreting this is that the parameter β is estimated more precisely than α – it has a smaller standard error and hence there is less uncertainty about its value.



For example, in the model Yi = α + βXi + εi ...

