Exercise 1 (Week 37)

1. Exercise 3.7 (exercise 7 in chapter 3) of Wooldridge. Which of the following can cause OLS estimators to be biased?

• (i) Heteroskedasticity.
• (ii) Omitting an important variable.
• (iii) A sample correlation coefficient of .95 between two independent variables both included in the model.

Only (ii), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the $\hat{\beta}_j$.) Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is MLR.3 violated.

2. Exercise 3.9 (exercise 9 in chapter 3) of Wooldridge. The following equation describes the median housing price in a community in terms of amount of pollution (nox for nitrous oxide) and the average number of rooms in houses in the community (rooms):

$$\log(price) = \beta_0 + \beta_1 \log(nox) + \beta_2\, rooms + u$$

• (i) What are the probable signs of $\beta_1$ and $\beta_2$? What is the interpretation of $\beta_1$? Explain.

We expect $\beta_1 < 0$ because more pollution can be expected to lower housing values; note that $\beta_1$ is the elasticity of price with respect to nox. $\beta_2$ is probably positive because rooms
roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)

• (ii) Why might nox [or more precisely, log(nox)] and rooms be negatively correlated? If this is the case, does the simple regression of log(price) on log(nox) produce an upward or a downward biased estimator of $\beta_1$?

If we assume that rooms increases with quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. If $\beta_2 > 0$ and $\mathrm{Corr}(x_1, x_2) < 0$, the simple regression estimator $\tilde{\beta}_1$ has a downward bias. But because $\beta_1 < 0$, this means that the simple regression, on average, overstates the importance of pollution. [$E(\tilde{\beta}_1)$ is more negative than $\beta_1$.]

• (iii) Using the data in HPRICE2.RAW, the following equations were estimated:

$$\widehat{\log(price)} = 11.71 - 1.043 \log(nox), \quad n = 506, \quad R^2 = 0.264$$

$$\widehat{\log(price)} = 9.23 - 0.718 \log(nox) + 0.306\, rooms, \quad n = 506, \quad R^2 = 0.514$$

• (iv) Is the relationship between the simple and multiple regression estimates of the elasticity of price with respect to nox what you would have predicted, given your answer in part (ii)? Does this mean that −0.718 is definitely closer to the true elasticity than −1.043?

This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple regression estimate, −0.718. Because these estimates come from a single sample, we can never know which is closer to $\beta_1$. However, if this is a "typical" sample, the true $\beta_1$ is closer to −0.718.
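The direction of the bias discussed in parts (ii) and (iv) can be illustrated with a small simulation. This is only a sketch on synthetic data, not on HPRICE2.RAW: the variables `x1` and `x2` stand in for log(nox) and rooms, and the parameter values (`beta1 = -0.7`, `beta2 = 0.3`, a slope of −0.5 linking `x2` to `x1`) are assumptions chosen for illustration. It also checks the in-sample identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing the omitted variable on the included one.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] by least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate(n=506, reps=500, beta1=-0.7, beta2=0.3, seed=0):
    """Average the short and long regression slopes on x1 over synthetic samples."""
    rng = np.random.default_rng(seed)
    short_slopes, long_slopes, max_gap = [], [], 0.0
    for _ in range(reps):
        x1 = rng.normal(size=n)                  # hypothetical stand-in for log(nox)
        x2 = -0.5 * x1 + rng.normal(size=n)      # hypothetical stand-in for rooms, Corr(x1, x2) < 0
        y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        b_short = ols(y, x1)[1]                  # simple regression, x2 omitted
        _, b1_hat, b2_hat = ols(y, x1, x2)       # multiple regression, x2 included
        delta1 = ols(x2, x1)[1]                  # slope from regressing x2 on x1
        # The identity b_short = b1_hat + b2_hat * delta1 holds exactly in every sample.
        max_gap = max(max_gap, abs(b_short - (b1_hat + b2_hat * delta1)))
        short_slopes.append(b_short)
        long_slopes.append(b1_hat)
    return np.mean(short_slopes), np.mean(long_slopes), max_gap

mean_short, mean_long, max_gap = simulate()
print(f"mean simple-regression slope:   {mean_short:.3f}")
print(f"mean multiple-regression slope: {mean_long:.3f}")
print(f"largest deviation from the identity: {max_gap:.2e}")
```

Under these assumed values the simple regression slope should average close to −0.7 + 0.3 × (−0.5) = −0.85, which is more negative than the true −0.7, while the multiple regression slope should average close to −0.7. Any single sample can still land on either side, which is why one sample cannot tell us which estimate is closer to the truth.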


3. The file CEO.dat contains data on 447 chief executive officers and can be used to examine the effects of firm performance on CEO salary.

• (i) Estimate a model relating annual salary to firm sales and assets value. Make the model of the constant elasticity variety for both independent variables. Write the results out in equation form.

$$\widehat{\log(salary)} = 4.254 - 0.193 \log(sales) + 0.156 \log(assets), \quad n = 447, \quad R^2 = 0.245$$

The coefficient on sales implies that when sales increase by 1%, salary increases by 0.193%; put another way, when sales increase by 100%, salary increases by about 19.3%. The coefficient on assets implies that when assets increase by 1%, salary increases by 0.156%.

• (ii) Add profits to the model from part (i). Why can this variable not be included in logarithmic form? Would you say that these firm performance variables explain most of the variation in CEO salaries?

$$\widehat{\log(salary)} = 4.674 - 0.151 \log(sales) + 0.146 \log(assets) + 0.0000436\, profits, \quad n = 447, \quad R^2 = 0.252$$

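Profits cannot be included in logarithmic form because profits can be negative (a firm can make a loss), and the logarithm is undefined for non-positive values. The coefficient on profits is very small. Here, profits are measured in millions, so if profits increase by 1 billion USD, which means $\Delta profits = 1{,}000$ (a huge change), predicted salary increases by only about 4.36%. However, remember that we are holding sales and assets fixed. With $R^2 = 0.252$, these variables together explain only about 25% of the variation in log(salary), so most of the variation is left unexplained.

• (iii) Add the variable tenure to the model in part (ii). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?

$$\widehat{\log(salary)} = 4.491 - 0.159 \log(sales) + 0.149 \log(assets) + 0.000041\, profits + 0.0128846\, tenure, \quad n = 447, \quad R^2 = 0.280$$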


The coefficient on tenure means that each additional year of tenure as CEO is associated with an increase in predicted salary of about 100 × 0.0129 ≈ 1.3%, holding other factors fixed.

• (iv) Find the sample correlation coefficient between the variables log(sales) and profits. Are these variables highly correlated? What does this say about the OLS estimators?

The sample correlation between log(sales) and profits is about .582, which is fairly high. As we know, this causes no bias in the OLS estimators, although it can cause their variances to be large. Given the fairly substantial correlation between log(sales) and firm profits, it is not too surprising that the latter adds nothing to explaining CEO salaries.

4. The file Kc house data.dat contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.

• (i) Confirm the partialling out interpretation of the OLS estimates by explicitly doing the partialling out. Regress log(price) on log(sqft-living), log(sqft-lot) and floors.
• (ii) Regress log(sqft-living) on log(sqft-lot) and floors, and save the residual, which we can call $\widetilde{\log(\text{sqft-living})}$.
• (iii) Now regress log(price) on $\widetilde{\log(\text{sqft-living})}$. Can you confirm that the coefficient on $\widetilde{\log(\text{sqft-living})}$ is the same as the one we get for log(sqft-living) in the regression in (i)? Are the standard errors the same as in (i)?
• (iv) Run a regression that also gives you the same standard errors. (Hint: you need to remove from the dependent variable the part that is explained by log(sqft-lot) and floors.)

5. Use the Kc house data.dat file again; this time we look at omitted variable bias.

• (i) Run a simple regression of log(sqft-living) on log(sqft-above) to obtain the slope coefficient, $\tilde{\delta}_1$.

We obtain an estimate of $\tilde{\delta}_1 = 0.85966$.

• (ii) Run a simple regression of log(price) on log(sqft-living) to obtain the slope coefficient, $\tilde{\beta}_1$.

We obtain an estimate of $\tilde{\beta}_1 = 0.8367$.

• (iii) Run a multiple regression of log(price) on log(sqft-living) and log(sqft-above) to obtain the slope coefficients, $\hat{\beta}_1$ and $\hat{\beta}_2$.

We obtain estimates of $\hat{\beta}_1 = 0.8271$ and $\hat{\beta}_2 = 0.0110$.

• (iv) Verify that $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$.

We can verify that $0.8367 \approx 0.8271 + 0.0110 \times 0.85966 = 0.8366$. Note that the difference is due to rounding.

6. Exercise C 3.1 (exercise C1 in chapter 3) of Wooldridge. A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account. For example, higher income generally results in access to better prenatal care, as well as better nutrition for the mother. An equation that recognizes this is:

$$bwght = \beta_0 + \beta_1\, cigs + \beta_2\, faminc + u$$

• (i) What is the most likely sign for $\beta_2$?

Probably $\beta_2 > 0$, as more income typically means better nutrition for the mother and better prenatal care.

• (ii) Do you think cigs and faminc are likely to be correlated? Explain why the correlation might be positive or negative.

On the one hand, an increase in income generally increases the consumption of a good, so cigs and faminc could be positively correlated. On the other hand, family incomes are also higher for families with more education, and more education and cigarette smoking tend to be negatively correlated. The sample correlation between cigs and faminc is about −0.173, indicating a negative correlation.

• (iii) Now, estimate the equation with and without faminc, using the data in BWGHT.RAW. Report the results in equation form, including the sample size and R-squared. Discuss your results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.

The effect of cigarette smoking is slightly smaller when faminc is added to the regression, but the difference is not great. This is because cigs and faminc are only weakly correlated, and the coefficient on faminc is small in practical terms. (The variable faminc is measured in thousands, so 10000 USD more in 1988 income increases predicted birth weight by only 0.93 ounces.)
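The partialling-out steps of exercise 4 can be sketched as follows. This is a minimal illustration on synthetic data, not on the King County file: the arrays `log_price`, `log_living`, `log_lot` and `floors` are hypothetical stand-ins generated with assumed coefficients, and `ols_with_se` is a helper written for this sketch rather than a library function.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients, usual standard errors, and residuals (X includes the constant)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)             # error variance with n - k degrees of freedom
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se, resid

# Synthetic stand-ins for the housing variables (all values are assumptions).
rng = np.random.default_rng(0)
n = 1000
const = np.ones(n)
log_lot = rng.normal(9.0, 0.8, n)
floors = rng.integers(1, 4, n).astype(float)
log_living = 3.0 + 0.4 * log_lot + 0.2 * floors + rng.normal(0.0, 0.3, n)
log_price = 7.0 + 0.8 * log_living - 0.05 * log_lot + 0.1 * floors + rng.normal(0.0, 0.4, n)
controls = np.column_stack([const, log_lot, floors])

# (i) Full regression of log_price on log_living, log_lot and floors.
X_full = np.column_stack([const, log_living, log_lot, floors])
coef_full, se_full, _ = ols_with_se(log_price, X_full)

# (ii) Partial log_lot and floors out of log_living and keep the residual.
_, _, living_resid = ols_with_se(log_living, controls)
X_resid = np.column_stack([const, living_resid])

# (iii) Raw log_price on the residual: same slope, different standard error.
coef_iii, se_iii, _ = ols_with_se(log_price, X_resid)

# (iv) Partial the controls out of log_price as well, then regress residual on residual.
_, _, price_resid = ols_with_se(log_price, controls)
coef_iv, se_iv, _ = ols_with_se(price_resid, X_resid)

# Step (iv) uses n - 2 degrees of freedom and the full regression n - 4, so rescale.
se_iv_adjusted = se_iv[1] * np.sqrt((n - 2) / (n - 4))

print("slopes (i), (iii), (iv):", coef_full[1], coef_iii[1], coef_iv[1])
print("standard errors (i), (iii), (iv), (iv) adjusted:",
      se_full[1], se_iii[1], se_iv[1], se_iv_adjusted)
```

In this sketch the slope is identical in all three regressions. The standard error from step (iii) is larger because the variation in the dependent variable explained by the other regressors is left in the residual. The standard error from step (iv) matches the full regression once the degrees of freedom are aligned; without that adjustment the two differ only by the factor $\sqrt{(n-4)/(n-2)}$, which is negligible in a sample this large.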


