PS3-2019 F-key - nnnnnnnnnnnnnnnnnnn PDF

Title PS3-2019 F-key - nnnnnnnnnnnnnnnnnnn
Author yutong dong
Course Intro To Econometric Methods
Institution Michigan State University


Economics 420 (sections 2 and 3)
Professor Woodbury
Fall Semester 2019
Problem Set #3 (Due: Thursday, October 17)

Instructions
• Use Stata and a word processor for this assignment.
• Read the question and answer what is asked.
• For each question that asks you to use Stata, copy and paste the Stata output into a word-processing document, then type your answer. Staple all pages together at the upper-left corner before you turn in your homework.
• Assignments turned in unstapled will be returned with a grade of zero.
• Only stapling is acceptable; paper clips and other methods of binding are not acceptable.
• If we cannot discern the meaning of your work, your response will be scored as incorrect.

This problem set introduces you to multiple regression in Stata, hypothesis tests about regression parameters, and omitted variable bias. The problem set uses the Stata dataset HTV.dta that can be downloaded from the D2L website. This dataset includes data on 1,230 men aged 26–34 who were interviewed in 1991 for the National Longitudinal Survey of Youth. You will be using the following variables:

lwage   natural logarithm of the person’s hourly wage rate in 1991
educ    highest grade level completed by 1991
abil    score from an ability test administered in 1979
ne      = 1 if the person lived in one of the northeastern states

Part 1: Multiple regression basics (30 points total)

1. (5 points) What are the mean and standard deviation of lwage in the data? What are the mean and standard deviation of abil in the data?

Answer (mean, with standard deviation in parentheses):
lwage = 2.414 (0.594)
abil = 1.797 (2.184)

2. (5 points) Suppose the correct population model for the log of the wage rate (lwage) is given by:

lwage = β0 + β1 educ + β2 abil + u

Estimate the above model and paste in your Stata output. Hint: type reg lwage educ abil. Answer:

. reg lwage educ abil

      Source |       SS           df       MS      Number of obs   =     1,230
-------------+----------------------------------   F(2, 1227)      =    140.80
       Model |  80.8656331         2  40.4328165   Prob > F        =    0.0000
    Residual |  352.353629     1,227  .287166772   R-squared       =    0.1867
-------------+----------------------------------   Adj R-squared   =    0.1853
       Total |  433.219262     1,229  .352497365   Root MSE        =    .53588

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0718563    .008071     8.90   0.000     .0560218    .0876908
        abil |    .053533   .0086989     6.15   0.000     .0364667    .0705994
       _cons |   1.380811   .0979602    14.10   0.000     1.188623    1.572999
------------------------------------------------------------------------------

3. (5 points) What is the R2 of the regression and how do you interpret it? What does it tell you about the extent to which wages are causally affected by education? Answer: The R2 of the regression is 0.187, meaning about 19 percent of the total variation in log-wages is explained by variation in education and ability. R2 never decreases when a variable is added to the equation, and it measures only goodness of fit, so by itself it tells you nothing about the extent to which education causally affects wages.
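The fit statistics in the regression header can be recomputed from the sums of squares in the ANOVA panel. The following is a minimal Python sketch of that arithmetic, not part of the original answer key; the variable names are illustrative, and the numbers are copied from the output of reg lwage educ abil.

```python
# Figures copied from the ANOVA panel of `reg lwage educ abil`.
ss_model = 80.8656331
ss_resid = 352.353629
ss_total = 433.219262
n_obs = 1230
k_regressors = 2  # educ and abil

df_resid = n_obs - k_regressors - 1  # 1,227 residual degrees of freedom

# R-squared: share of the total variation in lwage explained by the model.
r_squared = ss_model / ss_total

# Adjusted R-squared penalizes the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F statistic: model mean square over residual mean square.
f_stat = (ss_model / k_regressors) / (ss_resid / df_resid)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(2, {df_resid})    = {f_stat:.2f}")
```

The three values should agree with the R-squared, Adj R-squared, and F(2, 1227) entries that Stata reports.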

4. (5 points) What is the OLS estimate of β1 from this regression and how do you interpret it? Answer: The estimated coefficient on education is 0.072, which implies that one more year of education is related to log-wages that are higher by 0.072, on average (or wages that are higher by about 7.2 percent), holding the ability test score constant.

5. (10 points) Use the output from your regression to test the hypothesis that education is unrelated to log-earnings. State the null and the alternative hypotheses in terms of the notation used in class, then state the test statistic you use. Using a significance level of 5%, what do you conclude and why? Be sure to explain your reasoning. Answer:
Step 1: H0: β1 = 0; HA: β1 ≠ 0
Step 2: Estimate of β1 = 0.0719, SE(b1) = 0.0081 (see the Stata output).
Step 3: t-statistic = 0.0719/0.0081 = 8.90 (see the Stata output).
Step 4: The p-value associated with a t-statistic of 8.90 is < 0.0005.
Step 5: The p-value < 0.05 (the specified significance level), so reject the null hypothesis that education is unrelated to log-wage at the 5% level.
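The test steps above can be checked by hand from the coefficient table. The sketch below is an illustration in Python, not the method used in the answer key: it recomputes the t-statistic from the unrounded estimate and standard error, and it approximates the two-sided p-value with the standard normal distribution, on the assumption that a t distribution with 1,227 degrees of freedom is close to normal.

```python
from math import erfc, sqrt

# Coefficient and standard error on educ from `reg lwage educ abil`.
coef_educ = 0.0718563
se_educ = 0.008071
alpha = 0.05  # significance level stated in the question

# Test statistic for H0: beta1 = 0.
t_stat = coef_educ / se_educ

# Two-sided p-value under a standard normal approximation (an assumption;
# Stata uses the t distribution with 1,227 degrees of freedom).
# For a standard normal Z, P(|Z| > t) = erfc(t / sqrt(2)).
p_value_approx = erfc(abs(t_stat) / sqrt(2))

reject_null = p_value_approx < alpha

print(f"t = {t_stat:.2f}")
print(f"approximate two-sided p-value = {p_value_approx:.3g}")
print("reject H0 at the 5% level" if reject_null else "fail to reject H0")
```

Using the unrounded inputs reproduces the t-statistic of 8.90 shown in the Stata output; dividing the rounded values 0.0719 and 0.0081 gives a slightly smaller figure.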

Part 2: Omitted variable bias in estimating the return to education (35 points total)

Suppose you cannot observe a person’s ability, so you estimate the following simple linear regression model:

lwage = β0 + β1 educ + v

where ability ends up in the error term v.

1. (5 points) What is the OLS estimate of the slope coefficient on education from the above simple regression? (Paste in your Stata output.) How do you interpret the coefficient? Answer: See the Stata output below. The estimated coefficient on education is 0.101, which implies that one more year of education is related to log-wages that are higher by about 0.10, on average. Because this is a log-level specification, this should be interpreted in percentage terms: one more year of education is related to wages that are higher by about 10 percent, on average.

. reg lwage educ

      Source |       SS           df       MS      Number of obs   =     1,230
-------------+----------------------------------   F(1, 1228)      =    236.62
       Model |  69.9901106         1  69.9901106   Prob > F        =    0.0000
    Residual |  363.229151     1,228  .295789211   R-squared       =    0.1616
-------------+----------------------------------   Adj R-squared   =    0.1609
       Total |  433.219262     1,229  .352497365   Root MSE        =    .54387

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1013613   .0065894    15.38   0.000     .0884336     .114289
       _cons |   1.092319   .0872969    12.51   0.000     .9210517    1.263587
------------------------------------------------------------------------------
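The percentage interpretation of a log-level coefficient (100 times the coefficient) is an approximation that works well for small coefficients. As a hedged side calculation that is not part of the original answer key, the Python sketch below compares that approximation with the exact percentage change, 100 × (exp(b) − 1), for the two education coefficients reported in the Stata output. The function name is illustrative.

```python
from math import exp

def exact_percent_change(log_coef: float) -> float:
    """Exact percent change in the wage implied by a log-level coefficient."""
    return 100 * (exp(log_coef) - 1)

# Coefficients on educ copied from the two regressions of lwage.
coef_multiple = 0.0718563  # reg lwage educ abil
coef_simple = 0.1013613    # reg lwage educ

for label, coef in [("with abil", coef_multiple), ("without abil", coef_simple)]:
    approx = 100 * coef                 # the usual "times 100" reading
    exact = exact_percent_change(coef)  # exact conversion from log points
    print(f"{label}: approx {approx:.1f}%, exact {exact:.1f}%")

exact_multiple = exact_percent_change(coef_multiple)
exact_simple = exact_percent_change(coef_simple)
```

The exact figures are somewhat larger than the approximations, and the gap widens as the coefficient grows, which is why the "about 10 percent" reading is best treated as a rounded statement.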

2. (5 points) Write down the OVB formula. Answer:

$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$

3. (10 points) Use the formula you just wrote down to explain the two conditions that need to be satisfied in order for omitting ability from the lwage regression to result in a biased estimator of β1 (the return to education). Do you think these conditions hold? Why or why not? Answer: For omitted variable bias to occur, the omitted variable (in this case, ability) must be a determinant of the outcome (in this case, the wage), so that β2 ≠ 0, and must be correlated with the “treatment” (in this case, years of education), so that δ1 ≠ 0. If either term is zero, the bias term in the formula vanishes. It seems highly likely that both conditions hold: It stands to reason that “ability” of various kinds affects wages, and it also makes sense that people who have more ability are likely to obtain more education on average.

4. (5 points) In light of your answer to the last question, why does it make sense that the estimated slope coefficient on education in the simple regression above is larger than the estimate from the multiple regression of lwage on educ and abil? Answer: Based on the OVB formula, you would expect the OLS estimator of β1 in the simple regression to be positively (upward) biased for two reasons. First, the omitted variable (abil) is positively related to the outcome (lwage). Second, the omitted variable (abil) is positively correlated with the key independent variable (educ). In terms of the 2x2 OVB matrix in Wooldridge, β2 > 0 and corr(X1, X2) > 0, so the bias from omitting ability will be positive.

5. (5 points) Estimate the regression of abil on educ and paste in your Stata output. Interpret the coefficient on educ in the regression.

. reg abil educ

      Source |       SS           df       MS      Number of obs   =     1,230
-------------+----------------------------------   F(1, 1228)      =    669.63
       Model |  2069.38018         1  2069.38018   Prob > F        =    0.0000
    Residual |   3794.9547     1,228  3.09035399   R-squared       =    0.3529
-------------+----------------------------------   Adj R-squared   =    0.3523
       Total |  5864.33488     1,229  4.77163131   Root MSE        =    1.7579

------------------------------------------------------------------------------
        abil |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5511552   .0212989    25.88   0.000     .5093689    .5929415
       _cons |  -5.389034   .2821704   -19.10   0.000    -5.942624   -4.835445
------------------------------------------------------------------------------

Answer: See the output above. The OLS estimate of the slope coefficient on educ (δ1 in the OVB formula) is 0.551, so one more year of education is related to measured ability that is greater by 0.551 point, on average.

6. (5 points) Again write down the OVB formula, and plug in the estimates from the equations you have estimated to show that it “works.” Is the estimated bias from the OVB equation positive or negative? Is it consistent with your answer to question 4 above? Answer:

$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$

From the regressions, we have:
β1-tilde = 0.101
β1-hat = 0.072
β2-hat = 0.054
δ1-tilde = 0.551

Plugging these estimates into the OVB formula gives:
0.101 = 0.072 + (0.054 × 0.551)
0.101 ≈ 0.072 + 0.030

(the small discrepancy is due to rounding), which confirms that the bias in the model with ability omitted is about 0.030; we overestimate the return to schooling by about 3 percentage points if we omit ability from the model. The estimated bias is positive, which is consistent with the answer to question 4.
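With the unrounded coefficients from the three Stata regressions, the OVB identity holds almost exactly rather than only to two decimal places. The Python sketch below is an illustrative check, not part of the original answer key; the variable names mirror the tilde and hat notation and are otherwise arbitrary.

```python
# Unrounded estimates copied from the Stata output.
beta1_hat = 0.0718563     # coef on educ in: reg lwage educ abil
beta2_hat = 0.053533      # coef on abil in: reg lwage educ abil
delta1_tilde = 0.5511552  # coef on educ in: reg abil educ
beta1_tilde = 0.1013613   # coef on educ in: reg lwage educ

# OVB formula: beta1_tilde = beta1_hat + beta2_hat * delta1_tilde
estimated_bias = beta2_hat * delta1_tilde
implied_beta1_tilde = beta1_hat + estimated_bias

print(f"estimated bias        = {estimated_bias:.4f}")
print(f"implied beta1-tilde   = {implied_beta1_tilde:.7f}")
print(f"reported beta1-tilde  = {beta1_tilde:.7f}")
print("bias is positive" if estimated_bias > 0 else "bias is not positive")
```

The implied and reported simple-regression slopes should match to within the precision that Stata displays, and the bias term is positive.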

Extra Credit (5 points) Are lwage and abil approximately normally distributed? How do you know? Hint: type sum lwage abil, detail, then check to see whether approximately 90% of the lwage observations lie within 1.64 standard deviations of the mean lwage. Do the same for abil.

Answer: The mean lwage = 2.414 (std. dev. = 0.594), so we expect 90% of the wage distribution to lie between 1.44 (= 2.414 - 1.64 × 0.594) and 3.39 (= 2.414 + 1.64 × 0.594). According to the Stata output, the 5th lwage percentile is 1.45, and the 95th lwage percentile is 3.26, so the lwage distribution appears to be close to normal.

The mean abil = 1.797 (std. dev. = 2.184), so we expect 90% of the ability distribution to lie between -1.785 (= 1.797 - 1.64 × 2.184) and 5.379 (= 1.797 + 1.64 × 2.184). According to the Stata output, the 5th abil percentile is -2.49, and the 95th abil percentile is 4.70, so I would be reluctant to conclude that the abil distribution is normal.
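The interval arithmetic in this answer can be reproduced with a short Python sketch. It is an illustration only: the means, standard deviations, and percentiles are copied from the text above, the multiplier 1.64 is the value given in the hint, and the function name is hypothetical.

```python
Z_90 = 1.64  # multiplier from the hint: about 90% of a normal lies within 1.64 sd

def central_90_interval(mean: float, sd: float) -> tuple[float, float]:
    """Range expected to hold about 90% of a normal distribution."""
    return mean - Z_90 * sd, mean + Z_90 * sd

# Means and standard deviations reported in Part 1, question 1.
lwage_lo, lwage_hi = central_90_interval(2.414, 0.594)
abil_lo, abil_hi = central_90_interval(1.797, 2.184)

# 5th and 95th percentiles quoted in the answer (from `sum lwage abil, detail`).
observed = {"lwage": (1.45, 3.26), "abil": (-2.49, 4.70)}
expected = {"lwage": (lwage_lo, lwage_hi), "abil": (abil_lo, abil_hi)}

for name in ("lwage", "abil"):
    exp_lo, exp_hi = expected[name]
    obs_lo, obs_hi = observed[name]
    print(f"{name}: expected [{exp_lo:.3f}, {exp_hi:.3f}], "
          f"observed [{obs_lo:.2f}, {obs_hi:.2f}], "
          f"gaps {obs_lo - exp_lo:+.3f} and {obs_hi - exp_hi:+.3f}")
```

The gaps for abil are considerably larger than those for lwage, which is the basis for the more cautious conclusion about abil.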

The test for normality I suggested in the hint is too crude to be very useful. Skewness-Kurtosis tests are widely used to test for normality and are programmed in Stata:

. sktest lwage abil

                    Skewness/Kurtosis tests for Normality
                                                          ------ joint ------
    Variable |        Obs  Pr(Skewness)  Pr(Kurtosis) adj chi2(2)   Prob>chi2
-------------+---------------------------------------------------------------
       lwage |      1,230     0.0000        0.0000        51.48         0.0000
        abil |      1,230     0.0000        0.3330        72.09         0.0000

The Skewness-Kurtosis tests reject normality for both distributions: Pr(Skewness) < 0.05 implies rejection of normality based on the presence of skewness, Pr(Kurtosis) < 0.05 implies rejection of normality based on kurtosis, and (Prob>chi2) < 0.05 implies rejection of normality based on a joint test of skewness and kurtosis. To my eye, the lwage distribution doesn’t look skewed (as the skewness test suggests), but kurtosis is pretty clear (see the histogram). The ability distribution is clearly left-skewed, and although it doesn’t have fat tails, it is clearly non-normal (see the histogram).

. hist lwage
(bin=30, start=.02325686, width=.14969983)

[Histogram of lwage. Horizontal axis: log(wage), 0 to 5. Vertical axis: Density, 0 to .8.]

. hist abil
(bin=30, start=-5.6314631, width=.39650685)

[Histogram of abil. Horizontal axis: abil. measure, not standardized, -5 to 5. Vertical axis: Density, 0 to .2.]

