Problem Set 3-1 - assignment PDF

Title Problem Set 3-1 - assignment
Author Katherine Lee
Course Econometric Methods
Institution Boston College
Pages 3
File Size 107.7 KB
File Type PDF


Description

Name(s) _______________________ ______________________________ ______________________________

Econometric Methods
Problem Set 3 “Multiple Regression” - Due November 9th

Instructions: You have until 11:59pm on November 9th to input your answers into the Canvas Quiz associated with this problem set. Each sub-problem is assigned the points indicated. This problem set counts for 25% of the total problem set grade for the course. Like all problem sets in this course, you can complete it as a team, but please fill out and submit your own quiz.

1. For all parts of this problem use the Wage2 BCUSE dataset.
a) Run a simple regression of IQ (as the dependent variable) on educ. Call the coefficient on educ $\tilde{\delta}_1$. What is $\tilde{\delta}_1$?
b) Run a simple regression of log(wage) (as the dependent variable) on educ. Call the coefficient on educ $\tilde{\beta}_1$. What is $\tilde{\beta}_1$?
c) Run a multiple regression of log(wage) on educ and IQ. Call the coefficient on educ $\hat{\beta}_1$ and the coefficient on IQ $\hat{\beta}_2$. What are the two coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$?
d) Does the equation we discussed in class on omitted variable bias actually hold in this example?
e) Does what you found imply that the coefficient you found in part b is biased? If so, explain why (i.e., what assumption is violated and why).

2. Consider the equation below:

$\log(\text{salary}) = \beta_0 + \beta_1 \log(\text{sales}) + \beta_2 \text{roe} + \beta_3 \text{ros} + u$

where roe is the return on equity and ros is the return on the firm’s stock.
a. In terms of the model’s parameters, state the null hypothesis that, after controlling for sales and roe, the variable ros has no impact on salary.
b. Use the BCUSE dataset ceosal1. Estimate the regression above. By what percent is salary predicted to increase if ros increases by 50? (Express as a decimal rounded to three places, i.e., 50 percent is 0.500.)
c. Derive (or report from STATA) the t-statistic for a test that ros has no effect on log(salary) vs. the alternative that its effect is greater than 0.
d. Would you reject the null at the 10 percent level of significance?
e. After doing this test, would you continue to include ros in your regression of CEO salary on firm performance? Explain.
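As a way to see what Problem 1(d) is checking, here is a minimal Python sketch on synthetic data (not the Wage2 dataset). It assumes the equation from class is the standard textbook relationship between the simple-regression slope and the multiple-regression slopes, namely that the slope from regressing y on x1 alone equals the multiple-regression slope on x1 plus the multiple-regression slope on x2 times the slope from regressing x2 on x1. All variable names and numbers below are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simple_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_regressor_slopes(y, x1, x2):
    """OLS slopes from regressing y on x1 and x2 with an intercept."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic stand-ins (hypothetical): x1 plays the role of educ,
# x2 the role of IQ, and y the role of log(wage).
rng = random.Random(0)
n = 500
x1 = [rng.gauss(13, 2) for _ in range(n)]
x2 = [60 + 3 * a + rng.gauss(0, 10) for a in x1]
y = [5 + 0.04 * a + 0.005 * b + rng.gauss(0, 0.4) for a, b in zip(x1, x2)]

delta1_tilde = simple_slope(x2, x1)       # x2 on x1
beta1_tilde = simple_slope(y, x1)         # y on x1 only
beta1_hat, beta2_hat = two_regressor_slopes(y, x1, x2)

# Right-hand side of the assumed omitted-variable relationship.
ovb_rhs = beta1_hat + beta2_hat * delta1_tilde
print(beta1_tilde, ovb_rhs)
```

The two printed numbers can be compared directly. In STATA the same comparison uses the coefficients from the three regressions in parts a) through c).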


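For reference on Problem 2(b) through 2(d), the standard textbook formulas can be sketched as follows. The index $j$ stands for whichever coefficient is being examined, and the symbols $\Delta x_j$, $c_\alpha$, $n$, and $k$ are notation assumed here rather than taken from the problem set.

```latex
% Approximate percent change in predicted salary when a regressor x_j,
% entered in levels, changes by \Delta x_j in a log-level model:
\%\Delta \widehat{\text{salary}} \approx 100 \cdot \hat{\beta}_j \, \Delta x_j

% t-statistic for testing H_0: \beta_j = 0 against H_1: \beta_j > 0:
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)},
\qquad
\text{reject } H_0 \text{ at level } \alpha \text{ if } t_{\hat{\beta}_j} > c_\alpha
```

Here $c_\alpha$ is the one-sided critical value from the $t$ distribution with $n - k - 1$ degrees of freedom, where $n$ is the number of observations and $k$ the number of regressors.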
3. Use the BCUSE dataset mlb1 to estimate in STATA the equation discussed in lecture:

$\log(\text{salary}) = \beta_0 + \beta_1 \text{years} + \beta_2 \text{gamesyr} + \beta_3 \text{bavg} + \beta_4 \text{hrunsyr} + \beta_5 \text{rbisyr} + u_i$

a. Use STATA to find the correlation between hrunsyr and rbisyr.
b. Given this, why is it not particularly surprising that neither coefficient is statistically different from 0 when you run this regression?
c. Drop the variable rbisyr. What happens to the p-value of hrunsyr?
d. Why does dropping rbisyr make this kind of difference?
e. While still leaving out rbisyr, now run the regression adding the variables runsyr, fldperc, and sbasesyr. Are any of the variables statistically significant at the five percent level?
f. Use an F-test to see if those three variables are jointly significant.
   i. What is the F-stat?
   ii. What is the critical value for 1-percent significance for this test (table in book)?
   iii. Are the variables jointly significant?

4. The following model allows the return to education to depend on the total amount of the parents’ education, called pareduc:

$\log(\text{wage}) = \beta_0 + \beta_1 \text{educ} + \beta_2\, \text{educ} \cdot \text{pareduc} + \beta_3 \text{exper} + \beta_4 \text{tenure} + u$

a) What is the percent increase in the wage from a one-year change in educ (expressed in terms of model parameters, i.e., in symbols)?
b) Do you have an expectation on the sign of $\beta_2$? Explain your intuition.
c) Use the data in the BCUSE dataset WAGE2 to estimate the equation. You will need to generate the variable for parents’ education as the sum of meduc and feduc (mother’s and father’s education, respectively). Whenever you run a regression, you should know if any observations are left out of the regression (e.g., if you have 500 observations but only 440 show up in the regression). Are any of the observations present in the dataset dropped from the regression? Can you find out why? (Hint: the tab command and the “missing” option can be useful.)
d) Given this regression, what is the return to an extra year of education when parents’ education is 24?

5. Working with COVID Data.
This problem is designed to get you used to working with “real” data from the kind of sources your paper requires. In addition, you must complete one of the two data tutorials. These data are from the CDC on nationwide COVID cases as of the end of September 2020. Because the actual dataset is very large, I have taken a random sample of the data and put it into the bcuse folder. You can load it by typing “bcuse covid_09302020f”.
a. Use the tab command to look at the proportions of the sample made up by the various racial/ethnic groups represented in these COVID case data (“raceeth” is the variable). Compare the proportion that is Black non-Hispanic to the share of the U.S. population (use a source like the U.S. Census) that is Black non-Hispanic. Do the same for Hispanic/Latino. Would you say COVID disproportionately affects these groups?
b. Let’s look at the death variable. Tabulate the variable using the tab command and then the tab command with the nolabel option. How is it coded in the data?
c. Also, many people do not have death data (either they still have COVID or could not be followed by the CDC). Can you figure out how many people have missing data?
d. You will want to recode the death variable. Use the gen and replace commands to generate a variable called death_binary that equals 1 if the person died due to COVID and 0 if they lived. Ignore people with missing data, although in a real-life project you would want to try to understand what caused the missing information (for example, some states may not provide the death data to the CDC). In any case, what share of the people in these data for whom we have information died?
e. A useful command in STATA is the collapse command: it takes your dataset and collapses it down to provide some summary statistics. If, in a do file, you sandwich it between the commands “preserve” and “restore”, you can collapse your data and still have your original data in memory. Try typing “collapse (mean) death_binary, by(raceeth)”. Within your preserve/restore sandwich, export the data to Excel and make a bar chart of the death rates for people with non-missing race/ethnicity data (there should be seven bars). You will upload an image of the bar chart to Canvas (the easiest way is to click on the chart, print it to a PDF, and upload the PDF).
f. One issue in the news is that the death rate is higher for Black people and Hispanic people than for White, non-Hispanic people. Is this true in the data as you see them?
g. To focus more on this issue, restrict the data to only White non-Hispanics, Black non-Hispanics, and Hispanics. Generate a binary variable indicating that the person is Black non-Hispanic and another indicating that they are Hispanic (a total of two variables; note that raceeth is also a labeled number). Run a regression that uses death_binary as the dependent variable and controls for age (try including i.age in the regression to include a complete set of indicators without doing too much work) and your two indicators (in this case, White is the “base case”). Upload the regression using a log file. Do you find that Black and Hispanic people are significantly more likely to die from COVID according to this regression? Explain what your coefficients mean.
h. Explain the seeming contradiction between the raw result and the regression. (Hint: what percent of each group is over 70?) What does that contradiction say about the average age of people of color who get COVID relative to White people?
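The hint about age composition in Problem 5 can be illustrated with a minimal Python sketch. The counts below are entirely hypothetical (they are not CDC data and the group names are placeholders); the sketch only shows how a raw group-level rate, like the one produced by collapse, can rank two groups differently from the rates computed within each age band.

```python
# Hypothetical (cases, deaths) counts by group and age band.
# These numbers are made up for illustration only.
counts = {
    "group_A": {"under_70": (6000, 30), "70_plus": (4000, 400)},
    "group_B": {"under_70": (9000, 90), "70_plus": (1000, 150)},
}

raw_rate = {}      # deaths / cases, ignoring age
rate_by_age = {}   # deaths / cases within each age band
share_70_plus = {} # share of each group's cases that are 70 or older

for group, bands in counts.items():
    total_cases = sum(cases for cases, _ in bands.values())
    total_deaths = sum(deaths for _, deaths in bands.values())
    raw_rate[group] = total_deaths / total_cases
    rate_by_age[group] = {
        band: deaths / cases for band, (cases, deaths) in bands.items()
    }
    share_70_plus[group] = bands["70_plus"][0] / total_cases

for group in counts:
    print(group, raw_rate[group], rate_by_age[group], share_70_plus[group])
```

In these made-up numbers, group_A has the higher raw rate while group_B has the higher rate within each age band, because a much larger share of group_A's cases are 70 or older. Controlling for age in a regression plays a similar role to comparing within age bands.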

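For the joint test in Problem 3(f), the F statistic can also be computed by hand from two regressions. A sketch of the standard textbook formula follows; the symbols $SSR_r$, $SSR_{ur}$, $q$, $n$, and $k$ are notation assumed here rather than taken from the problem set.

```latex
% F statistic for q exclusion restrictions:
% SSR_r  = sum of squared residuals from the restricted model
%          (the regression without the variables being tested)
% SSR_ur = sum of squared residuals from the unrestricted model
% n = number of observations, k = number of regressors in the unrestricted model
F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (n - k - 1)}
```

The statistic is compared with the critical value from the $F_{q,\, n-k-1}$ distribution. In STATA, running the test command with the list of variables after regress is one way to obtain the same statistic without computing it by hand.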