Title | Solutions Problem Set 2 Jupyter Notebook |
---|---|
Author | Deanna C |
Course | Introduction to Econometrics |
Institution | University of California Los Angeles |
Pages | 32 |
For the Chapter 2 problems use the following data sets:
1. food4000.dta or food4000.csv
2. cps4_small

For the Chapter 3 problems, use these:
1. br2
2. tuna

Below each question is one cell for answers and one cell for code.
Due: April 20, 2020
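Question 1 (q. 2.1, page 75) - Theory
Consider the following five observations. (It is useful to use an Excel sheet or a calculator to do this exercise.)

$x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$ | $x_i - \bar{x}$ |
---|---|---|---|---|
0 | 6 | 0 | 0 | -2 |
1 | 2 | 1 | 2 | -1 |
2 | 3 | 4 | 6 | 0 |
3 | 1 | 9 | 3 | 1 |
4 | 0 | 16 | 0 | 2 |

The $x_i y_i$ column sums to $\sum x_i y_i = 11$.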
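(a) Calculate the estimates for the intercept $b_1$ and slope $b_2$.
(b) Using the numerical values above, show that $\sum (x_i - \bar{x})^2 = \sum x_i^2 - N\bar{x}^2$ and $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - N\bar{x}\bar{y}$.
(c) The table below presents the fitted values $\hat{y}_i$ of the dependent variable $y_i$ according to the estimates of item (a). Compute the unknown values marked with "?", namely:
- the sum of fitted values, $\sum \hat{y}_i$
- the sum of residuals, $\sum \hat{e}_i$
- the sum of $x_i$ times the residuals, $\sum x_i \hat{e}_i$

(d) Show that $\bar{y} = b_1 + b_2 \bar{x}$.
(e) Show that $\bar{\hat{y}} = \bar{y}$, where $\bar{\hat{y}} = \sum \hat{y}_i / N$.

$x_i$ | $y_i$ | $\hat{y}_i$ | $\hat{e}_i$ | $\hat{e}_i^2$ |
---|---|---|---|---|
0 | 6 | 5 | 1 | 1 |
1 | 2 | 3.7 | -1.7 | 2.89 |
2 | 3 | 2.4 | 0.6 | 0.36 |
3 | 1 | 1.1 | -0.1 | 0.01 |
4 | 0 | -0.2 | 0.2 | 0.04 |
10 | 12 | ? | ? | 4.3 |

$\sum x_i \hat{e}_i = \ ?$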
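Solution
(a) We can use two shortcut formulas for calculating the intercept and slope in this case. With $N = 5$, $\bar{x} = 2$, $\bar{y} = 2.4$, $\sum x_i^2 = 0 + 1 + 4 + 9 + 16 = 30$ and $\sum x_i y_i = 11$:

$$b_2 = \frac{\sum x_i y_i - N\bar{x}\bar{y}}{\sum x_i^2 - N\bar{x}^2} = \frac{11 - 5(2)(2.4)}{30 - 5(2)^2} = \frac{-13}{10} = -1.3$$

Now for the intercept, we can use the following equation:

$$b_1 = \bar{y} - b_2\bar{x} = 2.4 - (-1.3)(2) = 5$$

(b) The sample means are given by $\bar{x} = 10/5 = 2$ and $\bar{y} = 12/5 = 2.4$.
We can now compute:

$$\sum (x_i - \bar{x})^2 = (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10$$

And it turns out that $\sum x_i^2 - N\bar{x}^2 = 30 - 5(2)^2 = 10$, using the $x_i^2$ column of the table above.
We can compute $\sum (x_i - \bar{x})(y_i - \bar{y})$ by:

$$(-2)(3.6) + (-1)(-0.4) + (0)(0.6) + (1)(-1.4) + (2)(-2.4) = -13$$

It turns out that $\sum x_i y_i - N\bar{x}\bar{y} = 11 - 5(2)(2.4) = -13$, using the $x_i y_i$ column of the table above.
(c) According to the calculations below, we have that: $\sum \hat{y}_i = 12$, $\sum \hat{e}_i = 0$ and $\sum x_i \hat{e}_i = 0$. (The value 2.78e-17 printed for the sum of residuals is zero up to floating-point rounding.)
(d) $b_1 + b_2\bar{x} = 5 + (-1.3)(2) = 2.4 = \bar{y}$, as computed in letter (b).
(e) $\bar{\hat{y}} = \sum \hat{y}_i / N = 12/5 = 2.4 = \bar{y}$, as computed in letter (b).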
In [4]: #Calculations for letter c:
# Sum of fitted values:
(5+3.7+2.4+1.1-0.2)
# Sum of residuals:
(1-1.7+0.6-0.1+0.2)
# Sum of x times the residuals:
(0-1.7+1.2-0.3+0.8)
12
2.77555756156289e-17
0
7.6
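The hand calculations for Question 1 can be cross-checked in a few lines of code. The sketch below is written in plain Python as an assumption about the working language (the notebook's own cells may use a different one). It takes the five observations from the question, computes the slope and intercept from the deviation-from-mean sums, and checks the shortcut identities of part (b) and the sums of part (c). All variable names are illustrative.

```python
# Five observations from Question 1.
x = [0, 1, 2, 3, 4]
y = [6, 2, 3, 1, 0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation-from-mean sums and their shortcut counterparts (part b).
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx_shortcut = sum(xi ** 2 for xi in x) - n * x_bar ** 2
sxy_shortcut = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# Least squares estimates (part a).
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

# Fitted values and residuals (part c).
fitted = [b1 + b2 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
sum_fitted = sum(fitted)
sum_resid = sum(resid)
sum_x_resid = sum(xi * ei for xi, ei in zip(x, resid))

print("b1 =", round(b1, 4), " b2 =", round(b2, 4))
print("sum of fitted values:", round(sum_fitted, 10))
print("sum of residuals:", round(sum_resid, 10))
print("sum of x times residuals:", round(sum_x_resid, 10))
print("mean of fitted values equals y_bar:", abs(sum_fitted / n - y_bar) < 1e-12)
```

Rounding before printing hides floating-point noise of the kind seen in the cell above, where a sum that is exactly zero on paper prints as a number of order 1e-17.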
Question 2 (q. 2.7, page 78) - Theory
You have the results of a simple linear regression based on state-level data and the District of Columbia, a total of $N = 51$ observations.
(a) The estimated error variance is $\hat{\sigma}^2$. What is the sum of the squared least squares residuals?
(b) The estimated variance of $b_2$ is 0.00098. What is the standard error of $b_2$? What is the value of $\sum (x_i - \bar{x})^2$?
(c) Suppose the dependent variable $y_i$ is the state's mean income (in thousands of dollars) of males who are 18 years of age or older and $x_i$ is the percentage of males 18 years or older who are high school graduates. If $b_2 = 0.18$, interpret this result.
(d) Suppose the sample means are $\bar{x}$ and $\bar{y}$. What is the estimate of the intercept parameter?
(e) Given the results in (b) and (d), what is $\sum x_i^2$?
(f) For the state of Arkansas, with values $y_i$ and $x_i$, compute the least squares residual. (Hint: Use the information in parts (c) and (d).)
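Solution
(a) We can obtain the sum of the squared residuals by using the formula:

$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} \quad\Rightarrow\quad \sum \hat{e}_i^2 = (N - 2)\,\hat{\sigma}^2 = 49\,\hat{\sigma}^2$$

(b) The estimated standard error of the estimator $b_2$ is given by:

$$se(b_2) = \sqrt{\widehat{var}(b_2)} = \sqrt{0.00098} \approx 0.0313$$

Since $\widehat{var}(b_2) = \hat{\sigma}^2 / \sum (x_i - \bar{x})^2$, it follows that $\sum (x_i - \bar{x})^2 = \hat{\sigma}^2 / 0.00098$.
(c) $b_2 = 0.18$ means that, on average, a one percentage point increase in the percentage of males 18 years or older who are high school graduates increases the state's mean income of males by approximately \$180.
(d) The estimate for the intercept $b_1$ is evaluated by:

$$b_1 = \bar{y} - b_2\bar{x} = \bar{y} - 0.18\,\bar{x}$$

(e) The sum $\sum x_i^2$ can be evaluated in the following manner:

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - N\bar{x}^2 \quad\Rightarrow\quad \sum x_i^2 = \sum (x_i - \bar{x})^2 + N\bar{x}^2$$

We note that the sample mean $\bar{x}$ is defined by $\bar{x} = \sum x_i / N$, that is, $\sum x_i = N\bar{x}$.
(f) The least squares residual for Arkansas is:

$$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$$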
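The steps of Question 2 are all one-line formulas, so they can be collected as small helper functions. The sketch below is a minimal illustration in Python, not the notebook's own code. It evaluates only the quantities whose inputs survive in this text: the estimated variance 0.00098 from part (b), the slope 0.18 implied by the \$180 interpretation in part (c), and N = 51 for the 50 states plus the District of Columbia. The function names are hypothetical, and the remaining inputs must be supplied by the reader.

```python
import math

N = 51            # 50 states plus the District of Columbia
VAR_B2 = 0.00098  # estimated variance of b2, part (b)
B2 = 0.18         # slope implied by the $180 interpretation, part (c)


def sum_squared_residuals(sigma2_hat, n=N):
    """Part (a): SSE = (n - 2) * sigma2_hat."""
    return (n - 2) * sigma2_hat


def standard_error(var_b2=VAR_B2):
    """Part (b): se(b2) is the square root of the estimated variance."""
    return math.sqrt(var_b2)


def sum_sq_dev_x(sigma2_hat, var_b2=VAR_B2):
    """Part (b): sum of (x - x_bar)^2 = sigma2_hat / var(b2)."""
    return sigma2_hat / var_b2


def intercept(y_bar, x_bar, b2=B2):
    """Part (d): b1 = y_bar - b2 * x_bar."""
    return y_bar - b2 * x_bar


def sum_x_squared(sxx, x_bar, n=N):
    """Part (e): sum of x^2 = sum of (x - x_bar)^2 + n * x_bar^2."""
    return sxx + n * x_bar ** 2


def residual(y_i, x_i, b1, b2=B2):
    """Part (f): least squares residual for one observation."""
    return y_i - (b1 + b2 * x_i)


se_b2 = standard_error()
print("se(b2) =", round(se_b2, 4))
# Income is measured in thousands of dollars, so the slope is scaled by 1000.
print("dollars per percentage point:", round(B2 * 1000, 2))
```

Keeping each part as a separate function mirrors the order of the solution: the output of part (b) feeds part (e), and the output of part (d) feeds part (f).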
Question 3 (q. 2.8, page 78) - Theory
Professor E.Z. Stuff has decided that the least squares estimator is too much trouble. Noting that two points determine a line, Dr. Stuff chooses two points from a sample of size $N$ and draws a line between them, calling the slope of this line the EZ estimator of $\beta_2$ in the simple regression model. Algebraically, if the two points are $(x_1, y_1)$ and $(x_2, y_2)$, the EZ estimation rule is:

$$b_{EZ} = \frac{y_2 - y_1}{x_2 - x_1}$$

Assuming that all the assumptions of the simple regression model hold, namely (SR1-SR6):
(a) Show that $b_{EZ}$ is a "linear" estimator. (Must show that $b_{EZ}$ can be expressed as $b_{EZ} = \sum k_i y_i$.)
(b) Show that $b_{EZ}$ is an unbiased estimator. (Must show that $E(b_{EZ}) = \beta_2$.)
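Solution
(a) The goal here is to show that the $b_{EZ}$ estimator can be written as a function that is linear in $y_1, \dots, y_N$. Note that the formula of $b_{EZ}$ can be expressed as follows:

$$b_{EZ} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{1}{x_2 - x_1}\,y_2 - \frac{1}{x_2 - x_1}\,y_1 = \sum_{i=1}^{N} k_i y_i$$

where $k_1 = -\dfrac{1}{x_2 - x_1}$, $k_2 = \dfrac{1}{x_2 - x_1}$ and $k_3 = \dots = k_N = 0$.
Therefore $b_{EZ}$ is a linear estimator because it can be expressed as a linear combination of the outcomes $y_i$.
(b) We need to check if the expectation of $b_{EZ}$, that is $E(b_{EZ})$, is equal to $\beta_2$. Recall that $y_i = \beta_1 + \beta_2 x_i + e_i$. Thus

$$b_{EZ} = \frac{(\beta_1 + \beta_2 x_2 + e_2) - (\beta_1 + \beta_2 x_1 + e_1)}{x_2 - x_1} = \beta_2 + \frac{e_2 - e_1}{x_2 - x_1}$$

Now using the properties of the expectation, and treating the $x_i$ as nonrandom, we have that:

$$E(b_{EZ}) = \beta_2 + \frac{E(e_2) - E(e_1)}{x_2 - x_1}$$

We have that $E(e_1) = E(e_2) = 0$, thus $E(b_{EZ}) = \beta_2$ and $b_{EZ}$ is an unbiased estimator.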
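The two properties of the EZ estimator can also be illustrated numerically. The sketch below is a small Monte Carlo experiment in Python using only the standard library. The parameter values, the fixed regressor values and the number of replications are all hypothetical choices made for illustration and are not taken from the problem set. It checks that the two-point slope equals a weighted sum of the outcomes with the weights from part (a), and that its average over many simulated samples is close to the true slope.

```python
import random

random.seed(0)

# Hypothetical model parameters and fixed (nonrandom) regressor values.
BETA1, BETA2, SIGMA = 1.0, 2.0, 3.0
X = [1.0, 2.0, 4.0, 7.0, 11.0]
REPS = 20000


def ez_estimate(x, y):
    """Slope of the line through the first two sample points."""
    return (y[1] - y[0]) / (x[1] - x[0])


def ez_weights(x):
    """Weights k_i such that the EZ estimate equals sum(k_i * y_i)."""
    k = [0.0] * len(x)
    k[0] = -1.0 / (x[1] - x[0])
    k[1] = 1.0 / (x[1] - x[0])
    return k


def simulate_y(x):
    """Draw one sample of outcomes from y = beta1 + beta2 * x + e."""
    return [BETA1 + BETA2 * xi + random.gauss(0.0, SIGMA) for xi in x]


# Linearity: the estimate is a fixed linear combination of the outcomes.
y_once = simulate_y(X)
as_weighted_sum = sum(ki * yi for ki, yi in zip(ez_weights(X), y_once))
print("linear form matches:", abs(as_weighted_sum - ez_estimate(X, y_once)) < 1e-9)

# Unbiasedness: the average over many samples is close to BETA2.
draws = [ez_estimate(X, simulate_y(X)) for _ in range(REPS)]
mean_ez = sum(draws) / REPS
print("average EZ estimate:", round(mean_ez, 3), "true slope:", BETA2)
```

The simulation only illustrates unbiasedness. It says nothing about efficiency: the EZ estimator discards all but two observations, so its sampling variance is larger than that of least squares.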
Question 4 - Empirical - Needs coding
This exercise explores the formula for the variance of the estimator $b_2$ and its empirical consequences. The data set food4000.dta consists of 4000 simulated data points that resemble the data set of food consumption and income examined in class, namely food.dta. The simulated data consists of 4 variables:
1. income, measured in \$1000/week
2. food expenditure, measured in dollars per week
3. an indicator that takes value 1 for the first 40 data points and zero otherwise
4. an indicator that takes value 1 for the first 400 data points and zero otherwise

The simulated data is based on the following model, with $y_i$ denoting food expenditure and $x_i$ denoting income:

$$y_i = \beta_1 + \beta_2 x_i + e_i$$

where $\beta_1$ and $\beta_2$ are fixed parameters and $var(e_i) = \sigma^2$ for all $i$. The explanatory variable (income) is set to have a sample mean of approximately 20 and a standard deviation of approximately 6. This exercise requires you to run a regression of food expenditure ($y$) on income ($x$) using the first 40 points of the data, then the first 400 data points and then the full 4000 data points. It simulates a 10-fold and a 100-fold increase in the sample size $N$.
(a) Compute the sample variance of the explanatory variable $x$ for the first 40 data points, the first 400 data points and all 4000 data points. Are these variances similar?
(b) Estimate the variance of the error term (use $\hat{\sigma}^2 = \sum \hat{e}_i^2 / (N - 2)$) for 40, 400 and 4000 data points. Are these estimates similar?
(c) Estimate $\beta_2$ for 40, 400 and 4000 data points. Are these estimates similar?
(d) Compute the estimated variance of $b_2$ (you can take the square of its standard error) for 40, 400 and 4000 data points. Are these estimates similar?
(e) How do you relate the answer of letter (d) with the following formula:

$$var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
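Parts (a) to (d) of Question 4 ask for the same four quantities on three nested subsamples. The sketch below shows one way to organize that computation in plain Python. Because the food4000 file is not available here, it generates a stand-in data set: the parameter values (intercept 80, slope 10, error standard deviation 90) are hypothetical, and only the income mean of about 20 and standard deviation of about 6 come from the question. The names `ols_summary`, `income` and `food_exp` are illustrative, and the numbers it prints are not the answers for the actual data.

```python
import random

random.seed(42)

# Hypothetical data-generating values standing in for food4000.
BETA1, BETA2, SIGMA = 80.0, 10.0, 90.0
N_TOTAL = 4000

income = [random.gauss(20.0, 6.0) for _ in range(N_TOTAL)]
food_exp = [BETA1 + BETA2 * xi + random.gauss(0.0, SIGMA) for xi in income]


def ols_summary(x, y):
    """Return the quantities asked for in parts (a) to (d)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)
    return {
        "var_x": sxx / (n - 1),       # (a) sample variance of x
        "sigma2_hat": sigma2_hat,     # (b) estimated error variance
        "b2": b2,                     # (c) slope estimate
        "var_b2": sigma2_hat / sxx,   # (d) estimated variance of b2
    }


# First 40, first 400 and all 4000 observations.
results = {n: ols_summary(income[:n], food_exp[:n]) for n in (40, 400, 4000)}

for n, r in results.items():
    print(n, {name: round(value, 3) for name, value in r.items()})
```

Slicing the first `n` observations plays the role of the two indicator variables in the data set, which mark the first 40 and the first 400 rows.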
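Solution
Items (a) to (d) simply ask to perform regressions in order to fill the following table:

Sample Size | $\hat{\sigma}^2$ | $s_x^2$ | $b_2$ | $\widehat{var}(b_2)$ |
---|---|---|---|---|
40 | 8013.3 | 46.892 | 10.210 | 4.382 |
400 | 8396.4 | 43.954 | 10.532 | 0.479 |
4000 | 8271.1 | 36.034 | 9.971 | 0.057 |

(e) The formula for the variance of the estimator $b_2$ is given by:

$$var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} = \frac{\sigma^2}{(N - 1)\,s_x^2}$$

Note that $\sigma^2$ and $\beta_2$ remain unaltered in the model as $N$ increases, and $x$ is generated with the same variance throughout. Thus we should expect the estimates of $\sigma^2$, $s_x^2$ and $\beta_2$ to remain within the same range as the data increases from 40 to 400 and then to 4000. However, the variance of the estimator $b_2$ should decrease by roughly a factor of 10 each time the data increases by a factor of 10, because the denominator grows with $N - 1$. This is indeed what we observe in the table above: 4.382, 0.479 and 0.057.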
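The link between the variance formula and the reported numbers can be checked directly. The sketch below, in plain Python, reads the solution's figures for each sample size as the estimated error variance, the sample variance of income and the estimated variance of the slope (this reading of the table layout is an assumption), and recomputes the last one as the error variance divided by (N - 1) times the sample variance of x.

```python
# Values reported in the solution table, keyed by sample size.
table = {
    40: {"sigma2_hat": 8013.3, "var_x": 46.892, "var_b2": 4.382},
    400: {"sigma2_hat": 8396.4, "var_x": 43.954, "var_b2": 0.479},
    4000: {"sigma2_hat": 8271.1, "var_x": 36.034, "var_b2": 0.057},
}

# var(b2) = sigma2_hat / sum((x - x_bar)^2), and the sum of squared
# deviations equals (N - 1) times the sample variance of x.
implied = {
    n: row["sigma2_hat"] / ((n - 1) * row["var_x"])
    for n, row in table.items()
}

for n, row in table.items():
    print(n, "implied:", round(implied[n], 3), "reported:", row["var_b2"])

# Ratio of successive variances as the sample size grows tenfold.
print("40 -> 400:", round(implied[40] / implied[400], 2))
print("400 -> 4000:", round(implied[400] / implied[4000], 2))
```

The ratios come out somewhat below 10 because the estimates of the error variance and of the variance of income also move a little from one subsample to the next.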
In [26]: samp4000...