Solutions Problem Set 2 Jupyter Notebook PDF

Title	Solutions Problem Set 2 Jupyter Notebook
Author	Deanna C
Course	Introduction to Econometrics
Institution	University of California Los Angeles
Pages	32


Description

For the Chapter 2 problems, use the following data sets:
1. food4000.dta or food4000.csv
2. cps4_small

For the Chapter 3 problems, use these:
1. br2
2. tuna

Below each question is one cell for answers and one cell for code.

Due: April 20, 2020

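Question 1 (q. 2.1, page 75) - Theory
Consider the following five observations. (It is useful to use an Excel sheet or a calculator for this exercise.)

i     x     y     x²    xy    x - x̄
1     0     6     0     0     -2
2     1     2     1     2     -1
3     2     3     4     6      0
4     3     1     9     3      1
5     4     0     16    0      2
Sum   10    12    30    11     0

(a) Calculate the estimates $b_1$ for the intercept and $b_2$ for the slope.
(b) Using the numerical values above, show that $\sum (x_i - \bar{x})^2 = \sum x_i^2 - N\bar{x}^2$ and $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - N\bar{x}\bar{y}$.
(c) The table below presents the fitted values $\hat{y}_i$ of the dependent variable $y$ according to the estimates of item (a). Compute the unknown values in the last row of the table below, namely: the sum of fitted values $\sum \hat{y}_i$, the sum of residuals $\sum \hat{e}_i$, and the sum of $x_i$ times the residuals, $\sum x_i \hat{e}_i$.
(d) Show that $\bar{y} = b_1 + b_2\bar{x}$.
(e) Show that $\bar{\hat{y}} = \bar{y}$, where $\bar{\hat{y}} = \sum \hat{y}_i / N$.

i     x     y     ŷ      ê      ê²
1     0     6     5      1      1
2     1     2     3.7    -1.7   2.89
3     2     3     2.4    0.6    0.36
4     3     1     1.1    -0.1   0.01
5     4     0     -0.2   0.2    0.04
Sum   10    12    ?      ?      4.3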

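Solution
(a) We can use two shortcut formulas, one for the numerator and one for the denominator of the slope. With $\bar{x} = 10/5 = 2$ and $\bar{y} = 12/5 = 2.4$:
$$b_2 = \frac{\sum x_i y_i - N\bar{x}\bar{y}}{\sum x_i^2 - N\bar{x}^2} = \frac{11 - 5(2)(2.4)}{30 - 5(2)^2} = \frac{-13}{10} = -1.3$$
Now for the intercept, we can use the following equation:
$$b_1 = \bar{y} - b_2\bar{x} = 2.4 - (-1.3)(2) = 5$$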
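A minimal sketch in Python, assuming NumPy is available, that reproduces part (a) with the same shortcut formulas and the five observations from the table. The call to `np.polyfit` (a degree-1 least-squares fit) serves only as an independent cross-check of the hand calculation.

```python
import numpy as np

# Data from the first table of Question 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 2.0, 3.0, 1.0, 0.0])
N = x.size

xbar, ybar = x.mean(), y.mean()
# Shortcut formulas for the numerator and denominator of the slope
sxy = np.sum(x * y) - N * xbar * ybar   # equals sum (x - xbar)(y - ybar)
sxx = np.sum(x ** 2) - N * xbar ** 2    # equals sum (x - xbar)^2
b2 = sxy / sxx
b1 = ybar - b2 * xbar

# Cross-check: np.polyfit(x, y, 1) returns [slope, intercept]
slope_chk, intercept_chk = np.polyfit(x, y, 1)
print(b1, b2, np.isclose(b1, intercept_chk), np.isclose(b2, slope_chk))
```

Both routes should give $b_1 = 5$ and $b_2 = -1.3$ up to floating-point rounding.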

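(b) The sample means are given by $\bar{x} = \sum x_i / N = 10/5 = 2$ and $\bar{y} = \sum y_i / N = 12/5 = 2.4$.

We can now compute $\sum (x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10$, and it turns out that $\sum x_i^2 - N\bar{x}^2 = 30 - 5(2)^2 = 10$. Using the deviations $x_i - \bar{x}$ displayed in the table above, we can compute
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = (-2)(3.6) + (-1)(-0.4) + (0)(0.6) + (1)(-1.4) + (2)(-2.4) = -13$$
It turns out that $\sum x_i y_i - N\bar{x}\bar{y} = 11 - 5(2)(2.4) = -13$, as displayed in the table above.

(c) According to the calculations below, we have that $\sum \hat{y}_i = 12$, $\sum \hat{e}_i = 0$ and $\sum x_i \hat{e}_i = 0$. (The value 2.78e-17 reported for the sum of residuals is floating-point rounding error, i.e. zero.)

(d) $b_1 + b_2\bar{x} = 5 + (-1.3)(2) = 2.4 = \bar{y}$, as computed in letter (b).

(e) $\bar{\hat{y}} = \sum \hat{y}_i / N = 12/5 = 2.4 = \bar{y}$, as computed in letter (b).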

In [4]:
# Calculations for letter (c)
# Sum of fitted values:
(5 + 3.7 + 2.4 + 1.1 - 0.2)
# Sum of residuals:
(1 - 1.7 + 0.6 - 0.1 + 0.2)
# Sum of x times the residuals:
(0 - 1.7 + 1.2 - 0.3 + 0.8)

Output: 12 2.77555756156289e-17 0 7.6
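The checks in parts (b) through (e) can also be sketched with NumPy arrays instead of hand arithmetic. This is an illustration, not the notebook's original cell; it hard-codes $b_1 = 5$ and $b_2 = -1.3$ from part (a) and uses `np.isclose` because sums such as $\sum \hat{e}_i$ are only zero up to floating-point rounding.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 2.0, 3.0, 1.0, 0.0])
N = x.size
b1, b2 = 5.0, -1.3                  # estimates from part (a)
xbar, ybar = x.mean(), y.mean()

# (b) deviation-form sums versus their shortcut versions
sxx_dev = np.sum((x - xbar) ** 2)
sxx_short = np.sum(x ** 2) - N * xbar ** 2
sxy_dev = np.sum((x - xbar) * (y - ybar))
sxy_short = np.sum(x * y) - N * xbar * ybar

# (c) fitted values, residuals and the sums in the last row of the table
y_hat = b1 + b2 * x
e_hat = y - y_hat

print(np.isclose(sxx_dev, sxx_short), np.isclose(sxy_dev, sxy_short))
print(y_hat.sum(), e_hat.sum(), np.sum(x * e_hat), np.sum(e_hat ** 2))
# (d) the line passes through the means; (e) mean of fitted values equals ybar
print(np.isclose(b1 + b2 * xbar, ybar), np.isclose(y_hat.mean(), ybar))
```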

Question 2 (q. 2.7, page 78) - Theory
You have the results of a simple linear regression based on state-level data and the District of Columbia, a total of N = 51 observations.
(a) The estimated error variance is $\hat{\sigma}^2 = \dots$. What is the sum of the squared least squares residuals?
(b) The estimated variance of $b_2$ is 0.00098. What is the standard error of $b_2$? What is the value of $\sum (x_i - \bar{x})^2$?
(c) Suppose the dependent variable $y_i$ is the state's mean income (in thousands of dollars) of males who are 18 years of age or older and $x_i$ is the percentage of males 18 years or older who are high school graduates. If $b_2 = 0.18$, interpret this result.
(d) Suppose $\bar{x} = \dots$ and $\bar{y} = \dots$. What is the estimate of the intercept parameter?
(e) Given the results in (b) and (d), what is $\sum x_i^2$?
(f) For the state of Arkansas, the value of $y_i = \dots$ and the value of $x_i = \dots$. Compute the least squares residual for Arkansas. (Hint: Use the information in parts (c) and (d).)

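Solution
(a) We can obtain the sum of the squared residuals from the formula for the estimated error variance:
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} \quad\Rightarrow\quad \sum \hat{e}_i^2 = (N - 2)\hat{\sigma}^2 = 49\hat{\sigma}^2$$

(b) The estimated standard error of the estimator $b_2$ is given by:
$$\operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} = \sqrt{0.00098} \approx 0.0313$$
Since $\widehat{\operatorname{var}}(b_2) = \hat{\sigma}^2 / \sum (x_i - \bar{x})^2$, it follows that $\sum (x_i - \bar{x})^2 = \hat{\sigma}^2 / 0.00098$.

(c) $b_2 = 0.18$ means that, on average, a 1 percentage point increase in the share of males 18 years or older who are high school graduates is associated with an increase of approximately 180 dollars (0.18 thousand dollars) in the state's mean income of males.

(d) The estimate for the intercept is $b_1 = \bar{y} - b_2\bar{x} = \bar{y} - 0.18\,\bar{x}$.

(e) The sum $\sum x_i^2$ is evaluated by:
$$\sum x_i^2 = \sum (x_i - \bar{x})^2 + N\bar{x}^2$$
where $\sum (x_i - \bar{x})^2$ is the value obtained in (b) and $N\bar{x}^2 = 51\bar{x}^2$. We note that the sample mean is defined by $\bar{x} = \sum x_i / N$ and its value is given in (d).

(f) The least squares residual for Arkansas is $\hat{e}_{AR} = y_{AR} - b_1 - b_2 x_{AR}$, with $b_1$ from (d) and $b_2 = 0.18$ from (c).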
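Because several numerical inputs for this question ($\hat{\sigma}^2$, $\bar{x}$, $\bar{y}$ and the Arkansas values) are not reproduced here, the sketch below writes each step of the solution as a small Python function of those inputs. The function names are hypothetical; only N = 51, $\widehat{\operatorname{var}}(b_2) = 0.00098$ and $b_2 = 0.18$ come from the question.

```python
import math

# Known from the question: N = 51, estimated var(b2) = 0.00098, b2 = 0.18.
# Inputs not reproduced here (sigma2_hat, xbar, ybar, Arkansas data) are arguments.
N = 51
VAR_B2 = 0.00098
B2 = 0.18


def sse_from_sigma2(sigma2_hat, n=N):
    """(a) Sum of squared residuals: SSE = (N - 2) * sigma2_hat."""
    return (n - 2) * sigma2_hat


def se_b2(var_b2=VAR_B2):
    """(b) se(b2) is the square root of the estimated variance of b2."""
    return math.sqrt(var_b2)


def sxx_from_sigma2(sigma2_hat, var_b2=VAR_B2):
    """(b) var(b2) = sigma2_hat / Sxx, hence Sxx = sigma2_hat / var(b2)."""
    return sigma2_hat / var_b2


def intercept(xbar, ybar, b2=B2):
    """(d) b1 = ybar - b2 * xbar."""
    return ybar - b2 * xbar


def sum_x_squared(sxx, xbar, n=N):
    """(e) Sum of x_i^2 = Sxx + N * xbar^2."""
    return sxx + n * xbar ** 2


def residual(y_i, x_i, b1, b2=B2):
    """(f) Least squares residual e_i = y_i - b1 - b2 * x_i."""
    return y_i - b1 - b2 * x_i


print(round(se_b2(), 4))  # 0.0313
```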

Question 3 (q. 2.8, page 78) - Theory
Professor E.Z. Stuff has decided that the least squares estimator is too much trouble. Noting that two points determine a line, Dr. Stuff chooses two points from a sample of size $N$ and draws a line between them, calling the slope of this line the EZ estimator of $\beta_2$ in the simple regression model. Algebraically, if the two points are $(x_1, y_1)$ and $(x_2, y_2)$, the EZ estimation rule is:
$$b_{EZ} = \frac{y_2 - y_1}{x_2 - x_1}$$
Assume that all the assumptions of the simple regression model hold, namely SR1-SR6.
(a) Show that $b_{EZ}$ is a "linear" estimator. (Must show that $b_{EZ}$ can be expressed as $\sum k_i y_i$, with weights $k_i$ that do not depend on $y$.)
(b) Show that $b_{EZ}$ is an unbiased estimator. (Must show that $E(b_{EZ}) = \beta_2$.)

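Solution
(a) The goal here is to show that the $b_{EZ}$ estimator can be written as a function that is linear in $y$. Note that the formula of $b_{EZ}$ can be split as:
$$b_{EZ} = \frac{y_2 - y_1}{x_2 - x_1} = -\frac{1}{x_2 - x_1}\,y_1 + \frac{1}{x_2 - x_1}\,y_2$$
Therefore $b_{EZ} = k_1 y_1 + k_2 y_2$, that is, $b_{EZ}$ can be expressed as $\sum k_i y_i$ with $k_1 = -1/(x_2 - x_1)$, $k_2 = 1/(x_2 - x_1)$ and $k_i = 0$ for all other observations. $b_{EZ}$ is a linear estimator because it can be expressed as a linear combination of the outcomes $y_1$ and $y_2$, with weights that depend only on the $x$ values.

(b) We need to check whether the expectation of $b_{EZ}$, that is $E(b_{EZ})$, is equal to $\beta_2$. Recall that $y_i = \beta_1 + \beta_2 x_i + e_i$. Thus
$$b_{EZ} = \frac{(\beta_1 + \beta_2 x_2 + e_2) - (\beta_1 + \beta_2 x_1 + e_1)}{x_2 - x_1} = \beta_2 + \frac{e_2 - e_1}{x_2 - x_1}$$
Now, using the properties of the expectation (treating the $x$ values as given), we have that:
$$E(b_{EZ}) = \beta_2 + \frac{E(e_2) - E(e_1)}{x_2 - x_1}$$
We have that $E(e_i) = 0$ for every $i$, thus $E(b_{EZ}) = \beta_2$, and $b_{EZ}$ is an unbiased estimator, where $x_1 \neq x_2$ is required for $b_{EZ}$ to be defined.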
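A small Monte Carlo experiment can illustrate both results. The sketch below assumes hypothetical values $\beta_1 = 1$, $\beta_2 = 2$, $\sigma = 3$ and a fixed set of six $x$ values (none of these come from the problem), applies the EZ rule to the first two observations, checks that it matches the linear form $\sum k_i y_i$, and averages the estimates over many simulated samples.

```python
import numpy as np

# Hypothetical setup, chosen for illustration only
BETA1, BETA2, SIGMA = 1.0, 2.0, 3.0
x = np.array([2.0, 8.0, 5.0, 1.0, 9.0, 4.0])   # regressors held fixed across samples
REPS = 20000

# (a) Linear weights: k_1 = -1/(x_2 - x_1), k_2 = 1/(x_2 - x_1), zero elsewhere
k = np.zeros(x.size)
k[0] = -1.0 / (x[1] - x[0])
k[1] = 1.0 / (x[1] - x[0])

rng = np.random.default_rng(0)
# Each row is one simulated sample y = beta1 + beta2 * x + e with E(e) = 0
Y = BETA1 + BETA2 * x + rng.normal(0.0, SIGMA, size=(REPS, x.size))

b_ez = (Y[:, 1] - Y[:, 0]) / (x[1] - x[0])   # EZ rule on the first two points
b_ez_linear = Y @ k                          # same estimator written as sum k_i y_i
mean_b_ez = b_ez.mean()

print(np.allclose(b_ez, b_ez_linear))        # linear form matches the EZ rule
print(mean_b_ez)                             # (b) should be close to BETA2
```

The simulated average should land close to the assumed $\beta_2 = 2$, which is what unbiasedness predicts; the individual estimates still vary from sample to sample.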

Question 4 - Empirical - Needs coding
This exercise explores the formula for the variance of the estimator $b_2$ and its empirical consequences. The data set food4000.dta consists of 4000 simulated data points that resemble the data set of food consumption and income examined in class, namely food.dta. The simulated data consists of 4 variables:
1. income ($x$), measured in thousands of dollars per week;
2. food expenditure ($y$), measured in dollars per week;
3. an indicator that takes the value 1 for the first 40 data points and zero otherwise;
4. an indicator that takes the value 1 for the first 400 data points and zero otherwise.

The simulated data is based on the following model:
$$y_i = \beta_1 + \beta_2 x_i + e_i$$
where $\dots$ for all $i$, thereby $\dots$. The explanatory variable (income) is set to have a sample mean of approximately 20 and a standard deviation of approximately 6. This exercise requires you to run a regression of food expenditure ($y$) on income ($x$) using the first 40 data points, then the first 400 data points, and then the full 4000 data points. It simulates a 10-fold and a 100-fold increase in the sample size $N$.

(a) Compute the sample variance of the explanatory variable $x$ for the first 40 data points, the first 400 data points and all 4000 data points. Are these variances similar?
(b) Estimate the variance of the error term (use $\hat{\sigma}^2$) for 40, 400 and 4000 data points. Are these estimates similar?
(c) Estimate $\beta_2$ for 40, 400 and 4000 data points. Are these estimates similar?
(d) Compute the estimated variance of $b_2$ (you can take the square of its standard error) for 40, 400 and 4000 data points. Are these estimates similar?
(e) How do you relate the answer of letter (d) with the following formula:
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
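The procedure in (a) to (d) can be sketched on simulated data. Since the generating parameters of food4000 are not reproduced here, this sketch assumes hypothetical values $\beta_1 = 80$, $\beta_2 = 10$, $\sigma = 90$ and draws income from a normal distribution with mean 20 and standard deviation 6 (matching the description above). It does not read food4000.dta, and the function name `ols_summary` is illustrative.

```python
import numpy as np

# Hypothetical data-generating process (not the actual food4000 parameters)
BETA1, BETA2, SIGMA = 80.0, 10.0, 90.0
N_TOTAL = 4000

rng = np.random.default_rng(42)
x = rng.normal(20.0, 6.0, size=N_TOTAL)                       # income: mean ~20, sd ~6
y = BETA1 + BETA2 * x + rng.normal(0.0, SIGMA, size=N_TOTAL)  # food expenditure


def ols_summary(x, y):
    """Simple OLS of y on x; returns the quantities asked for in (a)-(d)."""
    n = x.size
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b2 = np.sum((x - xbar) * (y - ybar)) / sxx
    b1 = ybar - b2 * xbar
    resid = y - b1 - b2 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)   # estimated error variance
    return {
        "var_x": x.var(ddof=1),                  # sample variance of x
        "sigma2_hat": sigma2_hat,
        "b2": b2,
        "var_b2": sigma2_hat / sxx,              # estimated var(b2)
    }


# First 40, first 400 and all 4000 observations, as in the exercise
results = {n: ols_summary(x[:n], y[:n]) for n in (40, 400, 4000)}
for n, r in results.items():
    print(n, {k: round(float(v), 4) for k, v in r.items()})
```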

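Solution
Items (a) to (g) simply ask to perform regressions in order to fill the following table:

Sample size   σ̂²       Sample var(x)   b₂       Estimated var(b₂)
40            8013.3   46.892          10.210   4.382
400           8396.4   43.954          10.532   0.479
4000          8271.1   36.034           9.971   0.057

(h) The formula for the variance of the estimator $b_2$ is given by:
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} = \frac{\sigma^2}{(N - 1)\,s_x^2}$$
where $s_x^2$ is the sample variance of $x$. Note that $\sigma^2$ and the variance of $x$ remain unaltered in the model as $N$ increases. Thus we should expect the estimates of $\sigma^2$, the variance of $x$ and $\beta_2$ to remain within the same range as the data increase from 40 to 400 and then to 4000. However, the variance of the estimator $b_2$ should decrease by roughly a factor of 10 each time the data increase by a factor of 10, because $(N - 1)s_x^2$ grows roughly in proportion to $N$. This is indeed what we observe in the table above: the estimated variance falls from 4.382 to 0.479 and then to 0.057.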
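As a consistency check on the table, the sketch below plugs the reported $\hat{\sigma}^2$ and sample variance of $x$ into $\hat{\sigma}^2 / ((N-1)s_x^2)$ and compares the result with the reported estimated variance of $b_2$. It uses only the numbers in the table; the variable names are illustrative.

```python
# Values copied from the table: (N, sigma2_hat, sample var(x), reported var(b2))
rows = [
    (40, 8013.3, 46.892, 4.382),
    (400, 8396.4, 43.954, 0.479),
    (4000, 8271.1, 36.034, 0.057),
]

implied = []
for n, sigma2_hat, var_x, reported in rows:
    # sum (x_i - xbar)^2 equals (N - 1) times the sample variance of x
    v = sigma2_hat / ((n - 1) * var_x)
    implied.append(v)
    print(n, round(v, 4), reported)

# Ratios between consecutive sample sizes (an exact 10-fold drop would give 10)
print([round(implied[i] / implied[i + 1], 2) for i in range(2)])
```

The implied variances match the reported column to three decimals. The drops between sample sizes are roughly ninefold and eightfold rather than exactly tenfold, since the sample variance of $x$ also falls from about 46.9 to 36.0 across the subsamples.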

In [26]: samp4000...