Title | PBS2 - Problem Set 2 |
---|---|
Author | Sasuke Sarutobi |
Course | Principles Of Statistics I |
Institution | University of Nevada, Las Vegas |
Alvin Lo
Econ 262, Munpyung O
Problem Set 2

1. We want to estimate the effect of IQ and education on income. However, our old computer can only run simple linear regressions. What are the steps to estimate the partial effect of IQ on income? Write the procedure step by step and explain. Be sure to include precise Stata commands for each step.

To find the partial effect of IQ on income using only simple regressions, we break the multiple regression into steps (the partialling-out procedure). First, run a simple regression of IQ on education with the command reg IQ educ in Stata. Second, save the residuals with predict rhat, residual; rhat is the part of IQ that is uncorrelated with education. Third, run the simple regression reg income rhat. The slope coefficient on rhat equals b1^hat, the coefficient on IQ from the multiple regression of income on IQ and education.

2. We consider the celebrated Cobb-Douglas production function, which may be expressed as:

Yi = β0 Li^β1 Ki^β2    (1)

where Y = output in thousand dollars, L = labor input in thousand hours, K = capital in thousand dollars, and β0 is a constant. This model is nonlinear in the parameters, and estimating it as it stands requires nonlinear estimation techniques. However, if we take the logarithm of this function and add the error term εi, we obtain the following linear regression model:

lnYi = α0 + β1 lnLi + β2 lnKi + εi,  where α0 = lnβ0    (2)

a) Use the CobbDouglas.dta file, estimate α̂0, β̂1, β̂2. Show your Stata regression output table. Write the estimated equation.

. use "D:\CobbDouglas.dta", clear
. reg lnoutput lnlabor lncapital

      Source |       SS           df       MS      Number of obs   =        51
-------------+----------------------------------   F(2, 48)        =    645.93
       Model |  91.9246133         2  45.9623067   Prob > F        =    0.0000
    Residual |  3.41551772        48  .071156619   R-squared       =    0.9642
-------------+----------------------------------   Adj R-squared   =    0.9627
       Total |   95.340131        50  1.90680262   Root MSE        =    .26675

------------------------------------------------------------------------------
    lnoutput |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnlabor |   .4683318   .0989259     4.73   0.000      .269428    .6672357
   lncapital |   .5212795    .096887     5.38   0.000      .326475    .7160839
       _cons |   3.887599   .3962281     9.81   0.000     3.090929    4.684269
------------------------------------------------------------------------------
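As a cross-check on the log-linearization, the sketch below simulates Cobb-Douglas data with known parameters and recovers them by OLS on logs. All numbers here are made up for illustration (and it uses Python/NumPy rather than Stata):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
beta0, beta1, beta2 = 48.8, 0.47, 0.52   # "true" values chosen near the estimates above

L = rng.uniform(1, 100, n)               # labor (thousand hours)
K = rng.uniform(1, 100, n)               # capital (thousand dollars)
eps = rng.normal(0, 0.25, n)             # log-scale error, comparable to Root MSE ~ .27
Y = beta0 * L**beta1 * K**beta2 * np.exp(eps)

# OLS of lnY on a constant, lnL, lnK -- the linear model in equation (2)
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
a0_hat, b1_hat, b2_hat = coef
print(a0_hat, b1_hat, b2_hat)            # close to ln(48.8) ~ 3.89, 0.47, 0.52
```

Taking logs turns the multiplicative model into one that ordinary least squares can estimate, and the intercept recovers β0 through exp(α̂0).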
The estimated equation is:

lnoutput(Yi) = 3.887599 + .4683318 lnlabor(Li) + .5212795 lncapital(Ki)

b) Interpret and explain the regression coefficients. β̂1 is the elasticity of output with respect to labor: if labor increases by one percent (ceteris paribus), output increases by about 0.4683 percent. β̂2 is the elasticity with respect to capital: a one percent increase in capital, holding labor fixed, increases output by about 0.5213 percent. α̂0 is the intercept: when lnlabor = lncapital = 0 (that is, when L = K = 1), predicted lnoutput is 3.887599.
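The elasticity interpretation can be verified numerically with the estimated equation: scaling labor by 1.01 raises ln(output) by β̂1·ln(1.01) ≈ .00466, i.e. about 0.47 percent, regardless of the levels chosen (the levels L = K = 10 below are arbitrary illustrations):

```python
import math

b0, b1, b2 = 3.887599, 0.4683318, 0.5212795  # estimates from part (a)

def ln_output(lnL, lnK):
    """Predicted ln(output) from the estimated log-log equation."""
    return b0 + b1 * lnL + b2 * lnK

# raise labor by 1% holding capital fixed
lnL, lnK = math.log(10), math.log(10)
dlnY = ln_output(math.log(10 * 1.01), lnK) - ln_output(lnL, lnK)
print(100 * dlnY)   # percent change in output, ~0.466
```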
c) Check whether the regression coefficients α̂0, β̂1, β̂2 are statistically significant at the 99% confidence level.

. count
  51

The degrees of freedom are n − k − 1 = 51 − 2 − 1 = 48, so the two-sided 1% critical value, di invttail(48, .005), is approximately 2.68. The t-statistics are:

. di .4683318/.0989259
4.7341677
. di .5212795/.096887
5.3802832
. di 3.887599/.3962281
9.8115177
All three t-statistics exceed the critical value in absolute value, so α̂0, β̂1, and β̂2 are each statistically significant at the 99% confidence level: in every case we reject the null hypothesis that the coefficient is zero.
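The t-ratio check above can be scripted; the critical value 2.68 below is the approximate t(0.005, 48) value from a t table, used as an assumption:

```python
# coefficient and standard error for each parameter, from the part (a) output
estimates = {
    "lnlabor":   (0.4683318, 0.0989259),
    "lncapital": (0.5212795, 0.096887),
    "_cons":     (3.887599,  0.3962281),
}
t_crit = 2.68   # approximate two-sided 1% critical value with df = 48

for name, (b, se) in estimates.items():
    t = b / se
    print(name, round(t, 2), abs(t) > t_crit)   # all True: each coefficient is significant
```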
d) Check the overall statistical significance of this regression at the 5% significance level.

The overall F test is of H0: β1 = β2 = 0, with (k, n − k − 1) = (2, 48) degrees of freedom; the 5% critical value, di invFtail(2, 48, .05), is approximately 3.19.
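The reported F statistic can be reproduced from R² as a cross-check (the small gap from the table's 645.93 is rounding in the reported R²):

```python
# overall F test of H0: beta1 = beta2 = 0, computed from R-squared
r2, k, n = 0.9642, 2, 51
df_resid = n - k - 1                        # 48
F = (r2 / k) / ((1 - r2) / df_resid)        # ~646, matching the table's 645.93
print(F)
```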
The regression's F statistic is 645.93, far above the critical value, so the regression is jointly significant at the 5% level: we reject H0: β1 = β2 = 0.

e) Test whether this production function is CRS (constant returns to scale), i.e., test the linear restriction β1 + β2 = 1.

. test lnlabor + lncapital = 1

 ( 1)  lnlabor + lncapital = 1

       F(  1,    48) =    0.14
            Prob > F =    0.7085

Since the p-value (0.7085) is well above 0.05, we fail to reject the restriction: the data are consistent with constant returns to scale.
f) Use your estimates and recover equation (1).

Since α0 = lnβ0, we have β̂0 = exp(α̂0) = exp(3.887599) ≈ 48.8, so the recovered production function is:

Ŷi = 48.8 Li^.4683 Ki^.5213

3. Problem 5 in chapter 3
(i) No, because by definition study + sleep + work + leisure = 168. If we change study, we must change at least one of the other categories so that the four still sum to 168, so holding all the others fixed is impossible.
(ii) Because the four variables always add up to 168, each one is an exact linear function of the other three; for example, study = 168 − sleep − work − leisure. That violates the no-perfect-collinearity assumption (MLR.3).
(iii) As mentioned in class, we can simply drop one of the independent variables, say leisure. Then the coefficient on study is interpreted as the change in GPA when study increases by one hour, with sleep and work held fixed, so that leisure implicitly falls by one hour. The same reinterpretation works for whichever variable is dropped.
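The perfect-collinearity problem in Problem 3 can be made concrete: with all four time categories plus a constant, the design matrix is rank-deficient. The weekly hours below are made up for illustration:

```python
import numpy as np

# hypothetical weekly hours for 5 students; each row sums to 168
# columns: study, sleep, work, leisure
hours = np.array([
    [35, 56, 20, 57],
    [20, 49, 40, 59],
    [10, 63, 30, 65],
    [40, 42, 25, 61],
    [25, 50, 35, 58],
])
assert (hours.sum(axis=1) == 168).all()

X = np.column_stack([np.ones(5), hours])              # constant + all four categories
print(np.linalg.matrix_rank(X))                       # 4, not 5: MLR.3 fails

X_drop = np.column_stack([np.ones(5), hours[:, :3]])  # drop leisure
print(np.linalg.matrix_rank(X_drop))                  # 4: full column rank, OLS works
```

Because the four columns sum to 168 times the constant column, one column is redundant; dropping any one of them restores full rank.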
4. Computer exercise 5 in chapter 3

First, regress educ on exper and tenure:

. use "C:\Users\Alvin\Downloads\64-bit\Stata\WAGE1.DTA", clear
. reg educ exper tenure

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(2, 523)       =     29.49
       Model |  407.946311         2  203.973156   Prob > F        =    0.0000
    Residual |  3617.48335       523  6.91679416   R-squared       =    0.1013
-------------+----------------------------------   Adj R-squared   =    0.0979
       Total |  4025.42966       525  7.66748506   Root MSE        =      2.63

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0737851   .0097609    -7.56   0.000    -.0929604   -.0546098
      tenure |   .0476795   .0183371     2.60   0.010      .011656    .0837031
       _cons |   13.57496   .1843245    73.65   0.000     13.21286    13.93707
------------------------------------------------------------------------------

educ = 13.57 − .074 exper + .048 tenure + rhat1,  n = 526, R2 = .101.

Next, save the residuals and regress log(wage) on them:

. predict rhat, residual
. reg lwage rhat

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(1, 524)       =      1.83
       Model |  .517262044         1  .517262044   Prob > F        =    0.1763
    Residual |  147.812489       524  .282084903   R-squared       =    0.0035
-------------+----------------------------------   Adj R-squared   =    0.0016
       Total |  148.329751       525   .28253286   Root MSE        =    .53112

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rhat |  -.0026693   .0019712    -1.35   0.176    -.0065416    .0012031
       _cons |   1.623268   .0231578    70.10   0.000     1.577775    1.668762
------------------------------------------------------------------------------
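By the Frisch-Waugh partialling-out result (the same logic as Problem 1), the slope from regressing the dependent variable on rhat equals, in any given sample, the coefficient on educ in the full multiple regression. A small simulated check, with made-up data rather than WAGE1, and Python rather than Stata:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
exper = rng.uniform(0, 40, n)
tenure = 0.4 * exper + rng.uniform(0, 10, n)
educ = 14 - 0.07 * exper + 0.05 * tenure + rng.normal(0, 2, n)
lwage = 0.3 + 0.09 * educ + 0.004 * exper + 0.02 * tenure + rng.normal(0, 0.4, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# step 1: regress educ on exper and tenure, save residuals rhat
g = ols(np.column_stack([ones, exper, tenure]), educ)
rhat = educ - np.column_stack([ones, exper, tenure]) @ g
# step 2: regress lwage on rhat
slope = ols(np.column_stack([ones, rhat]), lwage)[1]
# full multiple regression, for comparison
b_educ = ols(np.column_stack([ones, educ, exper, tenure]), lwage)[1]
print(slope, b_educ)   # identical up to floating-point error
```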
The regression on rhat uses only the part of educ that is uncorrelated with exper and tenure to explain log(wage).

5. Computer exercise 8 in chapter 3

. use "C:\Users\Alvin\Downloads\64-bit\Stata\discrim.dta", clear
. summ prpblck income

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     prpblck |       409    .1134864    .1824165          0   .9816579
      income |       409    47053.78    13179.29      15919     136529
(ii)
. reg psoda prpblck logincome

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(2, 398)       =     14.13
       Model |  .209045947         2  .104522973   Prob > F        =    0.0000
    Residual |   2.9449712       398  .007399425   R-squared       =    0.0663
-------------+----------------------------------   Adj R-squared   =    0.0616
       Total |  3.15401715       400  .007885043   Root MSE        =    .08602

------------------------------------------------------------------------------
       psoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .1258267   .0269747     4.66   0.000      .072796    .1788575
   logincome |   .0788228   .0173891     4.53   0.000     .0446367    .1130089
       _cons |   .1855321   .1879993     0.99   0.324    -.1840636    .5551278
------------------------------------------------------------------------------
psoda = .1855 + .1258 prpblck + .0788 log(income)

prpblck is the proportion of a zipcode's population that is black. The regression measures the effect of the black population share on the price of soda: moving from a zipcode with no black residents (prpblck = 0) to an all-black zipcode (prpblck = 1) is associated with a price about $.126 higher, holding income fixed.

(iii)
. reg psoda prpblck

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(1, 399)       =      7.34
       Model |  .057010466         1  .057010466   Prob > F        =    0.0070
    Residual |  3.09700668       399  .007761922   R-squared       =    0.0181
-------------+----------------------------------   Adj R-squared   =    0.0156
       Total |  3.15401715       400  .007885043   Root MSE        =     .0881

------------------------------------------------------------------------------
       psoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .0649269    .023957     2.71   0.007     .0178292    .1120245
       _cons |   1.037399   .0051905   199.87   0.000     1.027195    1.047603
------------------------------------------------------------------------------
The discrimination effect appears smaller (.0649 versus .1258) when we do not control for income.

(iv)
. reg lpsoda prpblck logincome

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(2, 398)       =     14.54
       Model |  .196020672         2  .098010336   Prob > F        =    0.0000
    Residual |  2.68272938       398  .006740526   R-squared       =    0.0681
-------------+----------------------------------   Adj R-squared   =    0.0634
       Total |  2.87875005       400  .007196875   Root MSE        =     .0821

------------------------------------------------------------------------------
      lpsoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .1215803   .0257457     4.72   0.000     .0709657    .1721948
   logincome |   .0765114   .0165969     4.61   0.000     .0438829    .1091399
       _cons |   -.793768   .1794337    -4.42   0.000    -1.146524   -.4410117
------------------------------------------------------------------------------

. di 20*.1215803
2.431606

If prpblck increases by .20 (20 percentage points), the price of soda is estimated to rise by about 2.43 percent.

(v)
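Because the dependent variable is in logs, a change Δprpblck moves the price by roughly 100·β̂1·Δprpblck percent; the .20 case above generalizes as:

```python
b_prpblck = 0.1215803            # estimate from part (iv)

def pct_price_change(d_prpblck):
    """Approximate percent change in soda price for a change in prpblck."""
    return 100 * b_prpblck * d_prpblck

print(pct_price_change(0.20))    # ~2.43, as computed with di above
print(pct_price_change(0.10))    # ~1.22
```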
. reg lpsoda prpblck lincome prppov

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(3, 397)       =     12.60
       Model |  .250340622         3  .083446874   Prob > F        =    0.0000
    Residual |  2.62840943       397  .006620679   R-squared       =    0.0870
-------------+----------------------------------   Adj R-squared   =    0.0801
       Total |  2.87875005       400  .007196875   Root MSE        =    .08137

------------------------------------------------------------------------------
      lpsoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .0728072   .0306756     2.37   0.018     .0125003    .1331141
     lincome |   .1369553   .0267554     5.12   0.000     .0843552    .1895553
      prppov |     .38036   .1327903     2.86   0.004     .1192999    .6414201
       _cons |  -1.463333   .2937111    -4.98   0.000    -2.040756   -.8859092
------------------------------------------------------------------------------
Adding prppov decreases the coefficient on prpblck to .0728072. Part of the price variation previously attributed to the black population share is explained by the proportion of people in poverty rather than by the black population itself.
(vi)
. corr lincome prppov
(obs=409)

             |  lincome   prppov
-------------+------------------
     lincome |   1.0000
      prppov |  -0.8385   1.0000

Yes. Higher income goes with a smaller proportion of the population in poverty, which is consistent with the strong negative correlation of −0.8385.

(vii) I don't think you can remove a variable just because it is highly correlated with another; omitting a relevant variable would bias the remaining coefficients. High correlation is not perfect correlation, so MLR.3 is not violated, and the two variables may each have a separate effect on the regression, which means you can't simply take one out.
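The worry in (vii), that dropping a relevant but correlated regressor biases the remaining coefficient, can be illustrated with simulated data (all numbers below are made up; only the strong negative lincome-prppov correlation mimics the table above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
lincome = rng.normal(10.7, 0.3, n)
# prppov strongly negatively correlated with lincome, as in the corr table
prppov = 1.5 - 0.13 * lincome + rng.normal(0, 0.02, n)
lpsoda = -0.8 + 0.08 * lincome + 0.4 * prppov + rng.normal(0, 0.08, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = ols(np.column_stack([ones, lincome, prppov]), lpsoda)[1]
b_short = ols(np.column_stack([ones, lincome]), lpsoda)[1]
print(b_full)    # ~0.08: the true lincome coefficient, prppov controlled for
print(b_short)   # ~0.08 + 0.4*(-0.13) = ~0.028: biased once prppov is omitted
```

The short regression's lincome coefficient absorbs the effect of the omitted prppov through their correlation, which is exactly why correlated-but-relevant variables should stay in the model.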
6. Problem 3 in chapter 4

. di .321/100
.00321
. di .00321*10
.0321

Because sales enters in logs, increasing sales by one percent raises rdintens by .321/100 = .00321 percentage points, ceteris paribus. Increasing sales by ten percent raises rdintens by about .0321 percentage points, which is a modest effect for such a large change in sales.
(ii)
. di invttail(29,.05)
1.699127
. di invttail(29,.10)
1.3114336
. di .321/.216
1.4861111
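The one-sided decision rule can be scripted, taking the critical values from the invttail calls above:

```python
# one-sided test of H0: beta1 = 0 vs H1: beta1 > 0 for the sales coefficient
t_stat = 0.321 / 0.216          # ~1.486
t_10, t_05 = 1.311, 1.699       # one-sided critical values with df = 29

print(round(t_stat, 3))
print(t_stat > t_10)            # True: significant at the 10% level
print(t_stat > t_05)            # False: not significant at the 5% level
```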
For H0: β1 = 0 against H1: β1 > 0, the t-statistic is 1.486. Since 1.486 > 1.311, we reject the null at the 10% significance level, but since 1.486 < 1.699, we cannot reject it at the 5% level.

(The file is truncated here; the next surviving output is the end of a regression of lsalary on years, gamesyr, bavg, and hrunsyr.)

  Number of obs = 353    F(4, 348) = 145.24    Prob > F = 0.0000
  R-squared = 0.6254     Adj R-squared = 0.6211    Root MSE = .72788

             |    P>|t|     [95% Conf. Interval]
-------------+----------------------------------
       years |    0.000      .0439089     .091556
     gamesyr |    0.000      .0126841    .0188348
        bavg |    0.184     -.0006776    .0035147
     hrunsyr |    0.000      .0217021    .0501847
       _cons |    0.000      10.49829    11.54353
. reg lsalary years gamesyr bavg hrunsyr rbisyr

      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(5, 347)       =    117.06
       Model |  308.989208         5  61.7978416   Prob > F        =    0.0000
    Residual |  183.186327       347  .527914487   R-squared       =    0.6278
-------------+----------------------------------   Adj R-squared   =    0.6224
       Total |  492.175535       352  1.39822595   Root MSE        =    .72658

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0688626   .0121145     5.68   0.000     .0450355    .0926898
     gamesyr |   .0125521   .0026468     4.74   0.000     .0073464    .0177578
        bavg |   .0009786   .0011035     0.89   0.376    -.0011918     .003149
     hrunsyr |   .0144295    .016057     0.90   0.369    -.0171518    .0460107
      rbisyr |   .0107657    .007175     1.50   0.134    -.0033462    .0248776
       _cons |   11.19242   .2888229    38.75   0.000     10.62435    11.76048
------------------------------------------------------------------------------
. di .0359434/.0072408
4.9640095

Dropping rbisyr makes hrunsyr much more statistically significant (t = 4.96), and its coefficient rises from .0144 to about .0359, presumably because home runs and RBIs per year are highly correlated, so either one can pick up the other's effect.

(ii)
. reg lsalary years gamesyr bavg hrunsyr runsyr fldperc sbasesyr
      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(7, 345)       =     87.25
       Model |  314.510478         7  44.9300682   Prob > F        =    0.0000
    Residual |  177.665058       345  .514971181   R-squared       =    0.6390
-------------+----------------------------------   Adj R-squared   =    0.6317
       Total |  492.175535       352  1.39822595   Root MSE        =    .71761

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0699848   .0119756     5.84   0.000     .0464305    .0935391
     gamesyr |   .0078995   .0026775     2.95   0.003     .0026333    .0131657
        bavg |   .0005296   .0011038     0.48   0.632    -.0016414    .0027007
     hrunsyr |   .0232106   .0086392     2.69   0.008     .0062185    .0402027
      runsyr |   .0173922   .0050641     3.43   0.001     .0074318    .0273525
     fldperc |   .0010351   .0020046     0.52   0.606    -.0029077    .0049778
    sbasesyr |  -.0064191   .0051842    -1.24   0.216    -.0166157    .0037775
       _cons |   10.40827   2.003255     5.20   0.000     6.468139     14.3484
------------------------------------------------------------------------------
. di .0173922/.0050641
3.4344109
. di .0010351/.0020046
.51636237
. di -.0064191/.0051842
-1.2382045

Of the three new variables, runsyr is the only statistically significant one, with a t-statistic of 3.43; salary increases by about 1.7% with each additional run scored per year. Fielding percentage (t = 0.52) and stolen bases per year (t = −1.24) have low t-statistics and are not individually significant.

(iii)
. reg lsalary bavg fldperc sbasesyr
      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(3, 349)       =     27.48
       Model |  94.0540573         3  31.3513524   Prob > F        =    0.0000
    Residual |  398.121478       349  1.14074922   R-squared       =    0.1911
-------------+----------------------------------   Adj R-squared   =    0.1841
       Total |  492.175535       352  1.39822595   Root MSE        =    1.0681

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bavg |    .007144   .0015348     4.65   0.000     .0041254    .0101625
     fldperc |   .0076477   .0029213     2.62   0.009     .0019022    .0133933
    sbasesyr |   .0323301   .0052818     6.12   0.000     .0219418    .0427183
       _cons |   3.899057   2.931406     1.33   0.184    -1.866388    9.664502
------------------------------------------------------------------------------
When batting average, fielding percentage, and stolen bases are used by themselves, none of the t-statistics is insignificant: bavg (t = 4.65), fldperc (t = 2.62), and sbasesyr (t = 6.12) all appear to have an effect on salary, which makes sense....