Title | PBS2 - Problem Set 2 |
---|---|
Author | Sasuke Sarutobi |
Course | Principles Of Statistics I |
Institution | University of Nevada, Las Vegas |
Alvin Lo
Econ 262, Munpyung O
Problem Set 2

1. We want to estimate the effect of IQ and education on income. However, our old computer can only run simple linear regressions. What are the steps to estimate the partial effect of IQ on income? Write the procedure step by step and explain. Be sure to include precise Stata commands for each step.

To find the partial effect of IQ on income using only simple regressions, we break the multiple regression into steps (the partialling-out procedure). First, run a simple regression of IQ on education with the command reg IQ educ in Stata. Second, save the residuals with predict rhat, residual; rhat is the part of IQ that is uncorrelated with education. Third, run the simple regression reg income rhat. The slope coefficient on rhat equals b1^hat, the coefficient on IQ from the multiple regression of income on IQ and education.

2. We consider the celebrated Cobb-Douglas production function, which may be expressed as:

Yi = β0 Li^β1 Ki^β2    (1)

where Y = output in thousand dollars, L = labor input in thousand hours, K = capital in thousand dollars, and β0 is a constant. This model is nonlinear in the parameters, and estimating it as it stands requires nonlinear estimation techniques. However, if we take the logarithm of this function and add the error term εi, we obtain the following linear regression model:

lnYi = α0 + β1 lnLi + β2 lnKi + εi,  where α0 = lnβ0    (2)

a) Use the CobbDouglas.dta file, estimate α̂0, β̂1, β̂2. Show your Stata regression output table. Write the estimated equation.

. use "D:\CobbDouglas.dta", clear
. reg lnoutput lnlabor lncapital

      Source |       SS           df       MS      Number of obs   =        51
-------------+----------------------------------   F(2, 48)        =    645.93
       Model |  91.9246133         2  45.9623067   Prob > F        =    0.0000
    Residual |  3.41551772        48  .071156619   R-squared       =    0.9642
-------------+----------------------------------   Adj R-squared   =    0.9627
       Total |   95.340131        50  1.90680262   Root MSE        =    .26675

------------------------------------------------------------------------------
    lnoutput |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnlabor |   .4683318   .0989259     4.73   0.000      .269428    .6672357
   lncapital |   .5212795    .096887     5.38   0.000      .326475    .7160839
       _cons |   3.887599   .3962281     9.81   0.000     3.090929    4.684269
------------------------------------------------------------------------------
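As a cross-check on the log-linearization, the sketch below simulates Cobb-Douglas data with known parameters and recovers them by OLS on logs. All numbers here are made up for illustration (and it uses Python/NumPy rather than Stata):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
beta0, beta1, beta2 = 48.8, 0.47, 0.52   # "true" values chosen near the estimates above

L = rng.uniform(1, 100, n)               # labor (thousand hours)
K = rng.uniform(1, 100, n)               # capital (thousand dollars)
eps = rng.normal(0, 0.25, n)             # log-scale error, comparable to Root MSE ~ .27
Y = beta0 * L**beta1 * K**beta2 * np.exp(eps)

# OLS of lnY on a constant, lnL, lnK -- the linear model in equation (2)
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
a0_hat, b1_hat, b2_hat = coef
print(a0_hat, b1_hat, b2_hat)            # close to ln(48.8) ~ 3.89, 0.47, 0.52
```

Taking logs turns the multiplicative model into one that ordinary least squares can estimate, and the intercept recovers β0 through exp(α̂0).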
The estimated equation is:

lnoutput(Yi) = 3.887599 + .4683318 lnlabor(Li) + .5212795 lncapital(Ki)

b) Interpret and explain the regression coefficients. β̂1 is the elasticity of output with respect to labor: if labor increases by one percent (ceteris paribus), output increases by about 0.4683 percent. β̂2 is the elasticity with respect to capital: a one percent increase in capital, holding labor fixed, increases output by about 0.5213 percent. α̂0 is the intercept: when lnlabor = lncapital = 0 (that is, when L = K = 1), predicted lnoutput is 3.887599.
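The elasticity interpretation can be verified numerically with the estimated equation: scaling labor by 1.01 raises ln(output) by β̂1·ln(1.01) ≈ .00466, i.e. about 0.47 percent, regardless of the levels chosen (the levels L = K = 10 below are arbitrary illustrations):

```python
import math

b0, b1, b2 = 3.887599, 0.4683318, 0.5212795  # estimates from part (a)

def ln_output(lnL, lnK):
    """Predicted ln(output) from the estimated log-log equation."""
    return b0 + b1 * lnL + b2 * lnK

# raise labor by 1% holding capital fixed
lnL, lnK = math.log(10), math.log(10)
dlnY = ln_output(math.log(10 * 1.01), lnK) - ln_output(lnL, lnK)
print(100 * dlnY)   # percent change in output, ~0.466
```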
c) Check whether the regression coefficients α̂0, β̂1, β̂2 are statistically significant at the 99% confidence level.

. count
  51

The degrees of freedom are n − k − 1 = 51 − 2 − 1 = 48, so the two-sided 1% critical value, di invttail(48, .005), is approximately 2.68. The t-statistics are:

. di .4683318/.0989259
4.7341677
. di .5212795/.096887
5.3802832
. di 3.887599/.3962281
9.8115177
All three t-statistics exceed the critical value in absolute value, so α̂0, β̂1, and β̂2 are each statistically significant at the 99% confidence level: in every case we reject the null hypothesis that the coefficient is zero.
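The t-ratio check above can be scripted; the critical value 2.68 below is the approximate t(0.005, 48) value from a t table, used as an assumption:

```python
# coefficient and standard error for each parameter, from the part (a) output
estimates = {
    "lnlabor":   (0.4683318, 0.0989259),
    "lncapital": (0.5212795, 0.096887),
    "_cons":     (3.887599,  0.3962281),
}
t_crit = 2.68   # approximate two-sided 1% critical value with df = 48

for name, (b, se) in estimates.items():
    t = b / se
    print(name, round(t, 2), abs(t) > t_crit)   # all True: each coefficient is significant
```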
d) Check the overall statistical significance of this regression at the 5% significance level.

The overall F test is of H0: β1 = β2 = 0, with (k, n − k − 1) = (2, 48) degrees of freedom; the 5% critical value, di invFtail(2, 48, .05), is approximately 3.19.
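The reported F statistic can be reproduced from R² as a cross-check (the small gap from the table's 645.93 is rounding in the reported R²):

```python
# overall F test of H0: beta1 = beta2 = 0, computed from R-squared
r2, k, n = 0.9642, 2, 51
df_resid = n - k - 1                        # 48
F = (r2 / k) / ((1 - r2) / df_resid)        # ~646, matching the table's 645.93
print(F)
```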
The regression's F statistic is 645.93, far above the critical value, so the regression is jointly significant at the 5% level: we reject H0: β1 = β2 = 0.

e) Test whether this production function is CRS (constant returns to scale), i.e., test the linear restriction β1 + β2 = 1.

. test lnlabor + lncapital = 1

 ( 1)  lnlabor + lncapital = 1

       F(  1,    48) =    0.14
            Prob > F =    0.7085

Since the p-value (0.7085) is well above 0.05, we fail to reject the restriction: the data are consistent with constant returns to scale.
f) Use your estimates and recover equation (1).

Since α0 = lnβ0, we have β̂0 = exp(α̂0) = exp(3.887599) ≈ 48.8, so the recovered production function is:

Ŷi = 48.8 Li^.4683 Ki^.5213

3. Problem 5 in chapter 3
(i) No, because by definition study + sleep + work + leisure = 168. If we change study, we must change at least one of the other categories so that the four still sum to 168, so holding all the others fixed is impossible.
(ii) Because the four variables always add up to 168, each one is an exact linear function of the other three; for example, study = 168 − sleep − work − leisure. That violates the no-perfect-collinearity assumption (MLR.3).
(iii) As mentioned in class, we can simply drop one of the independent variables, say leisure. Then the coefficient on study is interpreted as the change in GPA when study increases by one hour, with sleep and work held fixed, so that leisure implicitly falls by one hour. The same reinterpretation works for whichever variable is dropped.
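The perfect-collinearity problem in Problem 3 can be made concrete: with all four time categories plus a constant, the design matrix is rank-deficient. The weekly hours below are made up for illustration:

```python
import numpy as np

# hypothetical weekly hours for 5 students; each row sums to 168
# columns: study, sleep, work, leisure
hours = np.array([
    [35, 56, 20, 57],
    [20, 49, 40, 59],
    [10, 63, 30, 65],
    [40, 42, 25, 61],
    [25, 50, 35, 58],
])
assert (hours.sum(axis=1) == 168).all()

X = np.column_stack([np.ones(5), hours])              # constant + all four categories
print(np.linalg.matrix_rank(X))                       # 4, not 5: MLR.3 fails

X_drop = np.column_stack([np.ones(5), hours[:, :3]])  # drop leisure
print(np.linalg.matrix_rank(X_drop))                  # 4: full column rank, OLS works
```

Because the four columns sum to 168 times the constant column, one column is redundant; dropping any one of them restores full rank.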
4. Computer exercise 5 in chapter 3

First, regress educ on exper and tenure:

. use "C:\Users\Alvin\Downloads\64-bit\Stata\WAGE1.DTA", clear
. reg educ exper tenure

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(2, 523)       =     29.49
       Model |  407.946311         2  203.973156   Prob > F        =    0.0000
    Residual |  3617.48335       523  6.91679416   R-squared       =    0.1013
-------------+----------------------------------   Adj R-squared   =    0.0979
       Total |  4025.42966       525  7.66748506   Root MSE        =      2.63

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0737851   .0097609    -7.56   0.000    -.0929604   -.0546098
      tenure |   .0476795   .0183371     2.60   0.010      .011656    .0837031
       _cons |   13.57496   .1843245    73.65   0.000     13.21286    13.93707
------------------------------------------------------------------------------

educ = 13.57 − .074 exper + .048 tenure + rhat1,  n = 526, R2 = .101.

Next, save the residuals and regress log(wage) on them:

. predict rhat, residual
. reg lwage rhat

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(1, 524)       =      1.83
       Model |  .517262044         1  .517262044   Prob > F        =    0.1763
    Residual |  147.812489       524  .282084903   R-squared       =    0.0035
-------------+----------------------------------   Adj R-squared   =    0.0016
       Total |  148.329751       525   .28253286   Root MSE        =    .53112

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rhat |  -.0026693   .0019712    -1.35   0.176    -.0065416    .0012031
       _cons |   1.623268   .0231578    70.10   0.000     1.577775    1.668762
------------------------------------------------------------------------------
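By the Frisch-Waugh partialling-out result (the same logic as Problem 1), the slope from regressing the dependent variable on rhat equals, in any given sample, the coefficient on educ in the full multiple regression. A small simulated check, with made-up data rather than WAGE1, and Python rather than Stata:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
exper = rng.uniform(0, 40, n)
tenure = 0.4 * exper + rng.uniform(0, 10, n)
educ = 14 - 0.07 * exper + 0.05 * tenure + rng.normal(0, 2, n)
lwage = 0.3 + 0.09 * educ + 0.004 * exper + 0.02 * tenure + rng.normal(0, 0.4, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# step 1: regress educ on exper and tenure, save residuals rhat
g = ols(np.column_stack([ones, exper, tenure]), educ)
rhat = educ - np.column_stack([ones, exper, tenure]) @ g
# step 2: regress lwage on rhat
slope = ols(np.column_stack([ones, rhat]), lwage)[1]
# full multiple regression, for comparison
b_educ = ols(np.column_stack([ones, educ, exper, tenure]), lwage)[1]
print(slope, b_educ)   # identical up to floating-point error
```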
The regression on rhat uses only the part of educ that is uncorrelated with exper and tenure to explain log(wage).

5. Computer exercise 8 in chapter 3

. use "C:\Users\Alvin\Downloads\64-bit\Stata\discrim.dta", clear
. summ prpblck income

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     prpblck |       409    .1134864    .1824165          0   .9816579
      income |       409    47053.78    13179.29      15919     136529
(ii)
. reg psoda prpblck logincome

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(2, 398)       =     14.13
       Model |  .209045947         2  .104522973   Prob > F        =    0.0000
    Residual |   2.9449712       398  .007399425   R-squared       =    0.0663
-------------+----------------------------------   Adj R-squared   =    0.0616
       Total |  3.15401715       400  .007885043   Root MSE        =    .08602

------------------------------------------------------------------------------
       psoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .1258267   .0269747     4.66   0.000      .072796    .1788575
   logincome |   .0788228   .0173891     4.53   0.000     .0446367    .1130089
       _cons |   .1855321   .1879993     0.99   0.324    -.1840636    .5551278
------------------------------------------------------------------------------
psoda = .1855 + .1258 prpblck + .0788 log(income)

prpblck is the proportion of a zipcode's population that is black. The regression measures the effect of the black population share on the price of soda: moving from a zipcode with no black residents (prpblck = 0) to an all-black zipcode (prpblck = 1) is associated with a price about $.126 higher, holding income fixed.

(iii)
. reg psoda prpblck

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(1, 399)       =      7.34
       Model |  .057010466         1  .057010466   Prob > F        =    0.0070
    Residual |  3.09700668       399  .007761922   R-squared       =    0.0181
-------------+----------------------------------   Adj R-squared   =    0.0156
       Total |  3.15401715       400  .007885043   Root MSE        =     .0881

------------------------------------------------------------------------------
       psoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .0649269    .023957     2.71   0.007     .0178292    .1120245
       _cons |   1.037399   .0051905   199.87   0.000     1.027195    1.047603
------------------------------------------------------------------------------
The discrimination effect appears smaller (.0649 versus .1258) when we do not control for income.

(iv)
. reg lpsoda prpblck logincome

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(2, 398)       =     14.54
       Model |  .196020672         2  .098010336   Prob > F        =    0.0000
    Residual |  2.68272938       398  .006740526   R-squared       =    0.0681
-------------+----------------------------------   Adj R-squared   =    0.0634
       Total |  2.87875005       400  .007196875   Root MSE        =     .0821

------------------------------------------------------------------------------
      lpsoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .1215803   .0257457     4.72   0.000     .0709657    .1721948
   logincome |   .0765114   .0165969     4.61   0.000     .0438829    .1091399
       _cons |   -.793768   .1794337    -4.42   0.000    -1.146524   -.4410117
------------------------------------------------------------------------------

. di 20*.1215803
2.431606

If prpblck increases by .20 (20 percentage points), the price of soda is estimated to rise by about 2.43 percent.

(v)
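Because the dependent variable is in logs, a change Δprpblck moves the price by roughly 100·β̂1·Δprpblck percent; the .20 case above generalizes as:

```python
b_prpblck = 0.1215803            # estimate from part (iv)

def pct_price_change(d_prpblck):
    """Approximate percent change in soda price for a change in prpblck."""
    return 100 * b_prpblck * d_prpblck

print(pct_price_change(0.20))    # ~2.43, as computed with di above
print(pct_price_change(0.10))    # ~1.22
```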
. reg lpsoda prpblck lincome prppov

      Source |       SS           df       MS      Number of obs   =       401
-------------+----------------------------------   F(3, 397)       =     12.60
       Model |  .250340622         3  .083446874   Prob > F        =    0.0000
    Residual |  2.62840943       397  .006620679   R-squared       =    0.0870
-------------+----------------------------------   Adj R-squared   =    0.0801
       Total |  2.87875005       400  .007196875   Root MSE        =    .08137

------------------------------------------------------------------------------
      lpsoda |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prpblck |   .0728072   .0306756     2.37   0.018     .0125003    .1331141
     lincome |   .1369553   .0267554     5.12   0.000     .0843552    .1895553
      prppov |     .38036   .1327903     2.86   0.004     .1192999    .6414201
       _cons |  -1.463333   .2937111    -4.98   0.000    -2.040756   -.8859092
------------------------------------------------------------------------------
Adding prppov decreases the coefficient on prpblck to .0728072. Part of the price variation previously attributed to the black population share is explained by the proportion of people in poverty rather than by the black population itself.
(vi)
. corr lincome prppov
(obs=409)

             |  lincome   prppov
-------------+------------------
     lincome |   1.0000
      prppov |  -0.8385   1.0000

Yes. Higher income goes with a smaller proportion of the population in poverty, which is consistent with the strong negative correlation of −0.8385.

(vii) I don't think you can remove a variable just because it is highly correlated with another; omitting a relevant variable would bias the remaining coefficients. High correlation is not perfect correlation, so MLR.3 is not violated, and the two variables may each have a separate effect on the regression, which means you can't simply take one out.
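The worry in (vii), that dropping a relevant but correlated regressor biases the remaining coefficient, can be illustrated with simulated data (all numbers below are made up; only the strong negative lincome-prppov correlation mimics the table above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
lincome = rng.normal(10.7, 0.3, n)
# prppov strongly negatively correlated with lincome, as in the corr table
prppov = 1.5 - 0.13 * lincome + rng.normal(0, 0.02, n)
lpsoda = -0.8 + 0.08 * lincome + 0.4 * prppov + rng.normal(0, 0.08, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = ols(np.column_stack([ones, lincome, prppov]), lpsoda)[1]
b_short = ols(np.column_stack([ones, lincome]), lpsoda)[1]
print(b_full)    # ~0.08: the true lincome coefficient, prppov controlled for
print(b_short)   # ~0.08 + 0.4*(-0.13) = ~0.028: biased once prppov is omitted
```

The short regression's lincome coefficient absorbs the effect of the omitted prppov through their correlation, which is exactly why correlated-but-relevant variables should stay in the model.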
6. Problem 3 in chapter 4

. di .321/100
.00321
. di .00321*10
.0321

Because sales enters in logs, increasing sales by one percent raises rdintens by .321/100 = .00321 percentage points, ceteris paribus. Increasing sales by ten percent raises rdintens by about .0321 percentage points, which is a modest effect for such a large change in sales.
(ii)
. di invttail(29,.05)
1.699127
. di invttail(29,.10)
1.3114336
. di .321/.216
1.4861111
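The one-sided decision rule can be scripted, taking the critical values from the invttail calls above:

```python
# one-sided test of H0: beta1 = 0 vs H1: beta1 > 0 for the sales coefficient
t_stat = 0.321 / 0.216          # ~1.486
t_10, t_05 = 1.311, 1.699       # one-sided critical values with df = 29

print(round(t_stat, 3))
print(t_stat > t_10)            # True: significant at the 10% level
print(t_stat > t_05)            # False: not significant at the 5% level
```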
For H0: β1 = 0 against H1: β1 > 0, the t-statistic is 1.486. Since 1.486 > 1.311, we reject the null at the 10% significance level, but since 1.486 < 1.699, we cannot reject it at the 5% level.

(The file is truncated here; the next surviving output is the end of a regression of lsalary on years, gamesyr, bavg, and hrunsyr.)

  Number of obs = 353    F(4, 348) = 145.24    Prob > F = 0.0000
  R-squared = 0.6254     Adj R-squared = 0.6211    Root MSE = .72788

             |    P>|t|     [95% Conf. Interval]
-------------+----------------------------------
       years |    0.000      .0439089     .091556
     gamesyr |    0.000      .0126841    .0188348
        bavg |    0.184     -.0006776    .0035147
     hrunsyr |    0.000      .0217021    .0501847
       _cons |    0.000      10.49829    11.54353
. reg lsalary years gamesyr bavg hrunsyr rbisyr

      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(5, 347)       =    117.06
       Model |  308.989208         5  61.7978416   Prob > F        =    0.0000
    Residual |  183.186327       347  .527914487   R-squared       =    0.6278
-------------+----------------------------------   Adj R-squared   =    0.6224
       Total |  492.175535       352  1.39822595   Root MSE        =    .72658

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0688626   .0121145     5.68   0.000     .0450355    .0926898
     gamesyr |   .0125521   .0026468     4.74   0.000     .0073464    .0177578
        bavg |   .0009786   .0011035     0.89   0.376    -.0011918     .003149
     hrunsyr |   .0144295    .016057     0.90   0.369    -.0171518    .0460107
      rbisyr |   .0107657    .007175     1.50   0.134    -.0033462    .0248776
       _cons |   11.19242   .2888229    38.75   0.000     10.62435    11.76048
------------------------------------------------------------------------------
. di .0359434/.0072408
4.9640095

Dropping rbisyr makes hrunsyr much more statistically significant (t = 4.96), and its coefficient rises from .0144 to about .0359, presumably because home runs and RBIs per year are highly correlated, so either one can pick up the other's effect.

(ii)
. reg lsalary years gamesyr bavg hrunsyr runsyr fldperc sbasesyr
      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(7, 345)       =     87.25
       Model |  314.510478         7  44.9300682   Prob > F        =    0.0000
    Residual |  177.665058       345  .514971181   R-squared       =    0.6390
-------------+----------------------------------   Adj R-squared   =    0.6317
       Total |  492.175535       352  1.39822595   Root MSE        =    .71761

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0699848   .0119756     5.84   0.000     .0464305    .0935391
     gamesyr |   .0078995   .0026775     2.95   0.003     .0026333    .0131657
        bavg |   .0005296   .0011038     0.48   0.632    -.0016414    .0027007
     hrunsyr |   .0232106   .0086392     2.69   0.008     .0062185    .0402027
      runsyr |   .0173922   .0050641     3.43   0.001     .0074318    .0273525
     fldperc |   .0010351   .0020046     0.52   0.606    -.0029077    .0049778
    sbasesyr |  -.0064191   .0051842    -1.24   0.216    -.0166157    .0037775
       _cons |   10.40827   2.003255     5.20   0.000     6.468139     14.3484
------------------------------------------------------------------------------
. di .0173922/.0050641
3.4344109
. di .0010351/.0020046
.51636237
. di -.0064191/.0051842
-1.2382045

Of the three new variables, runsyr is the only statistically significant one, with a t-statistic of 3.43; salary increases by about 1.7% with each additional run scored per year. Fielding percentage (t = 0.52) and stolen bases per year (t = −1.24) have low t-statistics and are not individually significant.

(iii)
. reg lsalary bavg fldperc sbasesyr
      Source |       SS           df       MS      Number of obs   =       353
-------------+----------------------------------   F(3, 349)       =     27.48
       Model |  94.0540573         3  31.3513524   Prob > F        =    0.0000
    Residual |  398.121478       349  1.14074922   R-squared       =    0.1911
-------------+----------------------------------   Adj R-squared   =    0.1841
       Total |  492.175535       352  1.39822595   Root MSE        =    1.0681

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bavg |    .007144   .0015348     4.65   0.000     .0041254    .0101625
     fldperc |   .0076477   .0029213     2.62   0.009     .0019022    .0133933
    sbasesyr |   .0323301   .0052818     6.12   0.000     .0219418    .0427183
       _cons |   3.899057   2.931406     1.33   0.184    -1.866388    9.664502
------------------------------------------------------------------------------
When batting average, fielding percentage, and stolen bases are used by themselves, none of the t-statistics is insignificant: bavg (t = 4.65), fldperc (t = 2.62), and sbasesyr (t = 6.12) all appear to have an effect on salary, which makes sense....