Homework 2 - Simple linear regression with only one regressor

Course	Econometrics(1)
Institution	Feng Chia University (逢甲大學)
Pages	9


Homework 2 Report
Class: Econometrics (1)
Classroom: BB705
Class hours: 13:10 – 16:00 every Tuesday, from 2016/09/15 to 2017/01/12
Program: Ph.D. program in Finance
Instructor: Prof. Li-Jiun Chen (陳麗君)


Homework 2.1.

(i) To describe the relationship between hours of job training per worker (training) and the number of nondefective items produced per worker hour (output), we can write the regression model

\[ output = f(training) + u, \]

where the functional form of \(f\) may be linear or nonlinear. Factors other than training that might affect output include individual characteristics, family income, parents' education, and a worker's concentration on training days. These factors are contained in the error term \(u\). Holding the factors in \(u\) fixed, the slope of the regression function measures the change in output associated with a one-hour change in training.

In practice, output is likely to increase with an appropriate amount of training. However, training could reduce output if training hours become excessive. For example, an equilibrium output of 120 nondefective items per working hour could fall if workers spend 4 hours per day in training, because little time remains for work and other activities.

To answer the policy question of whether job training programs improve worker productivity, policy makers should consider the sign of the slope at a given number of training hours. If output is positively related to training at that level, they can increase training to raise productivity. Otherwise, they should not increase training, or should even reduce it to an appropriate level.

In addition, the data are collected from manufacturing firms in Ohio rather than from individual workers. The sample is therefore not random and might not satisfy \(E(u \mid training) = E(u) = 0\), so we cannot draw inferences for the population. For example, policy makers in other states could not use the regression results to decide whether to increase or reduce training.

(ii) No. A firm's decision to train its workers could be influenced by worker characteristics such as innate ability, motivation, personal outlook (optimistic or pessimistic), education, marital status, health status, financial status, personal and family income, and concentration on training days. Characteristics that are unmeasurable or difficult to measure include concentration on training days, innate ability, motivation, and personal outlook. Measurable characteristics include education, marital status, health status, financial status, and personal income.

(iii) A factor other than worker characteristics that can affect worker productivity might be family income.

(iv) If I found a positive correlation between output and training, I would not have convincingly established that job training makes workers more productive, since the sample is not random and we cannot draw inferences for all cases. If my firm or my client's firm were located in Ohio, I could say that training makes workers more productive. Otherwise, I would not use the regression results for decisions about firms located outside Ohio.

Homework 2.2.

(i) Given the population model

\[ GPA = \beta_0 + \beta_1\, study + u, \quad E(u) = 0 \qquad (*) \]

Factors contained in \(u\) include gender, SAT or ACT scores, high school percentile rank, age, parents' education, the number of siblings, and health status. SAT scores, ACT scores, and parents' education are likely positively correlated with study. In contrast, the number of siblings, health status, and age are probably negatively correlated with study.

(ii) Equation (*) can be interpreted causally if the zero mean and conditional mean independence assumptions hold, \(E(u \mid study) = E(u) = 0\), so that the effect of study is measured holding the factors in the error term fixed (ceteris paribus). Under a causal interpretation, the sign of \(\beta_1\) should be positive: holding other factors fixed, GPA increases when study increases.

(iii) In (*), the intercept \(\beta_0\) is the expected GPA when study is set to 0, holding other factors fixed. That is, a student who spends no hours per week studying is expected, on average, to have a GPA of \(\beta_0\). Statistically, this interpretation is valid because \(E(u) = 0\). However, if \(\beta_0\) is negative or far from zero, this suggests that the linear model is a poor predictor of GPA at low values of study, since a student can rarely earn a high GPA without any hours of studying per week.

Homework 2.3.

Given the linear regression model

\[ y = \beta_0 + \beta_1 x + u, \quad E(u) = u_0 \neq 0, \qquad (*) \]

(*) can be rewritten as

\[ y = (\beta_0 + u_0) + \beta_1 x + (u - u_0). \]

Substituting \(\tilde{\beta}_0\) for \((\beta_0 + u_0)\) and \(\tilde{u}\) for \((u - u_0)\), we have

\[ y = \tilde{\beta}_0 + \beta_1 x + \tilde{u}, \qquad (**) \]

where

\[ E(\tilde{u}) = E(u - u_0) = E(u) - E(u_0) = u_0 - u_0 = 0. \]

Therefore, (*) can always be rewritten as (**), which has the same slope as (*) but a new intercept,

\[ \tilde{\beta}_0 = \beta_0 + u_0, \]

and a new error term,

\[ \tilde{u} = u - u_0. \]
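The result of Homework 2.3 can be illustrated numerically. The following Python sketch is not part of the original report: it simulates data from a model whose error has a nonzero mean and fits a line by OLS. The population values `BETA0`, `BETA1`, and `U0`, the sample size, and the helper `ols` are hypothetical choices made only for illustration. The fitted slope should be close to beta1, and the fitted intercept should be close to beta0 + u0.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical population values, chosen only for illustration.
BETA0, BETA1, U0 = 1.0, 0.5, 2.0

rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [rng.uniform(0.0, 10.0) for _ in range(20000)]
us = [rng.gauss(U0, 1.0) for _ in xs]  # error term with mean u0 != 0
ys = [BETA0 + BETA1 * x + u for x, u in zip(xs, us)]

intercept, slope = ols(xs, ys)
print(f"slope     = {slope:.3f}  (beta1      = {BETA1})")
print(f"intercept = {intercept:.3f}  (beta0 + u0 = {BETA0 + U0})")
```

The nonzero mean of the error cannot be separated from the intercept, which is why only the intercept changes while the slope is unaffected.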

Homework 2.4.

(i) Interpret the intercept in this equation, and comment on its sign and magnitude.

\[ \widehat{cons} = \hat{\beta}_0 + \hat{\beta}_1\, inc = -124.84 + 0.853\, inc; \quad n = 100, \; R^2 = 0.692 \]

The intercept, -124.84, is the predicted consumption when income is zero. Annual consumption is a family's spending on goods and services over one year, measured in dollars, and must be greater than 0, since no family consumes nothing within a year. The negative intercept is therefore not meaningful by itself. In other words, the linear model predicts poorly when income is very low. However, relative to annual amounts, the intercept of -124.84 is not far from 0.

(ii) What is the predicted consumption when family income is $30,000?

\[ \hat{E}(cons \mid inc = 30{,}000) = -124.84 + 0.853 \times 30{,}000 = 25{,}465.16 \text{ dollars} \]

(iii) With inc on the x-axis, draw a graph of the estimated MPC and APC.

The MPC is simply the slope of the estimated equation, \(MPC = \hat{\beta}_1 = 0.853\), so it plots as a horizontal line parallel to the x-axis (income). The APC is obtained by dividing the estimated equation by income:

\[ APC = \hat{\beta}_0 / inc + \hat{\beta}_1 = -124.84 / inc + 0.853 \]

The Excel file saving.xls contains observations of annual consumption and income for 100 families. These observations are imported into SAS for plotting.

SAS code:

    libname consNinc 'D:\02_Economic\00_PhD\10_FCU\00_Classes\00_Econometrics\04_Homeworks&Termpapers\Homework2.4.Saving.xls';

    /* The column headers in the first row are sav, inc, size, educ, age, black, and cons, respectively. */
    data homewk2_4;
        set consNinc.'SAVING$'n;
        MPC = 0.853;
        APC = -124.84/inc + 0.853;
    run;

    /* Release the libref only after the data step has finished reading the sheet. */
    libname consNinc clear;

    proc sort data=homewk2_4 out=homewk2_4s;
        by inc APC;
    run;

    /* MPC and APC are proportions, so the y-axis is labeled as a propensity rather than in dollars. */
    proc template;
        define statgraph layoutoverlay;
            begingraph;
                entrytitle "The estimated MPC and APC";
                layout overlay / cycleattrs=true
                        xaxisopts=(label="Income (dollars)")
                        yaxisopts=(label="Propensity to consume");
                    scatterplot x=inc y=APC / legendlabel="Average Propensity to Consume (APC)";
                    seriesplot x=inc y=MPC / curvelabel="MPC";
                    pbsplineplot x=inc y=APC / name="apc_fitline" alpha=0.05
                        curvelabel="APC" legendlabel="APC fitted line";
                    discretelegend "apc_fitline";
                endlayout;
            endgraph;
        end;
    run;

    proc sgrender data=homewk2_4s template=layoutoverlay;
    run;
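As a quick cross-check of parts (ii) and (iii), the Python sketch below evaluates the estimated consumption function at a few income levels. It is not part of the original SAS workflow: the function names and the income levels in the loop are hypothetical, and only the two coefficients come from the homework.

```python
BETA0_HAT = -124.84  # estimated intercept reported in the homework
BETA1_HAT = 0.853    # estimated slope, which is also the MPC


def predicted_cons(inc):
    """Predicted annual consumption (dollars) at a given income."""
    return BETA0_HAT + BETA1_HAT * inc


def apc(inc):
    """Average propensity to consume: predicted consumption divided by income."""
    return predicted_cons(inc) / inc


def mpc(inc):
    """Marginal propensity to consume: constant in a linear model."""
    return BETA1_HAT


# Hypothetical income levels, chosen only to show the shape of the APC curve.
for inc in (1000, 5000, 10000, 30000):
    print(f"inc={inc:>6}  cons_hat={predicted_cons(inc):>10.2f}  "
          f"APC={apc(inc):.4f}  MPC={mpc(inc):.3f}")
```

Because the intercept is negative, the APC lies below the MPC at every positive income and rises toward 0.853 as income grows.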


Homework 2.5.

(i) Estimate the relationship between GPA and ACT using OLS.

Excel calculation:

ID     Yi      Xi     Yi-Ybar    Xi-Xbar    (Xi-Xbar)^2    (Xi-Xbar)*(Yi-Ybar)    (Yi-Ybar)^2
1      2.8     21     -0.4125    -4.875     23.766          2.011                 0.170
2      3.4     24      0.1875    -1.875      3.516         -0.352                 0.035
3      3.0     26     -0.2125     0.125      0.016         -0.027                 0.045
4      3.5     27      0.2875     1.125      1.266          0.323                 0.083
5      3.6     29      0.3875     3.125      9.766          1.211                 0.150
6      3.0     25     -0.2125    -0.875      0.766          0.186                 0.045
7      2.7     25     -0.5125    -0.875      0.766          0.448                 0.263
8      3.7     30      0.4875     4.125     17.016          2.011                 0.238
sum    25.7    207                          56.875          5.813                 1.029

Where:
Y ~ GPA, Ybar = 3.213 (sample average of GPA)
X ~ ACT, Xbar = 25.875 (sample average of ACT)
TSS = sum((Yi-Ybar)^2) = 1.029

So we have:
beta1 = sum((Xi-Xbar)*(Yi-Ybar)) / sum((Xi-Xbar)^2) = 5.813/56.875 = 0.102

ID     Yi      Xi     Yhat          Yhat-Ybar     (Yhat-Ybar)^2    uihat=(Yi-Yhat)    (uihat)^2
1      2.8     21     2.71428571    -0.4982143    0.248             0.086             0.007
2      3.4     24     3.02087912    -0.1916209    0.037             0.379             0.144
3      3.0     26     3.22527473     0.01277473   0.000            -0.225             0.051
4      3.5     27     3.32747253     0.11497253   0.013             0.173             0.030
5      3.6     29     3.53186813     0.31936813   0.102             0.068             0.005
6      3.0     25     3.12307692    -0.0894231    0.008            -0.123             0.015
7      2.7     25     3.12307692    -0.0894231    0.008            -0.423             0.179
8      3.7     30     3.63406593     0.42156593   0.178             0.066             0.004
sum    25.7    207                                0.594             0.00              0.435

Then:
beta0 = Ybar - beta1*Xbar = 3.213 - 0.102*25.875 = 0.568 (computed with unrounded values)
ESS = sum((Yhat - Ybar)^2) = 0.594
RSS = sum((uihat)^2) = 0.435
R^2 = ESS / TSS = 1 - RSS/TSS = 0.577
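The spreadsheet calculation can be reproduced in a few lines. The Python sketch below is an independent cross-check, not the Excel or SAS procedure used in the report. It applies the same formulas to the eight observations listed in the homework, and the variable names are chosen here for readability.

```python
# The eight (GPA, ACT) observations from the homework.
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]

n = len(gpa)
y_bar = sum(gpa) / n
x_bar = sum(act) / n

# Slope: sample covariance of X and Y over sample variance of X.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
sxx = sum((x - x_bar) ** 2 for x in act)
beta1 = sxy / sxx
beta0 = y_bar - beta1 * x_bar

fitted = [beta0 + beta1 * x for x in act]
residuals = [y - f for y, f in zip(gpa, fitted)]

tss = sum((y - y_bar) ** 2 for y in gpa)     # total sum of squares
ess = sum((f - y_bar) ** 2 for f in fitted)  # explained sum of squares
rss = sum(e ** 2 for e in residuals)         # residual sum of squares
r_squared = ess / tss

print(f"beta0 = {beta0:.5f}, beta1 = {beta1:.5f}")
print(f"TSS = {tss:.5f}, ESS = {ess:.5f}, RSS = {rss:.5f}, R^2 = {r_squared:.4f}")
```

Working with unrounded intermediate values explains why the intercept is 0.568 even though the rounded figures 3.213 and 0.102 would give a slightly different result.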

SAS code:

    data homewk2_5;
        input Student GPA ACT;
        cards;
    1 2.8 21
    2 3.4 24
    3 3.0 26
    4 3.5 27
    5 3.6 29
    6 3.0 25
    7 2.7 25
    8 3.7 30
    ;

    proc reg data=homewk2_5;
        model GPA=ACT;
    run;

SAS output:

The SAS System                                   10:15 Sunday, October 9, 2016   1

The REG Procedure
Model: MODEL1
Dependent Variable: GPA

Number of Observations Read    8
Number of Observations Used    8

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1    0.59402           0.59402        8.20       0.0287
Error               6    0.43473           0.07245
Corrected Total     7    1.02875

Root MSE          0.26917    R-Square    0.5774
Dependent Mean    3.21250    Adj R-Sq    0.5070
Coeff Var         8.37893

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1    0.56813               0.92842           0.61       0.5630
ACT           1    0.10220               0.03569           2.86       0.0287
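The figures in the SAS output are tied together by standard identities, which gives a simple way to confirm that the reconstructed listing is internally consistent. The Python sketch below recomputes the derived statistics from the sums of squares, standard errors, and dependent mean printed by SAS. It is an added check rather than part of the report, and the variable names are chosen here for illustration.

```python
import math

n = 8  # number of observations used

# Values copied from the SAS output.
ss_model, ss_error, ss_total = 0.59402, 0.43473, 1.02875
dependent_mean = 3.21250
est_intercept, se_intercept = 0.56813, 0.92842
est_act, se_act = 0.10220, 0.03569

df_model, df_error = 1, n - 2

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error            # F = MS(model) / MS(error)
root_mse = math.sqrt(ms_error)
r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_error
coeff_var = 100 * root_mse / dependent_mean

t_intercept = est_intercept / se_intercept
t_act = est_act / se_act                 # with one regressor, t^2 equals F

print(f"F = {f_value:.2f}, Root MSE = {root_mse:.5f}")
print(f"R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
print(f"Coeff Var = {coeff_var:.3f}")
print(f"t(Intercept) = {t_intercept:.2f}, t(ACT) = {t_act:.2f}")
```

In a regression with a single regressor, the squared t statistic on the slope equals the F statistic, which is why both share the p-value 0.0287.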

Finally, we have:

\[ \widehat{GPA} = \hat{\beta}_0 + \hat{\beta}_1\, ACT = 0.56813 + 0.10220\, ACT \]


The direction of the relationship: \(\hat{\beta}_1 = 0.10220 > 0\), so GPA and ACT are positively correlated in the sample.

Does the intercept have a useful interpretation here? The intercept, 0.56813, is the predicted college GPA when ACT is zero. Since no one who attends college has a score of zero on the achievement test, the intercept is not meaningful.

How much higher is the GPA predicted to be if the ACT score is increased by five points? Holding other factors fixed, if ACT increases by 5 points, the predicted GPA increases by 0.10220 * 5 = 0.511.

(ii) See the results in (i).

(iii) The predicted value of GPA when ACT = 20 is:

\[ \hat{E}(GPA \mid ACT = 20) = 0.56813 + 0.10220 \times 20 = 2.612 \]

(iv) How much of the variation in GPA for these eight students is explained by ACT? R^2 = 0.5774. That is, the linear regression on ACT explains 57.74% of the total variation in GPA. The remaining 42.26% of the total variation is attributed to the factors in the error term (the unsystematic part).
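The arithmetic in parts (i), (iii), and (iv) can be checked directly from the reported estimates. The short Python sketch below uses only the coefficients and R-Square printed by SAS. The function and variable names are hypothetical and serve only this illustration.

```python
BETA0_HAT = 0.56813  # reported intercept
BETA1_HAT = 0.10220  # reported slope on ACT
R_SQUARE = 0.5774    # reported R-Square


def predicted_gpa(act):
    """Predicted college GPA at a given ACT score."""
    return BETA0_HAT + BETA1_HAT * act


gpa_at_20 = predicted_gpa(20)
gain_from_5_points = predicted_gpa(25) - predicted_gpa(20)  # equals 5 * slope
unexplained_share = 1 - R_SQUARE

print(f"Predicted GPA at ACT = 20: {gpa_at_20:.3f}")
print(f"Predicted change for +5 ACT points: {gain_from_5_points:.3f}")
print(f"Share of variation left unexplained: {unexplained_share:.2%}")
```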
