Econometrics ECON 550
February 15, 2021
Sources of Bias
W Chapters 9, 16.1-16.2

Professor Sebastien Bradley
Drexel University, LeBow College of Business

Outline for Today
Administrative Business:
• Exercise 4 – Measurement Error due Sunday @ 11:59 p.m.
• Paper rough draft due in class, February 22

1) Midterm Comments
2) External Validity
3) Internal Validity - Endogeneity Bias
   • Omitted Variables
   • Functional Form Misspecification
   • Simultaneity (Simultaneous Causality)
   • Sample Selection
   • Measurement Error (Errors in Variables)
4) Exercise 4 – Measurement Error

Assessing Empirical Studies
• The reliability and usefulness of an empirical study should be assessed on the basis of internal and external validity.
• External validity refers to the applicability or generalizability of study results to populations and settings that are different than the one(s) used in the empirical analysis.
• Internal validity refers to the validity of the statistical inferences/conclusions about causal effects for the population and setting studied.

Threats to External Validity
• External validity is threatened when the population or setting used in the empirical analysis is unusual and therefore unlike other populations and settings.
1) Differences in Populations
E.g. Maternal Leave and Early Reading
• If we study the effect of mothers taking an extra week of maternal leave on early reading outcomes in the U.S., this effect is not likely generalizable to European countries, where there exists a preference/norm for more maternal leave.
2) Differences in Settings
E.g. Maternal Leave and Early Reading
• ~50 weeks of paid leave is common in many European countries.

Threats to Internal Validity
• Internal validity is threatened when either
1) The coefficient estimate(s) of the causal effect(s) of interest is (are) biased or inconsistent, or
2) Tests and confidence intervals yield rejection probabilities that are inconsistent with the desired significance or confidence levels (i.e. standard errors are invalid).

Sources of (Endogeneity) Bias
• Violation of the conditional mean independence assumption (i.e. correlation between your variable of interest and the error term) may arise through several channels:
1) Omitted Variables
2) Functional Form Misspecification
3) Simultaneity (Simultaneous Causality)
4) Sample Selection
5) Measurement Error (Errors-in-Variables)
• In each case, biased (non-causal) and inconsistent coefficient estimates will result.

Omitted Variable Bias
• When data for an omitted variable are unavailable and there are no suitable control variables, alternative solutions to omitted variable bias may exist:
1) Panel data: by evaluating the same cross-sectional units at multiple points in time, panel regression makes it possible to control for unobserved (omitted) time-invariant effects.
2) Instrumental variables: instrumental variables regression introduces a new variable (i.e. the instrument) which is exogenous to the dependent variable yet correlated with the endogenous regressor.
3) Controlled experiments.

Functional Form Misspecification
• This source of bias is effectively a special form of omitted variable bias (e.g. excluding a polynomial regression term when the population regression model has such a form).
• Careful thought about potential non-linearities and plotting data around the fitted regression line can help eliminate functional form misspecification.
• Likewise, we might err on the side of caution and include polynomials in key regressors and perform F-tests of joint significance.

Misspecification Tests
• The Regression Specification Error Test (RESET) includes polynomials in OLS fitted values to test whether the base regression model neglects non-linearities in the variables already in the model.
E.g. $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + u_i$
1) Estimate the regression model without polynomial terms and compute fitted values $\hat{Y}_i$.
2) Estimate $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \delta_1 \hat{Y}_i^2 + \delta_2 \hat{Y}_i^3 + v_i$.
3) Perform an F-test of $H_0: \delta_1 = 0, \delta_2 = 0$.
Under the null, the base model is correctly specified (i.e. polynomials of existing regressors are unnecessary).
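
To make the mechanics concrete, here is a minimal sketch of the RESET procedure in Python using statsmodels; the variable names and the simulated data-generating process (a model that is nonlinear in one regressor) are illustrative assumptions, not part of the lecture.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2**2 + rng.normal(size=n)  # true model is nonlinear in x2

# Step 1: estimate the base (linear) model and save its fitted values
X_base = sm.add_constant(np.column_stack([x1, x2]))
base = sm.OLS(y, X_base).fit()
yhat = base.fittedvalues

# Step 2: re-estimate, adding squared and cubed fitted values as regressors
X_aug = np.column_stack([X_base, yhat**2, yhat**3])
aug = sm.OLS(y, X_aug).fit()

# Step 3: F-test of H0: delta1 = delta2 = 0 (the coefficients on yhat^2 and yhat^3)
k = X_base.shape[1]
R = np.zeros((2, X_aug.shape[1]))
R[0, k] = 1.0      # restriction on the yhat^2 term
R[1, k + 1] = 1.0  # restriction on the yhat^3 term
print(aug.f_test(R))  # a small p-value points to neglected nonlinearity
```

Recent statsmodels releases also include a linear_reset helper in statsmodels.stats.diagnostic that automates these steps.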

Misspecification Tests (cont.)
• Note that the RESET test is silent on whether your model should include additional variables (or exactly which ones).
• Failure to reject $H_0$ does not indicate that you necessarily have the correct model (just that higher-order polynomials involving included regressors are not needed).
• Rejection of $H_0$ does not indicate which regressors should be specified in quadratic and/or cubic terms.
• The RESET test is NOT a general test for omitted variables bias!

Misspecification Tests (cont.)
• The Davidson-MacKinnon test allows comparison of non-nested models.
• E.g. Linear vs. linear-log models:
(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + u_i$
(2) $Y_i = \beta_0 + \beta_1 \log(X_{1i}) + \cdots + \beta_k \log(X_{ki}) + u_i$
1) Estimate (2) and compute fitted values $\hat{Y}_i$.
2) Estimate $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \theta_1 \hat{Y}_i + v_i$.
3) Perform a t-test of $H_0: \theta_1 = 0$. Under the null, (1) is correctly specified.
4) Repeat steps 1-3 switching the two models.
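
A minimal sketch of the Davidson-MacKinnon comparison in Python, again with simulated data; the single-regressor setup and the true linear-log data-generating process are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)              # strictly positive so log(x) is defined
y = 2.0 + 3.0 * np.log(x) + rng.normal(size=n)  # true model is linear-log

X_lin = sm.add_constant(x)          # model (1): linear in x
X_log = sm.add_constant(np.log(x))  # model (2): linear in log(x)

# Step 1: estimate model (2) and keep its fitted values
yhat_log = sm.OLS(y, X_log).fit().fittedvalues

# Step 2: add those fitted values as an extra regressor in model (1)
aug1 = sm.OLS(y, np.column_stack([X_lin, yhat_log])).fit()

# Step 3: t-test of H0: theta1 = 0; rejection is evidence against model (1)
print("t on fitted values from (2):", aug1.tvalues[-1], "p:", aug1.pvalues[-1])

# Step 4: repeat with the roles of the two models reversed
yhat_lin = sm.OLS(y, X_lin).fit().fittedvalues
aug2 = sm.OLS(y, np.column_stack([X_log, yhat_lin])).fit()
print("t on fitted values from (1):", aug2.tvalues[-1], "p:", aug2.pvalues[-1])
```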

Simultaneity (Simultaneous Causality)
• Simultaneity or simultaneous causality bias arises when there is reverse causality running from Y to X such that the dependent variable partly determines the value of the regressor(s).
• E.g. Traffic Fatalities and Minimum Drinking Age Laws
• E.g. GDP and Exports
• E.g. Federal Tax Revenue and Tax Rates

Simultaneity (cont.)
• Simultaneity induces spurious correlation between Y and X through the regression error term and will consequently cause the conditional mean independence assumption to fail, thereby yielding biased, inconsistent estimates.
• This is often more apparent by framing simultaneity bias as reflecting an omitted variables problem.
• E.g. Traffic Fatalities and Minimum Drinking Age Laws
• E.g. GDP and Exports
• E.g. Federal Tax Revenue and Tax Rates

Simultaneity (cont.)
• Formally, simultaneity bias can be illustrated by considering two simultaneous equations:
$Y_i = \alpha + \beta X_i + u_i$
$X_i = \delta + \gamma Y_i + v_i$
• Intuitively, since $u_i$ captures all of the “other factors” which affect $Y_i$, and $Y_i$ in turn affects $X_i$, then there must be some element of $u_i$ that affects $X_i$, and $cov(X_i, u_i) \neq 0$.

Simultaneity (cont.)
• Mathematically,
$cov(X_i, u_i) = cov(\delta + \gamma Y_i + v_i, u_i)$
$\Rightarrow cov(X_i, u_i) = \gamma\, cov(Y_i, u_i) + cov(v_i, u_i) = \gamma\, cov(Y_i, u_i) = \gamma\, cov(\alpha + \beta X_i + u_i, u_i)$
$\Rightarrow cov(X_i, u_i) = \gamma\beta\, cov(X_i, u_i) + \gamma\sigma_u^2$
$\Rightarrow cov(X_i, u_i) = \dfrac{\gamma\sigma_u^2}{1 - \gamma\beta}$
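
A minimal simulation of this result: the sketch below solves the two simultaneous equations for their reduced form, runs OLS of Y on X, and compares the estimate with the asymptotic bias implied by $cov(X_i, u_i) = \gamma\sigma_u^2/(1-\gamma\beta)$. The parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha, beta, delta, gamma = 1.0, 0.5, 2.0, 0.8

u = rng.normal(size=n)
v = rng.normal(size=n)

# Reduced form of the two simultaneous equations
#   Y = alpha + beta*X + u,   X = delta + gamma*Y + v
#   => X = (delta + gamma*alpha + gamma*u + v) / (1 - gamma*beta)
X = (delta + gamma * alpha + gamma * u + v) / (1 - gamma * beta)
Y = alpha + beta * X + u

# OLS slope from a regression of Y on X
beta_hat = np.cov(X, Y)[0, 1] / np.var(X)

# Asymptotic bias implied by cov(X, u) = gamma * sigma_u^2 / (1 - gamma*beta)
bias = (gamma * np.var(u) / (1 - gamma * beta)) / np.var(X)
print(beta_hat, beta + bias)  # close to each other, and both well above beta = 0.5
```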

Simultaneity Solutions
• Just as for omitted variable bias or errors-in-variables bias, the best option for addressing simultaneity bias is instrumental variables regression.
• If feasible, instrumental variables regression holds the possibility of being able to separate the simultaneous effects in order to focus exclusively on the causal effect running from X to Y.
• In other cases, reverse causality may be especially problematic contemporaneously. A partial solution may involve using pre-determined values of X if panel data are available.

Sample Selection
• Sample selection bias arises when data are missing because of a selection process related to the value of the dependent variable (even after X is accounted for).
• In other words, sample selection is a concern when the data are not randomly sampled with respect to Y.
• Sample selection is not a concern when data are missing purely at random, or when selection is on X directly (e.g. you are missing half the data for women but missing none for men), but it may affect external validity.

Sample Selection (cont.)
• Some data are inherently selected (e.g. home sale prices, wages, survey data, online product reviews, etc.).
• This is not precisely selection on the value of Y, but may rather be selection on some unobserved determinant of Y.
• This is generally only problematic for estimation if this unobserved determinant of Y is also correlated with a regressor of interest.

Sample Selection – Online Reviews
Above the Law, UNC Law Prof Sends a ‘Rather Embarrassing’ Request, Asks Former Students to Help His Online Rating (February 23, 2012):
“Rating sites apparently even have the power to bring a well-known UNC Law professor to his electronic knees. It’s not every day that a torts professor sends his former students a “rather embarrassing request” to repair his online reputation. It’s also certainly not every day that the students respond en masse…. On Tuesday, Professor Michael Corrado sent the following email to 2Ls who took his torts class last year, basically pleading for their help. ... ‘I have a rather embarrassing request of you. An undergraduate brought something to my attention that needs to be fixed. It seems that there is a website, something like Rate My Professors, where my rating is so bad that he was uncertain about whether to take my course or not. I was puzzled, because my evaluations are generally not bad. It turns out that there are just a couple of responses on the site, and they are apparently from people who have a real grievance against me for some reason. They are certainly entitled to their opinions, but it isn’t really a fair reflection of my teaching (I hope). What I would like to ask of you is whether, if you are so inclined, you would go onto that site and write your own review of my teaching. I’m not asking you to write a favorable review, just to write an honest review. I think that overall I would get much better ratings if a number of people did this and just gave their honest views.’”

Sample Selection – Charitable Giving
• E.g. Charitable Giving Surveys
• We might expect less generous individuals to be less likely to complete the survey.
• If we are interested in estimating the effect on giving of something wholly unrelated to a person’s propensity to complete the survey, selection bias is not a real concern (e.g. paycheck frequency).
• If instead we are interested in estimating the effect on giving of something that is related to the selection process, selection bias may be a problem (e.g. volunteering hours).

Sample Selection Solutions
• Unfortunately, sample selection bias is largely intractable, at least using methods considered in this class.
• This also implies that we need to exercise caution in omitting data outliers, as doing so may introduce a form of non-random selection.

Outliers
• See Wooldridge Ch. 9.5-9.6 for a discussion of outliers.
Bottom Line:
• The determination of “outliers” is somewhat subjective and requires careful consideration.
• It is always advisable to perform sensitivity analyses around the definition of outliers.
• OLS is sensitive to outliers because it fits the regression line by minimizing the sum of squared residuals.
• Alternatively, one can use Least Absolute Deviation (LAD), also known as quantile regression.
• LAD is less sensitive to outliers than OLS because it minimizes the sum of the absolute values of the residuals.
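
A minimal sketch contrasting OLS and LAD on contaminated data, using statsmodels' quantile regression at the median; the data-generating process and the contamination scheme are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:10] += 60.0                          # contaminate a handful of observations in Y

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
lad_fit = sm.QuantReg(y, X).fit(q=0.5)  # LAD = quantile regression at the median

print("OLS (const, slope):", ols_fit.params)  # intercept pulled well away from 1
print("LAD (const, slope):", lad_fit.params)  # stays close to (1, 2)
```

Comparing the two sets of estimates, or re-running OLS with and without the suspect observations, is one simple form of the sensitivity analysis suggested above.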

Measurement Error
• Mismeasurement of either the dependent variable or a regressor can likewise yield biased and inconsistent coefficient estimates.
• Mismeasurement of an explanatory variable is referred to as errors in variables.
• Importantly, we must distinguish classical measurement error (i.e. mean zero random noise) from non-classical measurement error.

Errors in Variables
• Formally, suppose we would like to estimate
$Y_i = \beta_0 + \beta_1 X_i^* + u_i$
but we do not observe $X_i^*$ directly and instead observe only an imprecisely measured $X_i = X_i^* + e_i$, where $e_i$ captures the measurement error.
• We assume $E[Y_i \mid X_i^*, X_i] = E[Y_i \mid X_i^*]$, s.t. $X_i$ is uninformative about $Y_i$ once $X_i^*$ has been accounted for.
• We assume $E[e_i] = 0$.

Errors in Variables (cont.)
• If we instead estimate
$Y_i = \beta_0 + \beta_1 X_i + v_i$,
then, using $X_i = X_i^* + e_i$:
$Y_i = \beta_0 + \beta_1 (X_i^* + e_i) + v_i$
$Y_i = \beta_0 + \beta_1 X_i^* + (\beta_1 e_i + v_i)$
$Y_i = \beta_0 + \beta_1 X_i^* + u_i$, with $u_i = \beta_1 e_i + v_i$, i.e. $v_i = u_i - \beta_1 e_i$,
such that:
$\hat{\beta}_1 = \dfrac{\sum_i (X_i - \bar{X}) Y_i}{\sum_i (X_i - \bar{X})^2} = \beta_1 + \dfrac{\sum_i (X_i - \bar{X}) v_i}{\sum_i (X_i - \bar{X})^2}$

Errors in Variables (cont.)
$\hat{\beta}_1 = \beta_1 + \dfrac{\sum_i (X_i - \bar{X}) v_i}{\sum_i (X_i - \bar{X})^2}$
$\Rightarrow plim\, \hat{\beta}_1 = \beta_1 + \dfrac{cov(X_i, v_i)}{Var(X_i)}$
Consistency and unbiasedness of $\hat{\beta}_1$ depend on $cov(X_i, v_i)$:
$cov(X_i, v_i) = cov(X_i, u_i - \beta_1 e_i) = cov(X_i, u_i) - \beta_1 cov(X_i, e_i) = -\beta_1 cov(X_i, e_i)$

Errors in Variables (cont.)
• We consider three possible cases:
1) $cov(X_i, e_i) = 0$
2) $cov(X_i^*, e_i) = 0$
3) $cov(X_i, e_i) \neq 0$, $cov(X_i^*, e_i) \neq 0$
(1) If $cov(X_i, e_i) = 0$, then $cov(X_i, v_i) = 0$ and $\hat{\beta}_1$ will be consistent.
However, $Var(v_i) = Var(u_i - \beta_1 e_i) = \sigma_u^2 + \beta_1^2 \sigma_e^2 \geq \sigma_u^2$, so the regression error variance (and hence the standard errors) will be larger.

Errors in Variables (cont.)
(2) If $cov(X_i^*, e_i) = 0$, then:
$cov(X_i, e_i) = cov(X_i^* + e_i, e_i) = cov(X_i^*, e_i) + Var(e_i) = \sigma_e^2$
$\Rightarrow cov(X_i, v_i) = -\beta_1 \sigma_e^2 \neq 0$
• $\hat{\beta}_1$ will be biased and inconsistent.
• This is referred to as classical errors in variables.

Classical Measurement Error
• Under classical errors in variables (CEV), the inconsistency of $\hat{\beta}_1$ implies a very specific form of bias:
$plim\, \hat{\beta}_1 = \beta_1 + \dfrac{cov(X_i, v_i)}{Var(X_i)} = \beta_1 - \dfrac{\beta_1 \sigma_e^2}{\sigma_{X^*}^2 + \sigma_e^2} = \beta_1 \left(\dfrac{\sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_e^2}\right)$
Attenuation bias! The estimate is shrunk toward zero by the factor $\sigma_{X^*}^2 / (\sigma_{X^*}^2 + \sigma_e^2) < 1$.

Classical Measurement Error Solutions
• Under classical measurement error, the direction of bias is always toward zero.
• For some types of analyses, this is not too problematic.
• If the nature of the classical measurement error is known, biased coefficient estimates can be rescaled accordingly.
• This requires knowledge about the variance of the unobserved $X_i^*$ and the variance of the measurement error.
• More realistically (and more generally for all types of measurement error), instrumental variables regression may be applied to recover causal effects.
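
A minimal sketch of the rescaling idea in the same hypothetical setting as the simulation above: if the variance of the true regressor and of the measurement error were known (or credibly estimated, e.g. from validation data), the attenuated estimate could be scaled back up. The numbers below are illustrative assumptions.

```python
# Hypothetical values: an attenuated OLS estimate plus assumed (known) variances.
beta1_hat = 1.28                 # e.g. the attenuated estimate from the simulation above
var_xstar, var_e = 1.0, 0.5625   # assumed known variances of X* and of the error
beta1_corrected = beta1_hat * (var_xstar + var_e) / var_xstar
print(beta1_corrected)           # = 2.0, undoing the attenuation factor of 0.64
```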

Errors in Variables (cont.)
(3) More generally, if $cov(X_i, e_i) \neq 0$ and $cov(X_i^*, e_i) \neq 0$:
$plim\, \hat{\beta}_1 = \beta_1 - \beta_1 \dfrac{cov(X_i, e_i)}{Var(X_i)}$
$cov(X_i, e_i) = cov(X_i^* + e_i, e_i) = cov(X_i^*, e_i) + Var(e_i) \neq 0$
• $\hat{\beta}_1$ will be biased and inconsistent.
• The direction of bias will depend on the sign of $\beta_1$ and $cov(X_i, e_i)$.

E.g. Labor Supply and Wages
• Suppose that we wish to estimate the following model of the effect of wages on labor supply (i.e. hours worked):
$Hours_i = \beta_0 + \beta_1 wage_i + u_i$
• When asked about their wages on surveys, however, workers may tend to underreport wages (by correctly reporting hours but underreporting income).
• How would this bias our estimate of the response of labor supply to the wage?

E.g. Labor Supply and Wages (cont.)
Suppose individuals with higher wages (income) tend to underreport wages to a greater extent:
$X_i^* =$ the true (unobserved) wage
$X_i =$ the reported wage
$X_i = X_i^* + e_i$, where $e_i$ is the measurement error in the true wage. Systematic underreporting implies $E[e_i] < 0$.
• We expect $cov(X_i, e_i) < 0$ if a higher wage is associated with greater underreporting ($e_i$ more negative) of that wage.
• We expect $\beta_1 > 0$ if we think the substitution effect will outweigh the income effect as the wage increases (i.e. a higher wage tends to induce additional hours of work).

E.g. Labor Supply and Wages (cont.)
Recall,
$plim\, \hat{\beta}_1 = \beta_1 - \beta_1 \dfrac{cov(X_i, e_i)}{Var(X_i)}$
So, we expect
$plim\, \hat{\beta}_1 = \beta_1 (1 - \text{something negative}) > \beta_1$
and we will tend to overestimate the true magnitude of the effect of a change in wage on labor supply.
(Suppose, for instance, that wage was 20% higher for a group of people and they worked 2 more hours per week, but they only reported that their wage was 10% higher. We would attribute the additional 2 hours to only a 10% rise in the wage, when in fact it was due to a 20% rise.)
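
A minimal simulation of this non-classical case, in which underreporting grows with the true wage so that $cov(X_i, e_i) < 0$; the wage distribution, the reporting rule, and the parameter values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta0, beta1 = 20.0, 0.4                     # hours = 20 + 0.4 * wage + u

wage_true = rng.uniform(10.0, 60.0, size=n)  # true hourly wage, X*
hours = beta0 + beta1 * wage_true + rng.normal(scale=2.0, size=n)

# Reporting rule: wages above $20 are underreported by 10 cents per dollar,
# so e = X - X* is more negative for higher earners.
wage_reported = wage_true - 0.1 * np.maximum(wage_true - 20.0, 0.0)
e = wage_reported - wage_true

print("cov(X, e):", np.cov(wage_reported, e)[0, 1])  # negative

beta1_hat = np.cov(wage_reported, hours)[0, 1] / np.var(wage_reported)
print(beta1_hat, "vs true", beta1)  # beta1_hat exceeds 0.4: the wage response is overstated
```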

Measurement Error in Y
• Formally, suppose now that we would like to estimate
$Y_i^* = \beta_0 + \beta_1 X_i + u_i$
but we do not observe $Y_i^*$ directly and instead observe only an imprecisely measured $Y_i = Y_i^* + w_i$, where $w_i$ captures the measurement error.

Measurement Error in Y (cont.)
• If we instead estimate:
$Y_i = \beta_0 + \beta_1 X_i + v_i$, where $v_i = u_i + w_i$
• As usual,
$\Rightarrow plim\, \hat{\beta}_1 = \beta_1 + \dfrac{cov(X_i, v_i)}{Var(X_i)}$, where $cov(X_i, v_i) = cov(X_i, u_i + w_i) = cov(X_i, w_i)$.
• Just as in the case of errors in variables, unbiasedness of $\hat{\beta}_1$ depends on $cov(X_i, w_i)$.

Measurement Error in Y (cont.)
• Unlike errors in variables, it seems plausible that $cov(X_i, w_i) = 0$ in most cases (i.e. measurement error in Y is independent of all explanatory variables).
• If $cov(X_i, w_i) = 0$, then $cov(X_i, v_i) = 0$ and $\hat{\beta}_1$ will be consistent.
• However, $\sigma_v^2 = Var(v_i) = Var(u_i + w_i) = \sigma_u^2 + \sigma_w^2 \geq \sigma_u^2$.

Measurement Error in Y (cont.)
• Mismeasurement of the dependent variable that has mean zero and is uncorrelated with any regressor will not yield biased coefficient estimates, but the estimated standard errors will be larger than otherwise.
• (Random measurement error with non-zero mean will merely bias the intercept coefficient estimate.)
• In contrast, mismeasurement of the dependent variable that is correlated with the regressors will produce biased and inconsistent estimates.
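
A minimal simulation of mean-zero measurement error in the dependent variable that is uncorrelated with the regressor; the parameter values are illustrative assumptions. The slope stays centered on the truth, while its standard error grows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1_000
beta0, beta1 = 1.0, 2.0

x = rng.normal(size=n)
X = sm.add_constant(x)
y_star = beta0 + beta1 * x + rng.normal(size=n)   # true dependent variable
y_noisy = y_star + rng.normal(scale=2.0, size=n)  # mean-zero, X-independent error in Y

fit_clean = sm.OLS(y_star, X).fit()
fit_noisy = sm.OLS(y_noisy, X).fit()

print("slope (clean Y):", fit_clean.params[1], " se:", fit_clean.bse[1])
print("slope (noisy Y):", fit_noisy.params[1], " se:", fit_noisy.bse[1])
# Both slopes are close to 2, but the standard error is much larger with noisy Y.
```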

Exercise – Measurement Error
• See instructions for this group exercise in Bb Learn in Course Documents > Assignments > Exercise 4 – Measurement Error.
• Individual responses are due Sunday @ 11:59 p.m. EST.

Assignment
For next class, please read:
1) W Ch. 13-14
2) AP Ch. 5 (Diff-in-Diff)
Exercise 4 – Measurement Error due Sunday @ 11:59 p.m. EST
Paper Rough Draft due Monday, beginning of class...

