Treatment effects 2 - Lecture notes 7 PDF

Title Treatment effects 2 - Lecture notes 7
Course Econometrics
Institution University of Reading
Pages 10


Topic 7: Treatment Effects Estimation Part 2

Selection on Unobservables Estimators

We now move on to examining estimators which we can use when the CIA fails, such that once we adjust for differences in characteristics between the treated and untreated we still have some selection on unobservables, i.e. there are still factors (confounders) that influence both the treatment and potential outcomes, and hence treatment is still endogenous. Selection on observables techniques may be able to reduce bias in estimates but they cannot solve the endogeneity problem. A common use of selection on unobservables estimators is to estimate the causal effect of some kind of policy change.

1. Endogenous Treatment Regression

A treatment effects regression model (etregress command in Stata) will run a maximum likelihood estimator that estimates the effect of an endogenous binary treatment variable. If we return to the university degree example, then we expect the decision to obtain a university degree to be related to an individual's potential wage rate in the labour market. We specify the following model:

$$y_i = x_i'\beta + \delta d_i + u_i \tag{1}$$

d_i is a binary variable for whether they received the treatment or not (whether they obtained a university degree or not). δ is the coefficient we want to measure, the effect of the treatment, but if those who attend university are likely to have higher wages in the first place (as a result of, say, higher unobserved ability) then we have a self-selection problem. Note that previously, when we examined Heckman selection models, selection determined whether we observed the dependent variable, but in this current case selection determines whether they received the treatment, which is an explanatory variable in the model we want to estimate. Therefore, we could model the process of treatment (obtaining a university degree) using the following latent variable specification approach (as we have used previously in the module):

$$d_i^* = z_i'\gamma + v_i \tag{2}$$

where z_i is a vector of explanatory variables that determine whether they receive the treatment (d_i = 1), γ is a vector of coefficients and v_i is the error term for the selection equation (selection being into the treatment group). We observe d_i = 1 if d_i^* > 0, and zero otherwise.

We assume that the two error terms u_i and v_i are correlated and follow a bivariate normal distribution, in the same vein as the Heckman model, with a mean of zero and a covariance matrix of

$$\begin{bmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & 1 \end{bmatrix}$$

where σ_ε is the standard deviation of the outcome error u_i, ρ is the correlation between the two error terms u_i and v_i, and we normalise the variance of v_i to 1. We can derive the expected value of the dependent variable (wage) for those who were treated (obtained a university degree):

$$E(y_i \mid d_i = 1, x_i, z_i) = x_i'\beta + \delta + \rho\sigma_\varepsilon \lambda(-z_i'\gamma) \tag{3}$$

Remember λ(.) is the inverse Mills ratio or the hazard function, and as a reminder from the limited dependent variable topic (in this case α = −z_i'γ):

$$\lambda(\alpha) = \frac{\phi(\alpha)}{1 - \Phi(\alpha)}$$
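As a short worked step (a sketch that uses only the symmetry of the standard normal distribution, φ(−a) = φ(a) and Φ(−a) = 1 − Φ(a)), the selection term in equation 3 can be written directly in terms of z_i'γ:

$$\lambda(-z_i'\gamma) = \frac{\phi(-z_i'\gamma)}{1 - \Phi(-z_i'\gamma)} = \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}$$

Since this ratio is always positive, the expected outcome of the treated is shifted up when ρ > 0 and down when ρ < 0.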

For those who are not treated the expected value of the dependent variable is:

$$E(y_i \mid d_i = 0, x_i, z_i) = x_i'\beta + \rho\sigma_\varepsilon \left[ \frac{-\phi(z_i'\gamma)}{1 - \Phi(z_i'\gamma)} \right] \tag{4}$$

So the difference in the expected value of the outcome variable between the treated and untreated is the treatment effect δ (the ATE in this model) plus a selection term:

$$E(y_i \mid d_i = 1, x_i, z_i) - E(y_i \mid d_i = 0, x_i, z_i) = \delta + \rho\sigma_\varepsilon \left[ \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)\left(1 - \Phi(z_i'\gamma)\right)} \right] \tag{5}$$

If we omitted λ(.) from the regression, the OLS coefficient on d_i would estimate this whole difference rather than δ. Unless ρ = 0, and since all the terms are assumed to be positive, the OLS estimate of δ would therefore be overestimated. Stata can estimate this type of treatment effect by a two-step procedure or a maximum likelihood estimator, in a similar way to the Heckman selection model; see the Stata help file for more on how Stata estimates such a model. Note that such a model assumes the coefficients for the x variables estimated in the model are the same for both those who are treated and those who are not; see Greene (2012, p. 931-932) for a derivation when this is not the case and a derivation of the ATET. The estimate that Stata reports will be both the ATE and ATET, but we can allow the effects to vary by including interaction terms in Stata's eregress command and making use of the margins command.

Note that the endogenous treatment effects regression model can be extended to the case where the treatment variable is multinomial instead of binary. There is a user written command called mtreatreg which can be installed. See Deb and Trivedi (2006) for more on this technique.
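The overestimation by OLS and the role of the hazard term can be illustrated with a small simulation. This is a minimal sketch under assumed values, not a description of how Stata's etregress works internally: the data-generating process, the parameter values, the variable names and the hand-written probit (fitted by Fisher scoring so that only NumPy is needed) are all assumptions made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
delta, rho, sigma = 1.0, 0.6, 1.0      # assumed values, not taken from the notes

x = rng.normal(size=n)                 # covariate in both equations
z = rng.normal(size=n)                 # variable that only enters the selection equation
v = rng.normal(size=n)                 # selection error, variance normalised to 1
u = sigma * (rho * v + math.sqrt(1 - rho**2) * rng.normal(size=n))  # corr(u, v) = rho

d = (0.5 * x + 1.0 * z + v > 0).astype(float)   # treatment: d = 1 if latent index > 0
y = 1.0 + 0.5 * x + delta * d + u               # outcome equation

def norm_pdf(a):
    return np.exp(-0.5 * a**2) / math.sqrt(2 * math.pi)

def norm_cdf(a):
    return 0.5 * (1.0 + np.vectorize(math.erf)(a / math.sqrt(2)))

def ols(X, target):
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Naive OLS of y on a constant, x and d (ignores the selection term)
delta_naive = ols(np.column_stack([np.ones(n), x, d]), y)[2]

# Step 1: probit of d on (1, x, z), fitted by Fisher scoring
Z = np.column_stack([np.ones(n), x, z])
gamma = np.zeros(Z.shape[1])
for _ in range(25):
    idx = Z @ gamma
    p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
    f = norm_pdf(idx)
    score = Z.T @ ((d - p) * f / (p * (1 - p)))
    info = (Z * (f**2 / (p * (1 - p)))[:, None]).T @ Z
    gamma = gamma + np.linalg.solve(info, score)

# Step 2: add the hazard term (equation 3 for the treated, equation 4 for the untreated)
idx = Z @ gamma
p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
hazard = d * norm_pdf(idx) / p - (1 - d) * norm_pdf(idx) / (1 - p)
coef = ols(np.column_stack([np.ones(n), x, d, hazard]), y)
delta_cf, rho_sigma_hat = coef[2], coef[3]

print(f"naive OLS: {delta_naive:.3f}  with hazard term: {delta_cf:.3f}  true: {delta}")
```

With ρ > 0 in this set-up the naive coefficient on d should sit well above the assumed δ, while the regression that includes the hazard term should land close to it; the coefficient on the hazard term is an estimate of ρσ_ε.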

2. Instrumental Variable Techniques

The idea (as we have seen before with IV) is that we need to find an instrument (z) that predicts treatment assignment net of the covariates, but is not related to any unobservables that may drive treatment assignment and the outcome variable. For more information on instrumental variable techniques to estimate treatment effects see Wooldridge (2010, ch. 21.4), Cameron and Trivedi (2005, ch. 25.7), Angrist and Pischke (2008, ch. 4.4 and ch. 4.5) or Imbens and Wooldridge (2009).

It is useful to express the observed outcome y in terms of the means of the potential outcomes and the unobservables. The means are denoted as:

$$\mu_0 = E(y_0), \qquad \mu_1 = E(y_1)$$

And we define the unobservables that influence the outcome in the absence of the treatment (v_0) and with the treatment (v_1) as:

$$v_0 = y_0 - \mu_0, \qquad v_1 = y_1 - \mu_1$$

y can, therefore, be expressed as:

$$y = \mu_0 + (\mu_1 - \mu_0)d + v_0 + d(v_1 - v_0) \tag{6}$$
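As a sketch of the step behind equation 6, assume the usual switching relationship between the observed and potential outcomes, y = y_0 + d(y_1 − y_0) (this starting point is carried over from the potential outcomes framework and is an assumption here, since it is not written out in this section). Substituting y_0 = μ_0 + v_0 and y_1 = μ_1 + v_1 gives:

$$\begin{aligned}
y &= y_0 + d\,(y_1 - y_0) \\
  &= (\mu_0 + v_0) + d\,\big[(\mu_1 + v_1) - (\mu_0 + v_0)\big] \\
  &= \mu_0 + (\mu_1 - \mu_0)\,d + v_0 + d\,(v_1 - v_0)
\end{aligned}$$

When v_1 = v_0 the last term drops out, leaving a regression of y on d with a constant coefficient μ_1 − μ_0 and error v_0.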

We cannot assume that v_1 and v_0 are mean independent of d once we have conditioned on x, as we do for the selection on observables methods, i.e. there are still factors determining both the treatment assignment process and the outcome that we cannot condition on. However, if we have an available instrument z that fulfils the conditions (i.e. it affects d but is unrelated to any of the unobservables and is independent of y_0 and x), and if we are able to assume that v_1 and v_0 are equal, then we can make use of the standard IV techniques we have already covered. Eligibility is often used as an instrument for participation, but there is no guarantee that eligibility will fulfil the conditions required for an instrument if it affects an individual's behaviour in a way that influences y.

A slight variant on this is to run a first stage probit estimation of d on the x covariates and the instrument(s) z, predict the probabilities of being assigned to the treatment group and then run an IV regression for y using the predicted probabilities as an instrument for d rather than z. Wooldridge (2010) suggests this method provides more efficient estimation than using standard IV techniques.

If we cannot assume that v_1 and v_0 are equal this makes estimation harder; see Wooldridge (2010, p. 942) on how to estimate the ATE when v_1 and v_0 are not equal. Essentially we run a first stage probit but, in addition to using the predicted probabilities as instruments for d, we also use the probabilities interacted with the demeaned covariates. Alternatively we could run standard IV and include interactions between z and the covariates as additional instruments. Wooldridge (2010, ch. 21.4) discusses how IV techniques can work very well for estimating the ATE if we have a good instrument, even if we do not specify the first stage probit well.
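The constant-effect case (v_1 = v_0) can be illustrated with a small simulation. This is a minimal sketch rather than the procedure used in any of the references: the data-generating process, the parameter values and the variable names are assumptions chosen for illustration, there are no covariates, and the IV estimate is computed as the ratio cov(z, y)/cov(z, d), which is the IV estimator with a single instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
effect = 2.0                        # assumed constant treatment effect (v1 = v0)

z = rng.normal(size=n)              # instrument: shifts treatment, excluded from y
v = rng.normal(size=n)              # unobservable driving treatment take-up
u = 0.7 * v + rng.normal(size=n)    # outcome error, correlated with v (endogeneity)

d = (0.8 * z + v > 0).astype(float) # treatment status
y = 1.0 + effect * d + u            # observed outcome

# OLS slope of y on d: picks up the correlation between d and u
ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# IV estimate using z as the instrument for d
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"OLS: {ols_slope:.3f}  IV: {iv_slope:.3f}  true effect: {effect}")
```

Because take-up is positively related to the outcome error in this set-up, the OLS slope should be biased upwards while the IV estimate should be close to the assumed effect.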


Wooldridge (2010, section 21.4.2) discusses correction and control function approaches to IV estimation for treatment effects. We have already seen an endogenous treatment regression in section 1.

A popular instrumental variable technique for estimating treatment effects is the local average treatment effect (LATE). Again we need to find an instrumental variable z; to illustrate the LATE estimator we assume z is a binary variable for simplicity. For example, z could represent whether an individual is eligible for participation in the treatment group (say a job training programme) and d measures their actual treatment status (whether they participated in the training programme). The idea is that if eligibility is random, eligibility should be related to the treatment status but not the outcome. This might not always be the case, as mentioned, if being eligible or not induces behaviour that impacts on the outcome variable. We define two counterfactual treatments: d_1 if z = 1 and d_0 if z = 0. We can, therefore, write their treatment status as:

$$d = (1 - z)d_0 + z d_1 = d_0 + z(d_1 - d_0) \tag{7}$$

If we plug equation 7 into equation 1 from the previous treatment effects notes then we get:

$$y = y_0 + d_0(y_1 - y_0) + z(d_1 - d_0)(y_1 - y_0) \tag{8}$$

It is assumed z is independent of y_0, y_1, d_0 and d_1, so any expectations involving these do not depend on z:

$$E(y \mid z = 1) = E(y_0) + E[d_0(y_1 - y_0)] + E[(d_1 - d_0)(y_1 - y_0)]$$

$$E(y \mid z = 0) = E(y_0) + E[d_0(y_1 - y_0)] \tag{9}$$

We assume d_1 ≥ d_0. All this means is that someone who would have participated when not eligible (i.e. z = 0) would still have participated if they were eligible (i.e. z = 1), which seems a reasonable assumption! On the basis of this assumption and some algebraic manipulation we end up with (see Wooldridge (2010, p. 952)):

$$E(y \mid z = 1) - E(y \mid z = 0) = E(y_1 - y_0 \mid d_1 - d_0 = 1)\,P(d_1 - d_0 = 1) \tag{10}$$

Equation 10 implies the LATE estimator is:

$$\tau_{late} = E(y_1 - y_0 \mid d_1 - d_0 = 1) \tag{11}$$

The LATE estimator is measuring the average treatment effect for those who would be induced to participate by z changing from 0 to 1, and we estimate it over the part of the population where d_1 − d_0 = 1. We can obtain estimates for E(y|z = 1) and E(y|z = 0) using a random sample, and if we can estimate P(d_1 − d_0 = 1), which equals P(d = 1|z = 1) − P(d = 1|z = 0), then we can estimate the following:

$$\tau_{late} = \frac{E(y \mid z = 1) - E(y \mid z = 0)}{P(d = 1 \mid z = 1) - P(d = 1 \mid z = 0)} \tag{12}$$

And it can be consistently estimated through:

$$\hat{\tau}_{late} = \frac{\bar{y}_1 - \bar{y}_0}{\bar{d}_1 - \bar{d}_0} \tag{13}$$

We use individuals where z = 1 to estimate $\bar{y}_1$ and $\bar{d}_1$, and individuals where z = 0 to estimate $\bar{y}_0$ and $\bar{d}_0$. The LATE estimator can be viewed as the mean difference in outcomes between the z = 1 and z = 0 groups divided by the change in the proportion in the treatment group due to a change in z. We only measure the treatment effect of the so called "compliers", i.e. those who were induced to participate in the treatment due to a change in z. Therefore the LATE estimator relies on the values of z and would change if the instrument used changed. And of course, as the local label suggests, those who are induced to enter the treatment may not be representative of the full population of the treatment group. These notes have only provided a brief intuition for the LATE estimator, so see the references for more about this estimator.

There is also another effect of interest worth mentioning, called the intention to treat (ITT) effect. Individuals may be assigned to a treatment group but not take up the treatment, so the ITT effect estimates the impact of eligibility. The ATET can then be estimated by dividing the ITT effect by the proportion of people assigned to the treatment group who participate in the treatment.

Therefore, to summarise, there are potentially four causal measures we could estimate:

ATE – the effect of a treatment on a randomly selected person
ATET – the effect of a treatment for those who actually received the treatment

And the two we have just introduced:

LATE – the effect of treatment for those at the margin of being treated, i.e. the compliers who are induced into treatment by the instrument
ITT – the effect of being eligible for treatment
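The relationship between the ITT effect, the change in take-up and the LATE can be sketched with simulated data. Everything in the set-up is an assumption made for illustration: the shares of compliers, always-takers and never-takers, the size of each group's treatment effect and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical population: 50% compliers, 20% always-takers, 30% never-takers
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
d0 = (kind == "always").astype(float)      # treatment status if z = 0
d1 = (kind != "never").astype(float)       # treatment status if z = 1 (so d1 >= d0)

# Treatment effects differ by type; the compliers' effect (2.0) is the LATE
gain = np.where(kind == "complier", 2.0, np.where(kind == "always", 4.0, 1.0))
y0 = rng.normal(size=n)                    # outcome without treatment
y1 = y0 + gain                             # outcome with treatment

z = rng.integers(0, 2, size=n)             # randomly assigned eligibility
d = d0 + z * (d1 - d0)                     # observed treatment status (equation 7)
y = y0 + d * (y1 - y0)                     # observed outcome

itt = y[z == 1].mean() - y[z == 0].mean()          # effect of eligibility on y
take_up = d[z == 1].mean() - d[z == 0].mean()      # estimates P(d1 - d0 = 1)
late_hat = itt / take_up                           # sample analogue in equation 13

print(f"ITT: {itt:.3f}  change in take-up: {take_up:.3f}  LATE: {late_hat:.3f}")
```

In this set-up the ratio should recover the compliers' effect rather than the average effect over the whole population, which is the sense in which the estimate is local.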

3. Panel Data Methods

Difference-in-Differences (DID)

If we have at least two time periods (pre and post treatment) and two groups (a treated and a control group) then it is possible to create a differences-in-differences (DID) estimator. DID will eliminate any unobserved factors which do not change over time. Again we assume a binary variable d which measures whether an individual receives treatment in period t, so d_it = 1 if individual i receives treatment in period t and d_it = 0 otherwise. If we assume there is an individual specific fixed effect a_i (an unobserved individual effect which will capture individual specific factors that do not change over time) and a time specific fixed effect T_t which does not vary across individuals (this will be specified as a series of time dummy variables, or one dummy in the case of two time periods), then we can specify the following model:

$$y_{it} = \varphi d_{it} + T_t + a_i + u_{it} \tag{14}$$

To begin with we assume there are no other explanatory variables beyond the time variable(s) and the treatment binary variable. In order to estimate the treatment effect φ we could first difference equation 14 to eliminate the unobserved individual effects a_i:

$$\Delta y_{it} = \varphi \Delta d_{it} + (T_t - T_{t-1}) + \Delta u_{it} \tag{15}$$

and run a pooled OLS with a full set of time dummies. Alternatively we could estimate a within group estimator to remove a_i. If we look at the two period case (a common case, as subjects are often observed pre and post treatment) and assume treatment occurs in time period 2, so d_i1 = 0 for all and d_i2 = 1 for the treated group and 0 otherwise, then dropping the t subscript we want to estimate:

$$\Delta y_i = \varphi d_i + T + u_i \tag{16}$$

with d_i a binary treatment variable of whether the individual received treatment in period 2 or not. By running equation 14 we can estimate the differences-in-differences estimator of φ:

$$\hat{\varphi} = \Delta\bar{y}^{tr} - \Delta\bar{y}^{nt}$$

That is the difference between the sample average of Δy_i for d_i = 1 and the sample average of Δy_i for d_i = 0. This estimate is criticised because we cannot tell whether the change was a result of the treatment or of something else that changed between the groups; an assumption of the DID estimator is that the characteristics of the treated and untreated groups remain stable. Equation 14 will estimate the ATE but we are more likely to be interested in the ATET:

$$\tau_{ATET} = E[(y_1 - y_0) \mid d_i = 1]$$

The problem is finding a counterfactual, i.e. E[y_0 | d_i = 1]. If assignment to the treatment group was completely random then the ATE would equal the ATET. We could in fact construct the DID estimator slightly differently, assuming two time periods in which treatment was received in the second period and two groups, the treated and the control group. We could estimate the following model using OLS:

$$y_{it} = \alpha + \delta_1 T_t + \delta_2 d_i + \delta_3 d_i T_t + u_{it} \tag{17}$$

Let A refer to the control group and B to the treatment group. y_it is the outcome variable (observed in all periods); T_t is a dummy variable equal to 1 for the second period and zero for the first period; d_i is a dummy variable equal to 1 if the individual is in the treatment group and zero for the control group; and d_i T_t is an interaction term equal to 1 in the second period for those in the treatment group. Therefore the coefficient δ_2 measures any differences between the treated and control group, the coefficient δ_1 any differences occurring between period 1 and 2 that affect both groups, and δ_3 is the difference-in-differences and is equal to the following:

$$\hat{\delta}_3 = (\bar{y}_{B,2} - \bar{y}_{B,1}) - (\bar{y}_{A,2} - \bar{y}_{A,1}) \tag{18}$$

δ_3 in equation 18 is equal to the difference in the average change of the two groups and hence the effect is an ATET. If we simply took the difference between period 1 and period 2 for the treated group we would fail to account for any changes in y not related to the treatment, and if we took just the difference between the treated and control groups in the second period we would fail to account for systematic differences between the two groups not connected to the treatment. Therefore the difference-in-differences approach allows for both time and group-specific effects but requires that being in the treatment group is not related to other time varying factors that affect the outcome variable and are in the error term (i.e. selection into groups is random). Even if selection is random there may still be systematic differences between the groups, and commonly other explanatory variables that affect the outcome variable (but not which group they are in) are included, i.e. α is replaced by x_i'β. The interpretation of δ_3 will remain the same with additional variables but not the mathematical representation. Note that the control group should not be indirectly affected by the treated group receiving the treatment.

For a good overview of some of the problems of estimating DID, especially in relation to the standard errors, see Bertrand and Mullainathan (2004), available on the on-line reading list.
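The different ways of constructing the DID estimate can be compared on simulated two-period data. This is a minimal sketch under assumed values: the data-generating process, the effect size and the variable names are hypothetical, and no covariates are included.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000                      # individuals, each observed in two periods
true_effect = 1.5              # assumed treatment effect

treated = rng.integers(0, 2, size=n).astype(float)   # d_i: group B = 1, group A = 0
a = rng.normal(size=n) + 0.8 * treated               # fixed effect, differs by group
y1 = a + rng.normal(size=n)                          # period 1: nobody treated yet
y2 = a + 0.5 + true_effect * treated + rng.normal(size=n)   # period 2, common shock 0.5

# (a) Difference between the average changes of the treated and untreated
dy = y2 - y1
did_changes = dy[treated == 1].mean() - dy[treated == 0].mean()

# (b) The four group-by-period means, as in equation 18
did_means = ((y2[treated == 1].mean() - y1[treated == 1].mean())
             - (y2[treated == 0].mean() - y1[treated == 0].mean()))

# (c) OLS on the stacked data with a period dummy, a group dummy and their interaction
y = np.concatenate([y1, y2])
T = np.concatenate([np.zeros(n), np.ones(n)])
d = np.concatenate([treated, treated])
X = np.column_stack([np.ones(2 * n), T, d, d * T])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
did_ols = coef[3]              # coefficient on the interaction term (delta_3)

print(f"changes: {did_changes:.4f}  means: {did_means:.4f}  OLS: {did_ols:.4f}")
```

Without covariates the three numbers coincide up to rounding error, because the regression with both dummies and their interaction simply reproduces the four group-by-period means. The group difference in the fixed effect and the common period shock are both differenced out.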

4. Regression Discontinuity Design

In the case where there is no common support, i.e. the treatment and control groups do not overlap, it is possible to estimate treatment effects using a regression discontinuity design (RDD); in this type of setting participation in a treatment group is usually determined by a threshold. A common application is the case where financial aid for students in college/university is determined by their SAT scores or Grade Point Average (GPA), which are common measures of educational performance in America. The idea is to compare those just above and below the threshold that determines treatment, e.g. the threshold SAT score that determines whether a student can obtain financial aid, since they are likely to be very similar in terms of characteristics. Greene (2012, p. 937-938), Wooldridge (2010, p. 954-959) and Cameron and Trivedi (2005, p. 879-883) provide a good introduction to regression discontinuity designs, whilst Imbens and Lemieux (2008), Imbens and Wooldridge (2009) and Lee and Lemieux (2010) provide good summaries of the regression discontinuity design and how it has been used.

There are two main types of design. The sharp regression discontinuity design occurs where assignment to the treatment group follows a deterministic rule, so there is a clear cut-off point. The fuzzy design occurs when the probability of being in the treatment group is discontinuous at a known and specified point, so there is not a clear cut-off point but the probability of being in the treatment group jumps at the cut-off point.

The Sharp Regression Discontinuity Design

Again we assume y_i1 refers to the outcome variable if they received the treatment (d_i = 1) and y_i0 if they did not (d_i = 0), and again the problem is that we do not observe y_i1 for those who did not receive the treatment or y_i0 for those who did. However, in this case we have some covariate x that determines whether they are in the treatment group or not. We assume there is some cut-off point (c) for x that determines whether an individual is in the treatment group, i.e. d_i = 1 if x_i ≥ c, and hence we observe the following:

$$y_i = y_{i0}(1 - d_i) + y_{i1} d_i \tag{19}$$

We also need the CIA to hold i.e. once we control for x, d and y are independent (again the following refers to conditional mean independence):

$$E(y_0 \mid x, d) = E(y_0 \mid x) \quad \text{and} \quad E(y_1 \mid x, d) = E(y_1 \mid x)$$

Again it is worth defining the two counterfactual conditional means:

$$\mu_0(x) = E(y_0 \mid x), \qquad \mu_1(x) = E(y_1 \mid x)$$

The problem is we have no overlap, as whether x is above or below c determines which group the individual is in; therefore we cannot make use of the various selection on observables methods we have seen. The idea is then to focus on the margin, i.e. at x = c, and estimate the following:

$$\tau_c = E(y_1 - y_0 \mid x = c) = \mu_1(c) - \mu_0(c) \tag{20}$$

We only need to assume that c is continuous and whilst we could estimate E(y|x, d = 0) for all observations where x < c it makes more s...
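The idea of comparing observations just either side of the cut-off in a sharp design can be sketched with simulated data. This is only an illustration under assumed values: the data-generating process, the bandwidth h and the use of separate linear fits on each side of c are choices made here and are not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
c = 0.0        # cut-off point (assumed)
tau = 2.0      # assumed jump in the outcome at the cut-off
h = 0.25       # bandwidth around the cut-off (assumed, not tuned)

x = rng.uniform(-1.0, 1.0, size=n)         # covariate that determines treatment
d = (x >= c).astype(float)                 # sharp design: d = 1 if x >= c
y = 1.0 + 0.5 * x + tau * d + rng.normal(size=n)

def fitted_value_at_cutoff(xs, ys):
    """Fit ys on (xs - c) by OLS and return the fitted value at x = c."""
    X = np.column_stack([np.ones(xs.size), xs - c])
    return np.linalg.lstsq(X, ys, rcond=None)[0][0]

below = (x < c) & (x >= c - h)             # untreated observations close to c
above = (x >= c) & (x <= c + h)            # treated observations close to c

mu0_c = fitted_value_at_cutoff(x[below], y[below])   # estimate of mu_0(c)
mu1_c = fitted_value_at_cutoff(x[above], y[above])   # estimate of mu_1(c)
tau_hat = mu1_c - mu0_c                              # estimate of tau_c

print(f"estimated jump at c: {tau_hat:.3f}  (assumed true value {tau})")
```

Only observations within h of the cut-off are used, so the estimate is local to x = c; a wider bandwidth uses more data but leans more heavily on the assumed functional form away from the cut-off.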

