# Heckman Two-Stage Model

Heckman Selection Models
Graduate Methods Master Class
March 4th, 2005
34 Kirkland Street, Room 22
Dan Hopkins
(with many thanks to MIT’s Adam Berinsky for generously sharing slides from his 2003 ICPSR Course, “Advanced MLE: Methods of Analyzing Censored, Sample Selected, and Truncated Data”)

Introduction
- The majority of models in political science make some form of Imbens’ (2004) exogeneity/unconfoundedness assumption: systematic differences in outcomes between treated and control units with the same values for the covariates are attributed to the treatment
- But… Achen (1986) identifies two common and thorny challenges to the unconfoundedness assumption: 1) non-random assignment to treatment and 2) sample selection/censoring

Introduction (continued)
- The Heckman models I will present are designed to deal with sample selection, but the same approach can be used to deal with non-random assignment to treatment as well (e.g. von Stein forthcoming)
- Selection bias can be thought of as a form of omitted variable bias (Heckman 1979)

Typology (from Berinsky/Breene)
| Sample | Y Variable | X Variable | Example |
|---|---|---|---|
| Censored | y is known exactly only if some criterion defined in terms of y is met. | x variables are observed for the entire sample, regardless of whether y is observed exactly. | Determinants of income; income is measured exactly only if it is above the poverty line. All other incomes are reported at the poverty line. |
| Sample Selected | y is observed only if a criterion defined in terms of some other random variable (Z) is met. | x and w (the determinants of whether Z = 1) are observed for the entire sample, regardless of whether y is observed or not. | Survey data with item or unit nonresponse. |
| Truncated | y is known only if some criterion defined in terms of y is met. | x variables are observed only if y is observed. | Donations to political campaigns. |
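The three data structures in the typology can be mimicked on simulated data. The following is a minimal sketch, not part of the original slides: the variable names, the cutoff of 0, and the coefficients are all hypothetical choices made for illustration.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)      # complete outcome, before any data loss
cutoff <- 0                    # hypothetical threshold

# Censored: x is seen for everyone; y is reported at the cutoff when it falls below it
censored <- data.frame(x = x, y = pmax(y, cutoff))

# Sample selected: y is seen only when a second variable z equals 1;
# x and w are still observed for every unit
w <- rnorm(n)
z <- as.integer(0.5 * w + rnorm(n) > 0)
selected <- data.frame(x = x, w = w, z = z, y = ifelse(z == 1, y, NA))

# Truncated: units with y at or below the cutoff vanish entirely, x included
truncated <- data.frame(x = x, y = y)[y > cutoff, ]

c(censored = nrow(censored), selected = nrow(selected), truncated = nrow(truncated))
```

Only the truncated data frame loses rows; the censored and sample-selected data frames keep all covariates and differ in what is recorded for y.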

Sample Selection: Intuition

SELECTION EQUATION
- $z_i^*$ = latent variable, DV of selection equation; think of this as the propensity to be included in the sample
- $w_i'$ = vector of covariates for unit i for selection equation
- $\gamma$ = vector of coefficients for selection equation
- $\varepsilon_i$ = random disturbance for unit i for selection equation
- $z_i^* = w_i'\gamma + \varepsilon_i$

OUTCOME EQUATION
- $y_i$ = DV of outcome equation
- $x_i'$ = vector of covariates for unit i for outcome equation
- $\beta$ = vector of coefficients for outcome equation
- $u_i$ = random disturbance for unit i for outcome equation
- $y_i = x_i'\beta + u_i$
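To see why the two disturbances matter, the selection and outcome equations can be simulated with correlated errors. This is a rough sketch under assumed values (a correlation of 0.7, a true slope of 2, and a second selection covariate called `w2`), none of which come from the slides.

```r
set.seed(42)
n   <- 20000
rho <- 0.7                                     # assumed correlation of the two disturbances

x   <- rnorm(n)                                # outcome covariate, also drives selection
w2  <- rnorm(n)                                # extra covariate in the selection equation only
eps <- rnorm(n)                                # selection disturbance
u   <- rho * eps + sqrt(1 - rho^2) * rnorm(n)  # outcome disturbance, correlated with eps

z_star <- x + w2 + eps                         # latent propensity to be in the sample
z      <- z_star > 0                           # unit is observed when the propensity is positive
y      <- 1 + 2 * x + u                        # outcome equation, true slope on x is 2

b_full     <- coef(lm(y ~ x))["x"]             # infeasible benchmark: every unit
b_selected <- coef(lm(y ~ x, subset = z))["x"] # feasible: selected units only
print(c(full = unname(b_full), selected = unname(b_selected)))
```

Because units with low x enter the sample only when their selection disturbance is large, and that disturbance is positively correlated with u, the slope estimated on the selected units is pulled below the full-sample slope.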

Can’t we just include the selection factors in the outcome equation?
- If there are no unmeasured variables that predict selection into the sample, we can (i.e. deterministic sample selection)
- If selection into the sample is random, we can (logic behind population inferences from telephone surveys)

Why can’t we just use explanatory variables in the outcome equation?
- What about if we cannot predict selection perfectly?
- $\sigma_{12} = \mathrm{Cov}(u_i, \varepsilon_i)$
- $\sigma_s$ = the unexplained variance in the assignment variable z when regressed on the exogenous variables X in the outcome equation
- Inconsistency in treatment effect = $\sigma_{12} / \sigma_s$ (from Achen 1986)
- Adding variables to the outcome equation might decrease $\sigma_s$ without necessarily decreasing $\sigma_{12}$
- Hence using explanatory variables in the outcome equation could exacerbate the problem
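The ratio in the bullets above can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not in the slides: the assignment variable is continuous rather than binary, the true treatment effect is 1, the covariance between the disturbances is 0.5, and `x2` is a hypothetical covariate that predicts assignment but is unrelated to the outcome disturbance.

```r
set.seed(7)
n  <- 100000
e  <- rnorm(n)                               # disturbance in the assignment equation
u  <- 0.5 * e + sqrt(0.75) * rnorm(n)        # outcome disturbance, Cov(u, e) = 0.5
x2 <- rnorm(n)                               # predicts assignment, unrelated to u
z  <- x2 + e                                 # continuous stand-in for the assignment variable
y  <- 1 * z + u                              # true treatment effect = 1

# Without x2: unexplained variance of z is 2, so the limit is 1 + 0.5 / 2 = 1.25
b_without <- coef(lm(y ~ z))["z"]
# With x2: unexplained variance of z drops to 1, so the limit is 1 + 0.5 / 1 = 1.5
b_with <- coef(lm(y ~ z + x2))["z"]

print(c(without_x2 = unname(b_without), with_x2 = unname(b_with)))
```

Controlling for `x2` halves the unexplained variance in z while leaving the covariance between the disturbances untouched, so the inconsistency doubles.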

Achen’s Warning
“With quasi-experimental data derived from nonrandomized assignments, controlling for additional variables in a regression may worsen the estimate of the treatment effect, even when the additional variables improve the specification.” —Achen, 1986, page 27

Heckman Model (from Berinsky’s slides)
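- Relationship of interest is a simple linear model

$$y_i = x_i'\beta + u_i \qquad \text{(Outcome Equation)}$$

- Assume that Y is observed iff a second, unobserved latent variable exceeds a particular threshold

$$z_i^* = w_i'\gamma + \varepsilon_i; \qquad z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Selection Equation)}$$

- Looks like a probit

$$\Pr(z_i = 1) = \Phi(w_i'\gamma)$$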
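Since the selection equation is a probit, the model can be estimated in two stages (Heckman 1979): fit the probit, compute the inverse Mills ratio from its linear index, and add that ratio as a regressor in the outcome equation for the selected units. What follows is a minimal base-R sketch on simulated data; the data-generating values (slope 2, correlation 0.7) and variable names are assumptions for illustration, and the second-stage standard errors reported by `lm` are not corrected for the estimated regressor.

```r
set.seed(123)
n <- 20000
x <- rnorm(n)
w <- rnorm(n)                                 # enters selection only (exclusion restriction)
e <- rnorm(n)                                 # selection disturbance
u <- 0.7 * e + sqrt(1 - 0.7^2) * rnorm(n)     # outcome disturbance, correlation 0.7 with e
z <- as.integer(x + w + e > 0)                # selection indicator
y <- ifelse(z == 1, 1 + 2 * x + u, NA)        # outcome observed only when z = 1
dat <- data.frame(y, x, w, z)

# Stage 1: probit for selection, then the inverse Mills ratio at the fitted index
probit  <- glm(z ~ x + w, family = binomial(link = "probit"), data = dat)
index   <- predict(probit, type = "link")
dat$imr <- dnorm(index) / pnorm(index)

# Stage 2: OLS on the selected units, with and without the correction term
naive <- lm(y ~ x, data = dat, subset = z == 1)
step2 <- lm(y ~ x + imr, data = dat, subset = z == 1)

rbind(naive = coef(naive)["x"], two_step = coef(step2)["x"])
```

The coefficient on `imr` estimates the product of the error correlation and the outcome standard deviation, which is the quantity Stata labels `lambda`.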

Heckman Models: Likelihood Function
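- Further assume the disturbances of Y and Z ($u_i$, $\varepsilon_i$) have a bivariate normal distribution with correlation coefficient $\rho$
- So the log-likelihood for the MLE (again, from Berinsky) is:

$$
\ln L = \sum_{z_i = 0} \ln\left[1 - \Phi(w_i'\gamma)\right]
+ \sum_{z_i = 1} \ln\left[\frac{1}{\sqrt{2\pi}\,\sigma_u} \exp\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma_u^2}\right)\right]
+ \sum_{z_i = 1} \ln \Phi\left[\frac{w_i'\gamma + \rho\,(y_i - x_i'\beta)/\sigma_u}{\sqrt{1 - \rho^2}}\right]
$$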
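The likelihood can be coded directly and handed to a general-purpose optimizer. The sketch below is one possible base-R implementation, not Berinsky's or Stata's code: the function name `heckman_loglik`, the simulated data, and the starting values are all assumptions. It optimizes over log(sigma) and atanh(rho) so that the constraints on both parameters hold automatically, the same transformation that Stata reports as `/lnsigma` and `/athrho`.

```r
heckman_loglik <- function(theta, y, X, z, W) {
  kx <- ncol(X); kw <- ncol(W)
  beta  <- theta[1:kx]                        # outcome coefficients
  gam   <- theta[(kx + 1):(kx + kw)]          # selection coefficients
  sigma <- exp(theta[kx + kw + 1])            # log scale keeps sigma positive
  rho   <- tanh(theta[kx + kw + 2])           # atanh scale keeps rho inside (-1, 1)
  sel   <- z == 1
  wg    <- drop(W %*% gam)
  res   <- y[sel] - drop(X[sel, , drop = FALSE] %*% beta)
  sum(pnorm(-wg[!sel], log.p = TRUE)) +                   # unselected units
    sum(dnorm(res, sd = sigma, log = TRUE)) +             # density of observed y
    sum(pnorm((wg[sel] + rho * res / sigma) / sqrt(1 - rho^2), log.p = TRUE))
}

# Simulated data with assumed truth: slope 2, sigma 1, rho 0.5
set.seed(2005)
n <- 5000
x <- rnorm(n); w <- rnorm(n); e <- rnorm(n)
u <- 0.5 * e + sqrt(1 - 0.5^2) * rnorm(n)
z <- as.integer(x + w + e > 0)
y <- ifelse(z == 1, 1 + 2 * x + u, NA)
X <- cbind(1, x)
W <- cbind(1, x, w)

# Start from naive OLS and probit estimates, with rho = 0
b0 <- coef(lm(y ~ x, subset = z == 1))
g0 <- coef(glm(z ~ x + w, family = binomial(link = "probit")))
start <- c(b0, g0, log(sd(y, na.rm = TRUE)), 0)

fit <- optim(start, heckman_loglik, y = y, X = X, z = z, W = W,
             method = "BFGS", control = list(fnscale = -1, maxit = 500))
c(beta_x = unname(fit$par[2]), sigma = exp(unname(fit$par[6])),
  rho = tanh(unname(fit$par[7])))
```

Setting `fnscale = -1` makes `optim` maximize rather than minimize. Starting from the naive estimates is a design choice to keep the search in a well-behaved region.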

Downsides of the Heckman Selection Model
- Need an exclusion restriction/instrument or model is identified solely on distributional assumptions (Sartori 2003; Liao 1995)
- Very sensitive to assumption of bivariate normality (Winship and Mare 1992)
- $\rho$ parameter very sensitive in some common applications (Brandt and Schneider 2004; Sartori 2003)
- For instance, Sartori (2003) replicates Lemke and Reed, finds the 95% confidence interval is from $\rho$ = -0.999999 to +0.99255

Extensions
- Can be modified so that dependent variable in outcome equation is binary (Heckman probit, the below is drawn from Berinsky)

$$
\ln L(\beta_1, \beta_2, \rho) = \sum_{y_2 = 1,\, y_1 = 1} \ln \Phi_2(\beta_1 x_{i1}, \beta_2 x_{i2}, \rho)
+ \sum_{y_2 = 1,\, y_1 = 0} \ln \Phi_2(-\beta_1 x_{i1}, \beta_2 x_{i2}, -\rho)
+ \sum_{y_2 = 0} \ln \Phi(-\beta_2 x_{i2})
$$

Where:
- $Y_{i1} \sim f_{bern}(y_{1i} \mid \pi_{1i})$, with $\pi_{1i}$ defined by the underlying probability term $Y_{i1}^* = \beta_1 x_{i1} + u_{1i}$, is the outcome process
- $Y_{i2} \sim f_{bern}(y_{2i} \mid \pi_{2i})$, with $\pi_{2i}$ defined by the underlying probability term $Y_{i2}^* = \beta_2 x_{i2} + u_{2i}$, is the selection process
- $y_{1i} = 0$ and $y_{2i} = 1$ is an untruncated failure, $y_{1i} = 1$ and $y_{2i} = 1$ is an untruncated success, $y_{2i} = 0$ is a truncated observation
- $\Phi_2(\beta_1 x_1, \beta_2 x_2, \rho)$ is the cumulative bivariate normal function defined by $\beta_1 x_1$, $\beta_2 x_2$ and $\rho$; and $u_{1i}$ and $u_{2i}$ are bivariate normally distributed iid, with correlation $\rho_{u_1, u_2} = \rho$
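The Heckman probit likelihood needs the bivariate normal CDF, which base R does not provide. As a rough sketch, it can be computed by one-dimensional numerical integration, conditioning the second variable on the first. The helper names `pbivnorm_int` and `heckprobit_loglik` and the toy inputs are hypothetical, and the per-observation loop favors clarity over speed.

```r
# P(U1 <= a, U2 <= b) for standard bivariate normal variables with correlation rho,
# using U2 | U1 = t ~ N(rho * t, 1 - rho^2); requires |rho| < 1
pbivnorm_int <- function(a, b, rho) {
  integrand <- function(t) dnorm(t) * pnorm((b - rho * t) / sqrt(1 - rho^2))
  integrate(integrand, lower = -Inf, upper = a)$value
}

# Log-likelihood with the three kinds of observations from the formula
heckprobit_loglik <- function(beta1, beta2, rho, y1, y2, X1, X2) {
  xb1 <- drop(X1 %*% beta1)                  # outcome index
  xb2 <- drop(X2 %*% beta2)                  # selection index
  ll <- numeric(length(y2))
  for (i in seq_along(y2)) {
    ll[i] <- if (y2[i] == 0) {
      pnorm(-xb2[i], log.p = TRUE)                    # truncated observation
    } else if (y1[i] == 1) {
      log(pbivnorm_int(xb1[i], xb2[i], rho))          # untruncated success
    } else {
      log(pbivnorm_int(-xb1[i], xb2[i], -rho))        # untruncated failure
    }
  }
  sum(ll)
}

# Toy evaluation: three units, one of each kind (y1 is unknown when y2 = 0)
X1 <- cbind(1, c(0.2, -1.0, 0.5))
X2 <- cbind(1, c(1.0, 0.3, -0.4))
y2 <- c(1, 1, 0)
y1 <- c(1, 0, NA)
ll_toy <- heckprobit_loglik(c(0, 1), c(0.5, 1), 0.3, y1, y2, X1, X2)
ll_toy
```

For any unit the three probabilities (success, failure, truncated) sum to one, which gives a quick sanity check on the sign flips applied to the outcome index and to the correlation in the failure term.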

- I generated a dataset that fits the description above.
- Because I generated the dataset, I know the truth, even if I will hide the truncated information from my estimators
- The correlation in the full sample between grades and success is 0.47. In the truncated sample, it is just 0.17.

R Code for Example 1
    setwd("C:/Documents and Settings/labguest/Desktop")

    ### EXAMPLE
    library(MASS)                # provides mvrnorm()
    n <- 1000

    ## VARIABLES: grades, motivation
    sigma <- diag(2)
    sigma[1, 1] <- .75           # Var(grades) = 0.75; Var(motivation) stays 1
    sigma[sigma == 0] <- .25     # Cov(grades, motivation) = 0.25
    data <- mvrnorm(n, c(2, 0), sigma)

    # success depends on grades and, much more strongly, on motivation
    success <- 2 * data[, 1] + 8 * data[, 2] + rnorm(n, 1, .25)

    # random component of admission; the 100 draws are recycled across the n rows
    randomad <- rbinom(100, 30, .4)
    admitted <- 1 * ((success + randomad) > (mean(success) + mean(randomad)))

    data <- cbind(success, admitted, data[, 1], data[, 2], randomad)
    colnames(data) <- c("success", "admitted", "grades", "motivation", "randomad")
    df1 <- data.frame(data)
    df1$success2 <- 1 * (df1$success > quantile(df1$success, .6))   # binary success
    round(cor(df1), digits = 3)

R Code for Example 2
    #            success  admit grades motivation instrument success2
    # success      1.000  0.779  0.468      0.982      0.029    0.791
    # admitted     0.779  1.000  0.356      0.766      0.233    0.759
    # grades       0.468  0.356  1.000      0.295      0.053    0.356
    # motivation   0.982  0.766  0.295      1.000      0.021    0.780
    # randomad     0.029  0.233  0.053      0.021      1.000    0.016
    # success2     0.791  0.759  0.356      0.780      0.016    1.000

    # keep admitted applicants only (the truncated sample)
    df2 <- df1[df1$admitted == 1, ]
    round(cor(df2$grades, df2$success2), digits = 3)
    # [1] 0.173

    # success3: success is observed only for admitted applicants
    df1$success3 <- NA
    df1$success3[df1$admitted == 1] <- df1$success[df1$admitted == 1]

    # export for Stata, writing missing values as "."
    write.table(df1, file = "hecktest.dat", sep = ",", na = ".", row.names = FALSE)

    . heckman success3 grades, sel(grades randomad)
    Iteration 0:   log likelihood = -2130.1572
    Iteration 32:  log likelihood = -2035.3218

Stata Results: An Admissions Committee, Heckman Model
    Heckman selection model                         Number of obs      =       1000
    (regression model with sample selection)        Censored obs       =        498
                                                    Uncensored obs     =        502

                                                    Wald chi2(1)       =     138.79
    Log likelihood = -2035.322                      Prob > chi2        =     0.0000

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    f7           |
          grades |   3.592449   .3049346    11.78   0.000     2.994788     4.19011
           _cons |  -.7373959   .5827337    -1.27   0.206    -1.879533    .4047411
    -------------+----------------------------------------------------------------
    select       |
          grades |    .475208   .0415684    11.43   0.000     .3937355    .5566806
        randomad |   .1322797   .0044137    29.97   0.000      .123629    .1409304
           _cons |  -2.214016    .090714   -24.41   0.000    -2.391812    -2.03622
    -------------+----------------------------------------------------------------
         /athrho |   15.60179   40.50948     0.39   0.700    -63.79532    94.99891
        /lnsigma |   2.022837   .0333664    60.62   0.000      1.95744    2.088234
    -------------+----------------------------------------------------------------
             rho |          1   4.55e-12                            -1           1
           sigma |    7.55974   .2522413                      7.081175    8.070648
          lambda |    7.55974   .2522413                      7.065356    8.054124
    ------------------------------------------------------------------------------
    LR test of indep. eqns. (rho = 0):   chi2(1) =   220.09   Prob > chi2 = 0.0000
    ------------------------------------------------------------------------------

Results
- Standard OLS, full sample: $\beta_{grades}$ = 4.396 (SE = 0.276)
- Standard OLS, censored sample: $\beta_{grades}$ = 1.813 (SE = 0.275)
- Heckman selection model: $\beta_{grades}$ = 3.592 (SE = 0.305)
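The two OLS benchmarks can be reproduced in spirit with a few lines of R. This sketch regenerates data under the same design as the example code, but with a hypothetical seed, so the estimates will differ from the figures reported above; the regression calls themselves are an assumption about how the benchmarks were run.

```r
library(MASS)                                # mvrnorm(); MASS ships with R
set.seed(2005)                               # hypothetical seed, not the author's
n <- 1000
sigma <- matrix(c(.75, .25, .25, 1), 2)      # same covariance matrix as the example
gm <- mvrnorm(n, c(2, 0), sigma)
grades     <- gm[, 1]
motivation <- gm[, 2]
success  <- 2 * grades + 8 * motivation + rnorm(n, 1, .25)
randomad <- rbinom(100, 30, .4)              # recycled across rows, as in the example code
admitted <- (success + randomad) > (mean(success) + mean(randomad))

# Population slope of success on grades alone is 2 + 8 * 0.25 / 0.75, about 4.67,
# because the omitted motivation variable is correlated with grades
ols_full      <- lm(success ~ grades)                     # all applicants
ols_truncated <- lm(success ~ grades, subset = admitted)  # admitted applicants only

rbind(full = coef(ols_full)["grades"], truncated = coef(ols_truncated)["grades"])
```

Dropping the applicants who were not admitted shrinks the grades coefficient sharply, which is the gap the Heckman model tries to close.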