Heckman Selection Models

Graduate Methods Master Class
March 4th, 2005
34 Kirkland Street, Room 22
Dan Hopkins
(with many thanks to MIT’s Adam Berinsky for generously sharing slides from his 2003 ICPSR Course, “Advanced MLE: Methods of Analyzing Censored, Sample Selected, and Truncated Data”)

Introduction

- The majority of models in political science make some form of Imbens’ (2004) exogeneity/unconfoundedness assumption: systematic differences between treated and control units with the same values for the covariates are attributed to the treatment.
- But Achen (1986) identifies two common and thorny challenges to the unconfoundedness assumption:
  1) non-random assignment to treatment, and
  2) sample selection/censoring.

Introduction (continued)

- The Heckman models I will present are designed to deal with sample selection, but the same approach can be used to deal with non-random assignment to treatment as well (e.g. von Stein forthcoming).
- Selection bias can be thought of as a form of omitted variable bias (Heckman 1979).

Typology (from Berinsky/Breene)

Censored
- Y variable: y is known exactly only if some criterion defined in terms of y is met.
- X variable: x variables are observed for the entire sample, regardless of whether y is observed exactly.
- Example: Determinants of income; income is measured exactly only if it is above the poverty line. All other incomes are reported at the poverty line.

Sample Selected
- Y variable: y is observed only if a criterion defined in terms of some other random variable (Z) is met.
- X variable: x and w (the determinants of whether Z = 1) are observed for the entire sample, regardless of whether y is observed or not.
- Example: Survey data with item or unit nonresponse.

Truncated
- Y variable: y is known only if some criterion defined in terms of y is met.
- X variable: x variables are observed only if y is observed.
- Example: Donations to political campaigns.
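The three sample types can be made concrete with a small simulation. The sketch below is only an illustration: the variable names, the cutoff of 0, and the coefficients are assumptions chosen for the example, not part of the original typology.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)                      # outcome covariate
w <- rnorm(n)                      # hypothetical covariate that drives selection
y <- 1 + 2 * x + rnorm(n)          # complete outcome (never fully seen in practice)
cutoff <- 0                        # hypothetical threshold

# Censored: x is seen for everyone; y is recorded at the cutoff when it falls below
censored <- data.frame(x = x, y = pmax(y, cutoff))

# Sample selected: x and w are seen for everyone; y is seen only when Z = 1
z <- as.integer(0.5 + w + rnorm(n) > 0)
selected <- data.frame(x = x, w = w, z = z, y = ifelse(z == 1, y, NA))

# Truncated: the whole row (x and y) is lost when y falls below the cutoff
truncated <- data.frame(x = x, y = y)[y > cutoff, ]

# Rows kept, and how many y values are missing, under each scheme
c(censored = nrow(censored), selected = nrow(selected), truncated = nrow(truncated))
sum(is.na(selected$y))
```

Note that only the truncated data frame has fewer rows; the censored and selected samples keep every unit but distort or hide y for some of them.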

Sample Selection: Intuition

- Non-random selection: the inference may not extend to the unobserved group.
- Example: Suppose we observe that college grades are uncorrelated with success in graduate school.
- Can we infer that college grades are irrelevant?
- No: applicants admitted with low grades may not be representative of the population with low grades.
- Unmeasured variables (e.g. motivation) used in the admissions process might explain why those who enter graduate school with low grades do as well as those who enter graduate school with high grades.

Thinking about this Formally

SELECTION EQUATION
- $z_i^*$ = latent variable, DV of selection equation; think of this as the propensity to be included in the sample
- $w_i'$ = vector of covariates for unit i for selection equation
- $\gamma$ = vector of coefficients for selection equation
- $\varepsilon_i$ = random disturbance for unit i for selection equation
- $z_i^* = w_i'\gamma + \varepsilon_i$

OUTCOME EQUATION
- $y_i$ = DV of outcome equation
- $x_i'$ = vector of covariates for unit i for outcome equation
- $\beta$ = vector of coefficients for outcome equation
- $u_i$ = random disturbance for unit i for outcome equation
- $y_i = x_i'\beta + u_i$
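To see why the two equations matter together, the following sketch simulates both and compares OLS on the full data with OLS on the selected cases only. All numeric values (coefficients, a disturbance correlation of 0.7, the sample size) are assumptions made for illustration; `eps` plays the role of the selection disturbance and `u` the outcome disturbance.

```r
set.seed(42)
n   <- 5000
rho <- 0.7                                    # assumed Corr(u_i, eps_i)
x   <- rnorm(n)                               # covariate in both equations
w   <- rnorm(n)                               # covariate in the selection equation only
eps <- rnorm(n)                               # selection disturbance
u   <- rho * eps + sqrt(1 - rho^2) * rnorm(n) # outcome disturbance, variance 1

z_star   <- 0.5 * x + 1.0 * w + eps           # selection equation (latent)
y        <- 1 + 2 * x + u                     # outcome equation
observed <- z_star > 0                        # y is seen only for these units

fit_full     <- lm(y ~ x)                     # infeasible benchmark
fit_selected <- lm(y ~ x, subset = observed)  # what the analyst can actually run

round(rbind(full = coef(fit_full), selected = coef(fit_selected)), 3)
```

Because the disturbances are correlated and x also enters the selection equation, the selected-sample intercept is pushed up and the slope on x is pulled toward zero relative to the full-sample fit.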

Can’t we just include the selection factors in the outcome equation?

- If there are no unmeasured variables that predict selection into the sample, we can (i.e. deterministic sample selection).
- If selection into the sample is random, we can (this is the logic behind population inferences from telephone surveys).

Why can’t we just use explanatory variables in the outcome equation?

- What if we cannot predict selection perfectly?
- $\sigma_{12} = \mathrm{Cov}(u_i, \varepsilon_i)$
- $\sigma_s$ = the unexplained variance in the assignment variable z when regressed on the exogenous variables in the outcome equation, X
- Inconsistency in treatment effect (from Achen 1986) = $\sigma_{12} / \sigma_s$
- Adding variables to the outcome equation might decrease $\sigma_s$ without necessarily decreasing $\sigma_{12}$
- Hence using explanatory variables in the outcome equation could exacerbate the problem

Achen’s Warning

“With quasi-experimental data derived from nonrandomized assignments, controlling for additional variables in a regression may worsen the estimate of the treatment effect, even when the additional variables improve the specification.” —Achen, 1986, page 27

Heckman Model (from Berinsky’s slides)

- Relationship of interest is a simple linear model:

  Outcome Equation: $y_i = x_i\beta + u_i$

- Assume that Y is observed iff a second, unobserved latent variable exceeds a particular threshold:

  Selection Equation: $z_i^* = w_i\gamma + e_i$; $z_i = 1$ if $z_i^* > 0$; $z_i = 0$ otherwise

  where $e_i$ is the disturbance of the selection equation.

- Looks like a probit:

  $\Pr(z_i = 1) = \Phi(w_i\gamma)$
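One way to see the omitted-variable reading of selection bias: if $u_i$ and $e_i$ are bivariate normal with $\mathrm{Var}(e_i) = 1$, then $E[y_i \mid z_i = 1] = x_i\beta + \rho\sigma_u\lambda(w_i\gamma)$, where $\lambda(a) = \phi(a)/\Phi(a)$ is the inverse Mills ratio. The sketch below uses Heckman's (1979) two-step estimator, which is a different estimator from the MLE presented in these slides: a probit for selection, then OLS with the estimated $\lambda$ added as a regressor. The simulated data and all numeric values are assumptions for illustration.

```r
set.seed(2005)
n <- 4000
x <- rnorm(n)
w <- rnorm(n)                               # affects selection only (exclusion restriction)
e <- rnorm(n)                               # selection disturbance, variance 1
u <- 0.6 * e + sqrt(1 - 0.6^2) * rnorm(n)   # outcome disturbance, Corr(u, e) = 0.6
z <- as.integer(0.5 * x + w + e > 0)        # selection indicator
y <- ifelse(z == 1, 1 + 2 * x + u, NA)      # outcome is missing when z = 0
dat <- data.frame(y, x, w, z)

# Step 1: probit for selection, fitted on all units
probit  <- glm(z ~ x + w, family = binomial(link = "probit"), data = dat)
index   <- predict(probit, type = "link")   # estimated w_i * gamma
dat$imr <- dnorm(index) / pnorm(index)      # inverse Mills ratio

# Step 2: OLS on the selected units, with and without the correction term
naive    <- lm(y ~ x,       data = dat, subset = z == 1)
two_step <- lm(y ~ x + imr, data = dat, subset = z == 1)

round(coef(naive), 3)
round(coef(two_step), 3)   # coefficient on imr estimates rho * sigma_u
```

The standard errors reported by `lm` in the second step ignore that the inverse Mills ratio is itself estimated, so they should not be used for inference without correction.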

Heckman Models: Likelihood Function

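- Further assume Y, Z have a bivariate normal distribution with correlation coefficient $\rho$
- So the MLE (again, from Berinsky) is:

$$
\ln L = \sum_{z_i = 0} \ln\left[1 - \Phi(w_i\gamma)\right]
+ \sum_{z_i = 1} \ln\left[\frac{1}{\sqrt{2\pi}\,\sigma_u}\right]
- \sum_{z_i = 1} \frac{1}{2\sigma_u^2}\,(y_i - x_i\beta)^2
+ \sum_{z_i = 1} \ln \Phi\left[\frac{w_i\gamma + \rho\,\dfrac{y_i - x_i\beta}{\sigma_u}}{\sqrt{1 - \rho^2}}\right]
$$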
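As a rough illustration of how this likelihood can be maximized numerically, the sketch below codes the log-likelihood for simulated data and hands it to `optim`. It is a minimal sketch under stated assumptions, not the routine any particular package uses: the data, starting values, and the function name `negloglik` are all hypothetical. It estimates $\ln\sigma_u$ and $\mathrm{atanh}(\rho)$ instead of $\sigma_u$ and $\rho$ so the optimizer cannot leave the valid range; Stata's `heckman` output reports the same transformed parameters as /lnsigma and /athrho.

```r
set.seed(123)
n <- 2000
x <- rnorm(n); w <- rnorm(n)
e <- rnorm(n)                                    # selection disturbance, variance 1
u <- 2 * (0.5 * e + sqrt(1 - 0.5^2) * rnorm(n))  # sigma_u = 2, rho = 0.5 (assumed)
z <- as.integer(0.3 + 0.8 * x + 1.0 * w + e > 0)
y <- ifelse(z == 1, 1 + 2 * x + u, NA)

X <- cbind(1, x)       # outcome design matrix
W <- cbind(1, x, w)    # selection design matrix; w is excluded from the outcome

negloglik <- function(par) {
  beta  <- par[1:2]
  gamma <- par[3:5]
  sigma <- exp(par[6])             # par[6] is ln(sigma_u)
  rho   <- tanh(par[7])            # par[7] is atanh(rho)
  wg  <- drop(W %*% gamma)
  sel <- z == 1
  res <- y[sel] - drop(X[sel, , drop = FALSE] %*% beta)
  ll0 <- pnorm(-wg[!sel], log.p = TRUE)          # ln[1 - Phi(w gamma)]
  ll1 <- dnorm(res, sd = sigma, log = TRUE) +    # normal density of y
    pnorm((wg[sel] + rho * res / sigma) / sqrt(1 - rho^2), log.p = TRUE)
  -(sum(ll0) + sum(ll1))
}

# Starting values: naive OLS, probit, residual scale, and rho = 0
start <- unname(c(coef(lm(y ~ x)),
                  coef(glm(z ~ x + w, family = binomial(link = "probit"))),
                  log(sd(y, na.rm = TRUE)), 0))
fit <- optim(start, negloglik, method = "BFGS", control = list(maxit = 500))

est <- c(b0 = fit$par[1], b1 = fit$par[2],
         sigma = exp(fit$par[6]), rho = tanh(fit$par[7]))
round(est, 3)
```

Standard errors could be obtained by adding `hessian = TRUE` to the `optim` call and inverting the returned Hessian, followed by the delta method for `sigma` and `rho`.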

Downsides of the Heckman Selection Model

- Need an exclusion restriction/instrument, or the model is identified solely on distributional assumptions (Sartori 2003; Liao 1995)
- Very sensitive to the assumption of bivariate normality (Winship and Mare 1992)
- The $\rho$ parameter is very sensitive in some common applications (Brandt and Schneider 2004; Sartori 2003)
- For instance, Sartori (2003) replicates Lemke and Reed and finds that the 95% confidence interval runs from $\rho$ = -0.999999 to +0.99255

Extensions

- Can be modified so that the dependent variable in the outcome equation is binary (Heckman probit; the below is drawn from Berinsky)

$$
\ln L(\beta_1, \beta_2, \rho) =
\sum_{y_2 = 1,\, y_1 = 1} \ln \Phi_2(\beta_1 x_{i1}, \beta_2 x_{i2}, \rho)
+ \sum_{y_2 = 1,\, y_1 = 0} \ln \Phi_2(-\beta_1 x_{i1}, \beta_2 x_{i2}, -\rho)
+ \sum_{y_2 = 0} \ln \Phi(-\beta_2 x_{i2})
$$

Where:
- $Y_{i1} \sim f_{bern}(y_{1i} \mid \pi_{1i})$, with $\pi_{1i}$ defined by the underlying probability term $Y_{i1}^* = \beta_1 x_{i1} + u_{1i}$, is the outcome process;
- $Y_{i2} \sim f_{bern}(y_{2i} \mid \pi_{2i})$, with $\pi_{2i}$ defined by the underlying probability term $Y_{i2}^* = \beta_2 x_{i2} + u_{2i}$, is the selection process;
- $y_{1i} = 0$ and $y_{2i} = 1$ is an untruncated failure, $y_{1i} = 1$ and $y_{2i} = 1$ is an untruncated success, and $y_{2i} = 0$ is a truncated observation;
- $\Phi_2(\beta_1 x_1, \beta_2 x_2, \rho)$ is the cumulative bivariate normal function defined by $\beta_1 x_1$, $\beta_2 x_2$ and $\rho$;
- $u_{1i}$ and $u_{2i}$ are bivariate normally distributed iid, with correlation $\rho$ between $u_1$ and $u_2$.

Example: An Admissions Committee

- Let’s say we are interested in making inferences about the relationship between college grades and success in graduate school for the population of college students.
- Further assume that the admissions committee is quite good at what it does, and that it uses both its estimates of people’s success (which are quite accurate, though not perfect) and some factor exogenous to success in graduate school (say, legacy admissions).
- We as data analysts have access to college grades, admission information, legacy admissions, and success in graduate school for those who were admitted. We do not observe success for those who were not admitted.

Example: An Admissions Committee (Continued)

- I generated a dataset that fits the description above.
- Because I generated the dataset, I know the truth, even if I will hide the truncated information from my estimators.
- The correlation in the full sample between grades and success is 0.47. In the truncated sample, it is just 0.17.

R Code for Example 1

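# Working directory (machine-specific path from the original slides)
setwd("C:/Documents and Settings/labguest/Desktop")

### EXAMPLE
library(MASS)   # provides mvrnorm()
n <- 1000

## VARIABLES: grades, motivation
# Covariance matrix: Var(grades) = .75, Var(motivation) = 1, Cov = .25
sigma <- diag(2)
sigma[1,1] <- .75
sigma[sigma==0] <- .25
data <- mvrnorm(n, c(2,0), sigma)

# Success depends on grades and (much more strongly) on motivation
success <- 2*data[,1] + 8*data[,2] + rnorm(n,1,.25)

# Factor exogenous to success; only 100 values are drawn, so R recycles
# them across the 1000 rows in the sum and in cbind() below
randomad <- rbinom(100,30,.4)

# Admit when success plus the exogenous factor is above its average
admitted <- 1*((success + randomad) > (mean(success) + mean(randomad)))

data <- cbind(success,admitted,data[,1],data[,2],randomad)
colnames(data) <- c("success","admitted","grades","motivation","randomad")
df1 <- data.frame(data)

# Binary version of success: 1 if above the 60th percentile
df1$success2 <- 1*(df1$success > quantile(df1$success,.6))
round(cor(df1),digits=3)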

R Code for Example 2

#            success admit grades motivation instrument success2
#success       1.000 0.779  0.468      0.982      0.029    0.791
#admitted      0.779 1.000  0.356      0.766      0.233    0.759
#grades        0.468 0.356  1.000      0.295      0.053    0.356
#motivation    0.982 0.766  0.295      1.000      0.021    0.780
#randomad      0.029 0.233  0.053      0.021      1.000    0.016
#success2      0.791 0.759  0.356      0.780      0.016    1.000

# Correlation between grades and success among the admitted only
df2 <- df1[df1$admitted==1,]
round(cor(df2$grades,df2$success2),digits=3)
#[1] 0.173

# success3: success as the analyst sees it (missing for those not admitted)
df1$success3 <- NA
df1$success3[df1$admitted==1] <- df1$success[df1$admitted==1]

# Export for Stata, writing missing values as "."
write.table(df1,file="hecktest.dat",sep=",",na=".",row.names=FALSE)

. heckman success3 grades, sel(grades randomad)
Iteration 0: log likelihood = -2130.1572
Iteration 32: log likelihood = -2035.3218

Stata Results: An Admissions Committee, Heckman Model

Heckman selection model                         Number of obs      =      1000
(regression model with sample selection)        Censored obs       =       498
                                                Uncensored obs     =       502

                                                Wald chi2(1)       =    138.79
Log likelihood = -2035.322                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f7           |
      grades |   3.592449   .3049346    11.78   0.000     2.994788     4.19011
       _cons |  -.7373959   .5827337    -1.27   0.206    -1.879533    .4047411
-------------+----------------------------------------------------------------
select       |
      grades |    .475208   .0415684    11.43   0.000     .3937355    .5566806
    randomad |   .1322797   .0044137    29.97   0.000      .123629    .1409304
       _cons |  -2.214016    .090714   -24.41   0.000    -2.391812    -2.03622
-------------+----------------------------------------------------------------
     /athrho |   15.60179   40.50948     0.39   0.700    -63.79532    94.99891
    /lnsigma |   2.022837   .0333664    60.62   0.000      1.95744    2.088234
-------------+----------------------------------------------------------------
         rho |          1   4.55e-12                            -1           1
       sigma |    7.55974   .2522413                      7.081175    8.070648
      lambda |    7.55974   .2522413                      7.065356    8.054124
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =   220.09   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

Results

- Standard OLS, full sample: Beta_grades = 4.396 (SE = 0.276)
- Standard OLS, censored sample: Beta_grades = 1.813 (SE = 0.275)
- Heckman selection model: Beta_grades = 3.592 (SE = 0.305)

For More Information…

Achen, Christopher H. 1986. The Statistical Analysis of Quasi-Experiments. Berkeley, CA: University of California Press.

Heckman, James J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica 47(1): 153-161.

Sartori, Anne E. 2003. “An Estimator for Some Binary-Outcome Selection Models Without Exclusion Restrictions.” Political Analysis 11: 111-138.

Winship, Christopher, and Robert D. Mare. 1992. “Models for Sample Selection Bias.” Annual Review of Sociology 18: 327-350.
