Review
Psychol Methods. 2014 Sep;19(3):380-97.
doi: 10.1037/a0037416. Epub 2014 Aug 11.

Combating unmeasured confounding in cross-sectional studies: evaluating instrumental-variable and Heckman selection models

Alfred DeMaris. Psychol Methods. 2014 Sep.

Abstract

Unmeasured confounding is the principal threat to unbiased estimation of treatment "effects" (i.e., regression parameters for binary regressors) in nonexperimental research. It refers to unmeasured characteristics of individuals that lead them both to be in a particular "treatment" category and to register higher or lower values than others on a response variable. In this article, I introduce readers to 2 econometric techniques designed to control the problem, with a particular emphasis on the Heckman selection model (HSM). Both techniques can be used with only cross-sectional data. Using a Monte Carlo experiment, I compare the performance of instrumental-variable regression (IVR) and HSM to that of ordinary least squares (OLS) under conditions with treatment and unmeasured confounding both present and absent. I find HSM generally to outperform IVR with respect to mean-square-error of treatment estimates, as well as power for detecting either a treatment effect or unobserved confounding. However, both HSM and IVR require a large sample to be fully effective. The use of HSM and IVR in tandem with OLS to untangle unobserved confounding bias in cross-sectional data is further demonstrated with an empirical application. Using data from the 2006-2010 General Social Survey (National Opinion Research Center, 2014), I examine the association between being married and subjective well-being.
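
The bias mechanism described above, and the way an instrument can remove it, can be illustrated with a small simulation. What follows is a minimal sketch under assumed conditions, not the article's Monte Carlo design: the sample size, the coefficients, the binary instrument `z`, and the helper names `cov` and `simulate` are all hypothetical choices made for illustration. With a single binary treatment and no other covariates, the OLS slope reduces to cov(t, y) / var(t), and the simplest instrumental-variable estimate reduces to the ratio cov(z, y) / cov(z, t). The Heckman selection model is not shown because it needs a probit first stage.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, beta=1.0, seed=0):
    """Draw (t, y, z) with an unmeasured confounder u (all values are assumptions)."""
    rng = random.Random(seed)
    t, y, z = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                    # unmeasured confounder
        zi = 1 if rng.random() < 0.5 else 0    # instrument: shifts treatment, not y
        # Treatment uptake depends on both the instrument and the confounder.
        ti = 1 if 2 * zi - 1 + u + rng.gauss(0, 1) > 0 else 0
        # The response depends on treatment (true effect = beta) and on u.
        yi = beta * ti + u + rng.gauss(0, 1)
        t.append(ti)
        y.append(yi)
        z.append(zi)
    return t, y, z


t, y, z = simulate()
ols = cov(t, y) / cov(t, t)   # OLS slope of y on t; absorbs the effect of u
iv = cov(z, y) / cov(z, t)    # IV (ratio) estimate using z as the instrument
print(f"true effect = 1.0, OLS = {ols:.2f}, IV = {iv:.2f}")
```

In this setup the OLS estimate should land well above the true effect, because treated cases have higher values of `u` on average, while the IV estimate should fall close to it, at the cost of a larger sampling variance.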

Figures

Figure 1
Unmeasured confounding in cross-sectional data.
Figure 2
Simulation results for N = 50 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure 3
Simulation results for N = 250 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure 4
Simulation results for N = 2,000 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure 5
Simulation results for N = 50 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 6
Simulation results for N = 50 with treatment effect absent. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 7
Simulation results for N = 250 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 8
Simulation results for N = 250 with treatment effect absent. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 9
Simulation results for N = 2,000 with treatment effect present. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 10
Simulation results for N = 2,000 with treatment effect absent. Sym T = symmetric treatment condition; Asym T = asymmetric treatment condition; Norm E = normal errors; Exp E = exponential errors; Cnfd P = confound present; Cnfd A = confound absent; Inst P = instrument present; Inst A = instrument absent; MSE = mean square error; OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model; IVE = endogeneity based on IVR; HME = endogeneity based on HSM.
Figure 11
Residuals from ordinary least squares (OLS) regression of life distress on model predictors.
