Annu Rev Sociol. 2014 Jul:40:31-53.
doi: 10.1146/annurev-soc-071913-043455. Epub 2014 Jun 2.

Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable

Felix Elwert et al. Annu Rev Sociol. 2014 Jul.

Abstract

Endogenous selection bias is a central problem for causal inference. Recognizing the problem, however, can be difficult in practice. This article introduces a purely graphical way of characterizing endogenous selection bias and of understanding its consequences (Hernán et al. 2004). We use causal graphs (directed acyclic graphs, or DAGs) to highlight that endogenous selection bias stems from conditioning (e.g., controlling, stratifying, or selecting) on a so-called collider variable, i.e., a variable that is itself caused by two other variables, one that is (or is associated with) the treatment and another that is (or is associated with) the outcome. Endogenous selection bias can result from direct conditioning on the outcome variable, a post-outcome variable, a post-treatment variable, and even a pre-treatment variable. We highlight the difference between endogenous selection bias, common-cause confounding, and overcontrol bias and discuss numerous examples from social stratification, cultural sociology, social network analysis, political sociology, social demography, and the sociology of education.

Keywords: causality; confounding; directed acyclic graphs; identification; selection.

Figures

Figure 1
A directed acyclic graph (DAG).
Figure 2
(a) A and B are associated by causation. The marginal association between A and B identifies the causal effect of A on B. (b) A and B are conditionally independent given C. The conditional association between A and B given C does not identify the causal effect of A on B (overcontrol bias).
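
The overcontrol pattern in Figure 2 can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes a linear chain A → C → B with hypothetical path coefficients 0.8 and 0.5 and standard normal noise, and the helper `ols_slopes` is introduced here only for illustration.

```python
import random

def ols_slopes(y, x1, x2=None):
    """OLS slope(s) of y on x1, or on x1 and x2 jointly (intercept implied)."""
    n = len(y)

    def cov(u, v):
        mu, mv = sum(u) / n, sum(v) / n
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

    if x2 is None:
        return (cov(x1, y) / cov(x1, x1),)
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    # Closed-form solution of the two-regressor normal equations.
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
A = [rng.gauss(0, 1) for _ in range(n)]
C = [0.8 * a + rng.gauss(0, 1) for a in A]   # assumed effect of A on C
B = [0.5 * c + rng.gauss(0, 1) for c in C]   # assumed effect of C on B

(total,) = ols_slopes(B, A)          # unadjusted slope: close to 0.8 * 0.5 = 0.4
direct, via_c = ols_slopes(B, A, C)  # slope on A after controlling for C: close to 0

print(f"unadjusted slope of B on A: {total:.3f}")
print(f"slope of B on A given C:    {direct:.3f}")
```

Under these assumptions the unadjusted slope recovers the causal effect of A on B, while controlling for the intermediate variable C drives the estimate toward zero.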
Figure 3
(a) A and B are associated by common cause. The marginal association does not identify the causal effect of A on B (confounding bias). (b) A and B are conditionally independent given C. The conditional association does identify the causal effect of A on B (which is zero in this model).
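
The common-cause structure in Figure 3 can be sketched the same way. The example below assumes a linear model in which C causes both A and B with unit coefficients (a hypothetical choice) and A has no effect on B; `corr` and `partial_corr` are illustrative helpers, with the latter using the standard first-order partial correlation formula.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 20000
C = [rng.gauss(0, 1) for _ in range(n)]
A = [c + rng.gauss(0, 1) for c in C]  # C causes A
B = [c + rng.gauss(0, 1) for c in C]  # C causes B; A has no effect on B

marginal = corr(A, B)             # close to 0.5, entirely due to C
conditional = partial_corr(A, B, C)  # close to 0, the true causal effect

print(f"marginal correlation:      {marginal:.3f}")
print(f"correlation adjusted for C: {conditional:.3f}")
```

Here adjustment helps rather than hurts: the marginal association is spurious, and conditioning on the common cause removes it.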
Figure 4
(a) A and B are marginally independent. The marginal association identifies the causal effect of A on B (which is zero in this model). (b) A and B are associated conditionally on their common outcome, C (collider). The conditional association between A and B given C does not identify the causal effect of A on B (endogenous selection bias). (c) Conditioning on a descendant, D, of a collider, C, similarly induces an association between the causes of the collider. The conditional association between A and B given D does not identify the causal effect of A on B.
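
The collider structure in Figure 4 can also be simulated. In this minimal sketch, A and B are independent standard normal variables, C is their noisy sum, and D is a noisy copy of C; the coefficients and the selection cutoff of 1.0 are assumptions made for illustration only.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)  # fixed seed so the sketch is reproducible
n = 50000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [rng.gauss(0, 1) for _ in range(n)]                 # independent of A
C = [a + b + rng.gauss(0, 0.5) for a, b in zip(A, B)]  # collider: A -> C <- B
D = [c + rng.gauss(0, 1) for c in C]                    # descendant of C

def corr_where(selector):
    """Correlation of A and B among units whose selector value exceeds 1.0."""
    kept = [(a, b) for a, b, s in zip(A, B, selector) if s > 1.0]
    return corr([a for a, _ in kept], [b for _, b in kept])

marginal = corr(A, B)      # close to 0
given_c = corr_where(C)    # clearly negative
given_d = corr_where(D)    # negative, weaker in this particular setup

print(f"marginal:        {marginal:.3f}")
print(f"selected on C>1: {given_c:.3f}")
print(f"selected on D>1: {given_d:.3f}")
```

Selecting on the collider, or on its descendant, induces a negative association between two variables that are causally unrelated and marginally independent.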
Figure 5
Endogenous selection bias due to outcome truncation, with E as education (treatment), I as income (outcome, truncated at 1.5 times poverty threshold), and U as error term on income.
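
A rough numerical illustration of outcome truncation follows. It assumes standardized education E, a true slope of 1.0 for income I on E, a standard normal error U, and a cutoff of 0.0 as a hypothetical stand-in for the 1.5 times poverty threshold in the caption; none of these values come from the article.

```python
import random

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)  # fixed seed so the sketch is reproducible
n = 50000
E = [rng.gauss(0, 1) for _ in range(n)]   # education (treatment)
U = [rng.gauss(0, 1) for _ in range(n)]   # error term on income
I = [1.0 * e + u for e, u in zip(E, U)]   # income, assumed true slope 1.0

cutoff = 0.0  # hypothetical truncation point on the income scale
kept = [(e, i) for e, i in zip(E, I) if i < cutoff]  # low-income sample only

full_slope = slope(E, I)
truncated_slope = slope([e for e, _ in kept], [i for _, i in kept])

print(f"full sample slope:      {full_slope:.3f}")
print(f"truncated sample slope: {truncated_slope:.3f}")
```

Because income is caused by both E and U, restricting the sample on income induces a negative association between E and U, and the estimated slope in the truncated sample falls well below the assumed true value.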
Figure 6
Endogenous selection bias due to listwise deletion [I, father’s income (treatment); P, child support payments (outcome); R, survey response]. Conditioning on the post-outcome variable response behavior R (listwise deletion of missing data) induces a noncausal association between father’s income, I, and his child support payments, P.
Figure 7
Endogenous selection bias due to sample selection [B, topping the Billboard charts (treatment); R, inclusion in the Rolling Stone 500 (outcome); S, sample selection].
Figure 8
Endogenous selection bias due to sample selection [M, motherhood (treatment); WR, unobserved reservation wage; WO, offer wage (outcome); E, employment; ε, error term on offer wage]. (a) Null model without effect of motherhood on offer wages. (b) Model with effect of motherhood on offer wages.
Figure 9
Attrition and dependent/informative censoring in panel studies [P, poverty (treatment); D, divorce (outcome); C, censoring/attrition; U, unmeasured marital distress]. (a) Censoring is random with respect to poverty and divorce. (b) Censoring is affected by poverty. (c) Censoring is affected by treatment and shares a common cause with the outcome. Only in panel c does attrition lead to endogenous selection bias.
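
The three attrition scenarios in Figure 9 can be compared in a small simulation. This sketch assumes binary variables, no true effect of poverty P on divorce D, and hypothetical probabilities for each mechanism; it reports the risk difference in divorce by poverty status among uncensored couples.

```python
import random

def risk_difference(rows):
    """P(D=1 | P=1) - P(D=1 | P=0) among the (p, d) rows supplied."""
    exposed = [d for p, d in rows if p == 1]
    unexposed = [d for p, d in rows if p == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

def simulate(panel, n=100000, seed=4):
    """Risk difference among uncensored units under panel 'a', 'b', or 'c'."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        p = int(rng.random() < 0.5)            # poverty (treatment)
        u = int(rng.random() < 0.5)            # unmeasured marital distress
        d = int(rng.random() < 0.2 + 0.4 * u)  # divorce: depends on U, not on P
        if panel == "a":
            p_censor = 0.3                      # random attrition
        elif panel == "b":
            p_censor = 0.2 + 0.4 * p            # attrition affected by P only
        else:
            p_censor = 0.1 + 0.4 * p + 0.4 * u  # attrition affected by P and U
        if rng.random() >= p_censor:            # unit stays in the panel
            kept.append((p, d))
    return risk_difference(kept)

rd = {panel: simulate(panel) for panel in "abc"}
for panel, value in rd.items():
    print(f"panel {panel}: risk difference among uncensored = {value:+.3f}")
```

Under these assumed probabilities, the estimate stays near the true value of zero in panels a and b, and turns negative only in panel c, where censoring is a collider between poverty and marital distress.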
Figure 10
Proxy control [S, schooling (treatment); W, wages (outcome); U, unmeasured ability; Q, test scores]. (a) U confounds the effect of S on W. (b) Q is a valid proxy for U; conditioning on Q reduces, but does not eliminate, confounding bias. (c) Q is affected by S and U; conditioning on Q induces endogenous selection bias. (d) Q is affected by S and affects W; conditioning on Q induces overcontrol bias.
Figure 11
Endogenous selection bias in mediation analysis [T, class size (randomized treatment); M, student achievement; Y, high school graduation (outcome); U, unobserved factors such as student motivation]. (a) M mediates the indirect effect of T on Y. (b) M is not a mediator.
Figure 12
Endogenous selection bias due to latent homophily in social network analysis (i, j, index for a dyad of individuals; Y, civic engagement; U, altruism; F, friendship tie).
Figure 13
Confounding cannot be reduced to a purely associational criterion. If U1 and U2 are unobserved, the DAGs in panels a, b, and c are observationally indistinguishable. (a) X is a common cause (confounder) of the treatment, T, and the outcome, Y; conditioning on X removes confounding bias and identifies the causal effect of T on Y. (b) X is a pre-treatment collider on a noncausal path linking treatment and outcome; conditioning on X induces endogenous selection bias. (c) X is both a confounder and a collider; neither conditioning nor not conditioning on X identifies the causal effect of T on Y.
