Stat Methods Med Res. 2019 Jul;28(7):1958-1978.
doi: 10.1177/0962280217740609. Epub 2017 Nov 29.

Bridging observational studies and randomized experiments by embedding the former in the latter

Marie-Abele C Bind et al. Stat Methods Med Res. 2019 Jul.

Abstract

Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e. the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor; some analyses, for example, completely eschew the first stage.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function, with data collected from families living in East Boston in the 1970s.
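
The analysis stage (Stage 3) described in the abstract can be illustrated with a randomization test. The sketch below is not the authors' code. It assumes a hypothetical completely randomized experiment, uses a Welch-type t-statistic, and approximates the null randomization distribution by Monte Carlo re-assignment of the exposure labels. The function names, the number of draws, and the toy values are all assumptions made for illustration.

```python
import random
import statistics


def t_statistic(exposed, non_exposed):
    """Welch-type t-statistic for the difference in mean outcomes."""
    diff = statistics.mean(exposed) - statistics.mean(non_exposed)
    se = (statistics.variance(exposed) / len(exposed)
          + statistics.variance(non_exposed) / len(non_exposed)) ** 0.5
    return diff / se


def randomization_p_value(outcomes, exposure, draws=2000, seed=0):
    """Approximate two-sided p-value under the sharp null of no effect.

    Assumption: complete randomization, i.e. the number of exposed units
    is fixed and every assignment with that number is equally likely.
    Returns (observed t-statistic, approximate p-value).
    """
    rng = random.Random(seed)

    def stat(labels):
        exposed = [y for y, w in zip(outcomes, labels) if w == 1]
        non_exposed = [y for y, w in zip(outcomes, labels) if w == 0]
        return t_statistic(exposed, non_exposed)

    observed = stat(exposure)
    labels = list(exposure)
    extreme = 0
    for _ in range(draws):
        # Shuffling keeps the number of exposed units fixed.
        rng.shuffle(labels)
        if abs(stat(labels)) >= abs(observed):
            extreme += 1
    return observed, extreme / draws


if __name__ == "__main__":
    # Made-up outcome values, NOT data from the study.
    toy_outcomes = [2.1, 1.9, 2.4, 2.0, 2.3, 1.7, 1.6, 1.8, 1.5, 2.2]
    toy_exposure = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(randomization_p_value(toy_outcomes, toy_exposure))
```

A paired-randomized design would instead re-assign exposure within each matched pair, so the appropriate null distribution depends on which hypothetical experiment the design stage reconstructs.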

Keywords: Experimental design; Rubin Causal Model (RCM); causal inference; environmental epidemiology; lung function; observational studies; parental smoking.

Conflict of interest statement

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1
Trimming approach with rectangular boundaries for age and height.
Figure 2
Trimming with ellipsoidal boundaries for age and height.
Figure 3
Propensity score distributions among the exposed (black curves) and non-exposed (grey curves) children before (top plot) and after (bottom plot) removing the “outlier” units [the removed units were 154 non-exposed children with a propensity score below the minimum propensity score among the exposed children and two exposed children with a propensity score above the maximum propensity score among the non-exposed children].
Figure 4
Standardized mean differences for the variables age, height, sex, age², height², sex × age, and sex × height for the non-exposed vs. exposed children before matching (black dots), after propensity score matching (d) (darker grey triangles), and after optimal pair matching (e) (lighter grey diamonds) (“Love” plots).
Figure 5
Empirical distributions of the variable age among non-exposed (left panels) and exposed (right panels) children in the original dataset (a) (top panels), after propensity score matching (d) (middle panels), and after optimal pair matching (e) (bottom panels) [Kolmogorov–Smirnov ‘distances’ between the age distributions of the non-exposed and exposed children: (1) in the original dataset (a) = 0.56, (2) after propensity score matching (d) = 0.10, (3) after optimal pair matching (e) = 0.06].
Figure 6
Empirical distributions of the variable height among non-exposed (left panels) and exposed (right panels) children in the original dataset (a) (top panels), after propensity score matching (d) (middle panels), and after optimal pair matching (e) (bottom panels) [Kolmogorov–Smirnov ‘distances’ between the height distributions of the non-exposed and exposed children: (1) in the original dataset (a) = 0.47, (2) after propensity score matching (d) = 0.16, (3) after optimal pair matching (e) = 0.05].
Figure 7
Distribution of the squared Mahalanobis distances between propensity score (d) and optimal (e) matched pairs.
Figure 8
Pairwise squared Mahalanobis distances between propensity score matched pairs (d) versus the estimated paired causal effects (d).
Figure 9
Approximate null randomization distributions of the t-statistics (T_t) under the reconstructed randomized experiments (completely randomized D.1, rerandomized D.2, and paired-randomized E) and the corresponding observed t-statistics (T_t^obs) [completely randomized D.1: randomization-based p-value = 0.12, T_t^obs = 1.57, 95% fiducial interval = −0.52 to 0.06; rerandomized D.2: randomization-based p-value = 0.10, T_t^obs = 1.66, 95% fiducial interval = −0.33 to 0.03; paired-randomized E: randomization-based p-value = 0.04, T_t^obs = 2.12, 95% fiducial interval = −0.37 to −0.02].
Figure 10
Estimated distributions and posterior means of the average causal effect (ACE) in the propensity score matched (d) [mean: −0.16 and 95% posterior interval: −0.29; −0.03] and optimal paired (e) [mean: −0.18 and 95% posterior interval: −0.30; −0.06] data sets.
Figure 11
Approximate null randomization distributions of the t-statistics (T_t) under the reconstructed randomized experiments (completely randomized D.1 and Bayesian, rerandomized D.2 and Bayesian, and paired-randomized E and Bayesian) and the corresponding observed t-statistics (T_t^obs) [completely randomized D.1 and Bayesian: randomization-based p-value = 0.09, T_t^obs = 2.39; rerandomized D.2: randomization-based p-value = 0.10, T_t^obs = 2.31; paired-randomized E: randomization-based p-value = 0.04, T_t^obs = 2.84].
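
The design-stage quantities named in the captions for Figures 3 to 6 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The trimming rule follows the wording of the Figure 3 caption and is applied in a single pass. The standardized mean difference uses one common pooled-standard-deviation convention and an exposed-minus-non-exposed sign, both assumed here. The Kolmogorov–Smirnov distance is the usual maximum gap between two empirical distribution functions. All function names are hypothetical.

```python
import statistics


def trim_by_propensity(scores, exposure):
    """Return the indices of units kept after trimming on the propensity score.

    Rule (from the Figure 3 caption): drop non-exposed units whose score is
    below the minimum among the exposed, and exposed units whose score is
    above the maximum among the non-exposed. Applied once, not iterated.
    """
    min_exposed = min(s for s, w in zip(scores, exposure) if w == 1)
    max_non_exposed = max(s for s, w in zip(scores, exposure) if w == 0)
    keep = []
    for i, (s, w) in enumerate(zip(scores, exposure)):
        if w == 0 and s < min_exposed:
            continue
        if w == 1 and s > max_non_exposed:
            continue
        keep.append(i)
    return keep


def standardized_mean_difference(x_exposed, x_non_exposed):
    """Difference in means divided by a pooled standard deviation.

    Assumption: the denominator is sqrt((var_exposed + var_non_exposed) / 2);
    other conventions exist.
    """
    pooled_sd = ((statistics.variance(x_exposed)
                  + statistics.variance(x_non_exposed)) / 2) ** 0.5
    return (statistics.mean(x_exposed) - statistics.mean(x_non_exposed)) / pooled_sd


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    def ecdf(sample, t):
        return sum(1 for v in sample if v <= t) / len(sample)

    # The maximum gap is attained at one of the observed values.
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)
```

In a workflow of this kind, the trimming would be applied before matching, and the two balance measures would be recomputed on each matched data set to compare it with the original one.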
