Stat Methods Med Res. 2019 Jul;28(7):1958-1978.

doi: 10.1177/0962280217740609. Epub 2017 Nov 29.

Bridging observational studies and randomized experiments by embedding the former in the latter

Marie-Abele C Bind¹, Donald B Rubin¹

PMID: 29187059
PMCID: PMC5902671
DOI: 10.1177/0962280217740609

Marie-Abele C Bind et al. Stat Methods Med Res. 2019 Jul.

Affiliation

¹ Faculty of Arts and Sciences, Department of Statistics, Harvard University, Cambridge, MA, USA.


Abstract

Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e. the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For these statements to be even hypothetically valid, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized dataset. This multistage effort, with its thought-provoking tasks, involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment in which the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and to readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor; for example, many analyses completely eschew the first stage.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function, with data collected from families living in East Boston in the 1970s.

Keywords: Experimental design; Rubin Causal Model (RCM); causal inference; environmental epidemiology; lung function; observational studies; parental smoking.

Conflict of interest statement

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1**
Trimming approach with rectangle boundaries for age and height.

**Figure 2**
Trimming with ellipsoidal boundaries for age and height.

**Figure 3**
Propensity score distributions among the exposed (black curves) and non-exposed (grey curves) children before (top plot) and after (bottom plot) removing “outlier” units [154 non-exposed children had a propensity score below the minimum propensity score among the exposed children, and two exposed children had a propensity score above the maximum propensity score among the non-exposed children].
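The Figure 3 caption states the trimming rule precisely, so it can be written down directly. This sketch assumes the propensity scores have already been estimated (for instance by a logistic regression of exposure on the covariates, which is not shown) and computes both cutoffs on the untrimmed sample; `trim_to_overlap` is a hypothetical name.

```python
def trim_to_overlap(scores, exposed):
    """Return a keep-mask: drop non-exposed units scoring below the exposed minimum
    and exposed units scoring above the non-exposed maximum."""
    min_exposed = min(s for s, z in zip(scores, exposed) if z)
    max_unexposed = max(s for s, z in zip(scores, exposed) if not z)
    keep = []
    for s, z in zip(scores, exposed):
        if z:
            keep.append(s <= max_unexposed)  # exposed: keep unless above the non-exposed maximum
        else:
            keep.append(s >= min_exposed)    # non-exposed: keep unless below the exposed minimum
    return keep
```

Both cutoffs are fixed before any unit is removed, so the order in which the two groups are trimmed does not matter.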

**Figure 4**
Standardized mean differences for the variables age, height, sex, age², height², sex × age, and sex × height for the non-exposed vs. exposed children before matching (black dots), after propensity score matching (d) (darker grey triangles), and after optimal pair matching (e) (lighter grey diamonds) (“Love” plots).
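Figure 4 reports standardized mean differences for age, height, sex, their squares and sex interactions. The definition used below, the difference in group means divided by the square root of the average of the two group variances, is a common convention but is an assumption here, since other denominators are also in use. Sex is assumed to be coded 0/1, and the helper names are hypothetical.

```python
import math
import statistics

def standardized_mean_difference(x_exposed, x_unexposed):
    """(mean_E - mean_N) / sqrt((var_E + var_N) / 2), with sample variances."""
    diff = statistics.fmean(x_exposed) - statistics.fmean(x_unexposed)
    pooled_sd = math.sqrt((statistics.variance(x_exposed) + statistics.variance(x_unexposed)) / 2)
    return diff / pooled_sd

def covariate_terms(age, height, sex):
    """The terms listed in the Figure 4 caption for one child (sex coded 0/1, assumed)."""
    return {"age": age, "height": height, "sex": sex,
            "age^2": age ** 2, "height^2": height ** 2,
            "sex*age": sex * age, "sex*height": sex * height}

def smd_table(exposed_units, unexposed_units):
    """Map each term to its SMD; units are (age, height, sex) tuples."""
    terms_e = [covariate_terms(*u) for u in exposed_units]
    terms_n = [covariate_terms(*u) for u in unexposed_units]
    return {k: standardized_mean_difference([t[k] for t in terms_e], [t[k] for t in terms_n])
            for k in terms_e[0]}
```

Plotting one such table per data set (before matching, after (d), after (e)) gives the layout of a "Love" plot.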

**Figure 5**
Empirical distributions of age among non-exposed (left panels) and exposed (right panels) children in the original dataset (a) (top panels), after propensity score matching (d) (middle panels), and after optimal pair matching (e) (bottom panels) [Kolmogorov–Smirnov ‘distances’ between the age distributions of non-exposed and exposed children: (1) in the original dataset (a) = 0.56, (2) after propensity score matching (d) = 0.10, (3) after optimal pair matching (e) = 0.06].

**Figure 6**
Empirical distributions of height among non-exposed (left panels) and exposed (right panels) children in the original dataset (a) (top panels), after propensity score matching (d) (middle panels), and after optimal pair matching (e) (bottom panels) [Kolmogorov–Smirnov ‘distances’ between the height distributions of non-exposed and exposed children: (1) in the original dataset (a) = 0.47, (2) after propensity score matching (d) = 0.16, (3) after optimal pair matching (e) = 0.05].
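The bracketed numbers in the Figure 5 and 6 captions are two-sample Kolmogorov–Smirnov statistics: the largest vertical gap between the two empirical distribution functions. A minimal standard-library sketch follows; the function name is hypothetical. Because both empirical CDFs are right-continuous step functions, the supremum is attained at one of the pooled sample points, so checking those points suffices.

```python
from bisect import bisect_right

def ks_distance(x, y):
    """sup_t |F_x(t) - F_y(t)|, evaluated at every pooled observation."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # bisect_right(xs, t) counts observations <= t, i.e. n * F_x(t).
    return max(abs(bisect_right(xs, t) / n - bisect_right(ys, t) / m) for t in xs + ys)
```

Applied to the ages (or heights) of the non-exposed and exposed children in a given data set, this yields the kind of quantity reported in those captions; a value of 0 means identical empirical distributions and 1 means complete separation.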

**Figure 7**
Distribution of the squared Mahalanobis distances between propensity score (d) and optimal (e) matched pairs.
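Figure 7 summarizes squared Mahalanobis distances within the pairs produced by propensity score matching (d) and optimal pair matching (e). The sketch below computes that distance for (age, height) vectors given a 2 × 2 covariance matrix, and finds an optimal 1:1 pairing, meaning the one minimizing the total distance, by exhaustive search. Which covariance matrix and covariates the authors used is not stated here, so both are left to the caller; the exhaustive search is a toy stand-in that is only feasible for a handful of units, and all names are hypothetical.

```python
import itertools

def sq_mahalanobis(u, v, cov):
    """(u - v)^T cov^{-1} (u - v) for 2-D vectors, using the closed-form 2 x 2 inverse."""
    (saa, sah), (_, shh) = cov
    det = saa * shh - sah * sah
    da, dh = u[0] - v[0], u[1] - v[1]
    return (shh * da * da - 2 * sah * da * dh + saa * dh * dh) / det

def optimal_pairs(exposed, unexposed, cov):
    """Brute-force optimal pair matching without replacement.

    Tries every injective assignment of exposed units to non-exposed units
    (assumes len(unexposed) >= len(exposed)) and returns the (exposed index,
    non-exposed index) pairs with the smallest total squared distance.
    """
    best_total, best = float("inf"), None
    for chosen in itertools.permutations(range(len(unexposed)), len(exposed)):
        total = sum(sq_mahalanobis(exposed[i], unexposed[j], cov) for i, j in enumerate(chosen))
        if total < best_total:
            best_total, best = total, list(enumerate(chosen))
    return best, best_total
```

For realistic sample sizes, the same minimum-total-distance assignment can be obtained by passing the matrix of pairwise distances to a Hungarian-type solver such as SciPy's `scipy.optimize.linear_sum_assignment`. The per-pair values of `sq_mahalanobis` are what a histogram like Figure 7 would display.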

**Figure 8**
Pairwise squared Mahalanobis distances between propensity score matched pairs (d) versus the estimated paired causal effects (d).

**Figure 9**
Approximate null randomization distributions of t-statistics under the reconstructed randomized experiments (T_{t-completely randomized D.1}, T_{t-rerandomized D.2}, and T_{t-paired-randomized E}) and observed t-statistics (T^{obs}_{t-completely randomized D.1}, T^{obs}_{t-rerandomized D.2}, and T^{obs}_{t-paired-randomized E}) [completely randomized D.1: randomization-based p-value = 0.12, T^{obs}_{t-completely randomized D.1} = 1.57, 95% fiducial interval = −0.52 to 0.06; rerandomized D.2: randomization-based p-value = 0.10, T^{obs}_{t-rerandomized D.2} = 1.66, 95% fiducial interval = −0.33 to 0.03; paired randomized E: randomization-based p-value = 0.04, T^{obs}_{t-paired-randomized E} = 2.12, 95% fiducial interval = −0.37 to −0.02].
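Figure 9 reports randomization-based p-values for three reconstructed assignment mechanisms: completely randomized (D.1), rerandomized (D.2) and paired randomized (E). The sketch below approximates such p-values by Monte Carlo under the sharp null hypothesis that exposure changes no child's outcome, so the observed outcomes stay fixed and only the assignment is redrawn. Several details are assumptions rather than the paper's specification: the unpaired statistic is the difference in means over a Neyman-style standard error, the paired statistic is the one-sample t-statistic of within-pair differences, p-values are two-sided, and the rerandomization balance criterion is a caller-supplied predicate `acceptable`. All function names are hypothetical, and fiducial intervals are not computed.

```python
import math
import random
import statistics

def neyman_t(y, z):
    """(mean exposed - mean non-exposed) / sqrt(s1^2/n1 + s0^2/n0).

    Assumes each group has at least two units and not all outcomes are equal."""
    y1 = [v for v, zi in zip(y, z) if zi]
    y0 = [v for v, zi in zip(y, z) if not zi]
    se = math.sqrt(statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0))
    return (statistics.fmean(y1) - statistics.fmean(y0)) / se

def randomization_p_value(y, z_obs, draws=10_000, acceptable=None, seed=0):
    """Two-sided Monte Carlo p-value under the sharp null of no effect for any unit.

    Completely randomized design: shuffle the observed labels (number exposed fixed).
    Rerandomized design: also discard draws for which acceptable(z) is False; the
    observed assignment is assumed to satisfy the criterion.
    """
    rng = random.Random(seed)
    t_obs = neyman_t(y, z_obs)
    z = list(z_obs)
    kept = extreme = 0
    while kept < draws:
        rng.shuffle(z)
        if acceptable is not None and not acceptable(z):
            continue  # assignment rejected by the balance criterion
        kept += 1
        extreme += abs(neyman_t(y, z)) >= abs(t_obs)
    return t_obs, extreme / kept

def paired_t(d):
    """One-sample t-statistic of the within-pair differences."""
    m, sd = statistics.fmean(d), statistics.stdev(d)
    if sd == 0:  # all differences identical
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return m / (sd / math.sqrt(len(d)))

def paired_randomization_p_value(diffs, draws=10_000, seed=0):
    """Paired design: exposure is flipped within each pair with probability 1/2,
    which under the sharp null flips the sign of that pair's outcome difference."""
    rng = random.Random(seed)
    t_obs = paired_t(diffs)
    extreme = 0
    for _ in range(draws):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        extreme += abs(paired_t(flipped)) >= abs(t_obs)
    return t_obs, extreme / draws
```

As `draws` grows, each proportion approaches the exact randomization p-value for the chosen statistic. The rerandomization loop can run for a long time if `acceptable` admits only a small fraction of assignments.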

**Figure 10**
Estimated distributions and posterior means of the average causal effect (ACE) in the propensity score matched (d) [mean: −0.16; 95% posterior interval: −0.29 to −0.03] and optimal paired (e) [mean: −0.18; 95% posterior interval: −0.30 to −0.06] data sets.

**Figure 11**
Approximate null randomization distributions of t-statistics under the reconstructed randomized experiments (T_{t-completely randomized D.1 and Bayesian}, T_{t-rerandomized D.2 and Bayesian}, and T_{t-paired-randomized E and Bayesian}) and observed t-statistics (T^{obs}_{t-completely randomized D.1 and Bayesian}, T^{obs}_{t-rerandomized D.2 and Bayesian}, and T^{obs}_{t-paired-randomized E and Bayesian}) [completely randomized D.1 and Bayesian: randomization-based p-value = 0.09, T^{obs}_{t-completely randomized D.1 and Bayesian} = 2.39; rerandomized D.2: randomization-based p-value = 0.10, T^{obs}_{t-rerandomized D.2} = 2.31; paired randomized E: randomization-based p-value = 0.04, T^{obs}_{t-paired-randomized E} = 2.84].
