Bridging observational studies and randomized experiments by embedding the former in the latter
- PMID: 29187059
- PMCID: PMC5902671
- DOI: 10.1177/0962280217740609
Abstract
Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e. the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized dataset. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor; for example, analyses may completely eschew the first stage.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function collected in families living in East Boston in the 1970s.
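As a rough illustration of what the analysis stage (Stage 3) can look like once a hypothetical randomized experiment has been posited, the sketch below computes a Monte Carlo randomization p-value for the sharp null hypothesis of no effect. It is not the authors' procedure: the function name `randomization_p_value`, the assumption of a completely randomized assignment, the difference-in-means test statistic, and the numbers in the usage lines are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def randomization_p_value(exposed, unexposed, n_draws=2000, seed=0):
    """Two-sided Monte Carlo randomization p-value for the sharp null of no effect.

    Assumes (hypothetically) a completely randomized assignment in which the
    number of exposed units is fixed; under the sharp null, outcomes do not
    change when exposure labels are re-drawn.
    """
    rng = random.Random(seed)
    pooled = list(exposed) + list(unexposed)
    n_exposed = len(exposed)
    observed = abs(mean(exposed) - mean(unexposed))

    as_extreme = 0
    for _ in range(n_draws):
        # Re-draw the exposure labels by shuffling outcomes across units.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_exposed]) - mean(pooled[n_exposed:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1

    # The +1 terms count the observed assignment itself, so p is never zero.
    return (as_extreme + 1) / (n_draws + 1)


# Synthetic outcome values, invented for this sketch (not data from the study).
exposed_outcomes = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0]
unexposed_outcomes = [2.0, 2.1, 1.9, 2.05, 1.95, 2.0]
p_value = randomization_p_value(exposed_outcomes, unexposed_outcomes)
print(f"randomization p-value: {p_value:.4f}")
```

The p-value is only meaningful if the re-drawn assignments mirror the hypothetical experiment fixed in the conceptual and design stages; a matched or blocked design would require shuffling within pairs or blocks instead.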
Keywords: Experimental design; Rubin Causal Model (RCM); causal inference; environmental epidemiology; lung function; observational studies; parental smoking.
Conflict of interest statement
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.