Bridging observational studies and randomized experiments by embedding the former in the latter
- PMID: 29187059
- PMCID: PMC5902671
- DOI: 10.1177/0962280217740609
Abstract
Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e. the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor; for example, the first stage is often eschewed completely.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function collected in families living in East Boston in the 1970s.
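As a rough illustration of what the Stage 3 comparison could look like once the design stage has produced a set of exposed and non-exposed units, the sketch below runs a Fisher randomization test of the sharp null hypothesis of no effect. It assumes a hypothetical completely randomized experiment with a fixed number of exposed units; that assumption, the function name `randomization_p_value`, the difference-in-means statistic, and the toy numbers are all illustrative choices and are not taken from the article or from the East Boston data.

```python
import random
from statistics import mean


def randomization_p_value(outcomes, exposed, n_draws=10_000, seed=0):
    """Monte Carlo Fisher randomization test (two-sided, difference in means).

    Assumption for illustration: the hypothetical experiment is completely
    randomized, so every assignment with the same number of exposed units
    is equally likely under the design.
    """
    rng = random.Random(seed)

    def diff_in_means(assignment):
        treated = [y for y, w in zip(outcomes, assignment) if w == 1]
        control = [y for y, w in zip(outcomes, assignment) if w == 0]
        return mean(treated) - mean(control)

    observed = diff_in_means(exposed)
    assignment = list(exposed)
    extreme = 0
    for _ in range(n_draws):
        # Re-draw the assignment under the hypothetical design; under the
        # sharp null the outcomes stay fixed whatever the assignment is.
        rng.shuffle(assignment)
        if abs(diff_in_means(assignment)) >= abs(observed) - 1e-12:
            extreme += 1
    # The add-one correction keeps the Monte Carlo p-value valid.
    return observed, (extreme + 1) / (n_draws + 1)


# Toy numbers, made up purely to exercise the function.
toy_outcomes = [1, 2, 3, 4, 5, 6, 7, 8]
toy_exposed = [0, 0, 0, 0, 1, 1, 1, 1]
observed_diff, p_value = randomization_p_value(toy_outcomes, toy_exposed)
print(observed_diff, p_value)
```

The p-value here is only as credible as the assumed assignment mechanism, which is why the conceptual and design stages have to come first: a different hypothetical experiment (for example, one randomized within matched pairs) would call for re-drawing assignments within pairs instead of shuffling the whole vector.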
Keywords: Experimental design; Rubin Causal Model (RCM); causal inference; environmental epidemiology; lung function; observational studies; parental smoking.
Conflict of interest statement
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
