Review
Biom J. 2021 Jun;63(5):915-947.
doi: 10.1002/bimj.202000196. Epub 2021 Feb 24.

Missing data: A statistical framework for practice

James R Carpenter et al. Biom J. 2021 Jun.

Abstract

Missing data are ubiquitous in medical research, yet there is still uncertainty over when restricting to the complete records is likely to be acceptable, when more complex methods (e.g. maximum likelihood, multiple imputation and Bayesian methods) should be used, how these methods relate to each other, and what role sensitivity analysis should play. This article seeks to address both applied practitioners and researchers interested in a more formal explanation of some of the results. For practitioners, the framework, illustrative examples and code should provide a practical approach to the issues raised by missing data (particularly using multiple imputation), alongside an overview of how the various approaches in the literature relate. In particular, we describe how multiple imputation can be readily used for sensitivity analyses, which are still infrequently performed. For those interested in more formal derivations, we give outline arguments for key results, use simple examples to show how methods relate, and provide references for full details. The ideas are illustrated with a cohort study, a multi-centre case-control study and a randomised clinical trial.

Keywords: complete records; missing data; multiple imputation; sensitivity analysis.
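
As a rough illustration of the multiple imputation workflow the abstract refers to, the estimates from each imputed dataset are conventionally combined with Rubin's rules. The sketch below is not the authors' code: the function name `pool_rubin` and the input numbers are hypothetical, and it assumes each imputed dataset has already been analysed to give a point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Hypothetical log odds ratios and squared standard errors from four imputed datasets
estimates = [0.50, 0.60, 0.40, 0.50]
variances = [0.04, 0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate {pooled:.3f}, standard error {total_var ** 0.5:.3f}")
```

The same pooling step applies unchanged in a sensitivity analysis: only the imputation model that generates the datasets differs, for example by shifting imputed values before the substantive model is fitted.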

Conflict of interest statement

James Carpenter and Melanie Smuk declared no conflict of interest.

Figures

Figure 1. Complete records analysis of the bed-sharing study: adjusted odds ratio (AOR) showing how the risk of bed-sharing for sudden infant death changes with the baby’s age. Grey lines: risk when mother smokes; black lines: risk for non-smokers. Dashed lines: 95% confidence interval
Figure 2. Framework for addressing issues raised by missing data, when (i) the scientifically substantive model is a generalised linear model of dependent variable Yi on covariates Xi (i = 1,…, n) and (ii) we may also have auxiliary variables, Zi, which are associated with (Yi, Xi), but which are not in the substantive model
Figure 3. NCDS data: panels show how the probability of no qualifications age 23 varies with mother’s age at birth and social housing. In each panel, for children not in care and with a birth weight of 111 ounces, the upper lines are for children who were in social housing and the lower lines for those who were not. Each panel compares the complete records analysis (dashed lines) with those from IPW (top left, largely overlaps complete records); standard MI (top right, 100 imputations); MI separately by social housing, with auxiliary variables (bottom left, 100 imputations); and substantive model compatible MI with auxiliary variables (bottom right, 100 imputations)
Figure 4. Adjusted odds ratio (AOR) for risk of bed-sharing. Left panel: after imputation of missing alcohol and drug data under MAR; right panel: sensitivity analysis. Both panels: solid lines show estimated adjusted odds with non-smoking (solid) and smoking (grey) mother; dashed lines: 95% confidence intervals
Figure 5. NCDS analysis: Comparison of CR (dashed) and MI under MNAR (solid) lines; upper lines: in social housing; lower lines: not in social housing; 100 imputations
Figure 6. Schematic illustration of ‘δ-method’ controlled sensitivity analysis
