Outcome-sensitive multiple imputation: a simulation study
- PMID: 28068910
- PMCID: PMC5220613
- DOI: 10.1186/s12874-016-0281-5
Abstract
Background: Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether the outcome itself should be imputed. Similarly, no clear recommendations exist on the utility of incorporating a secondary outcome, if available, in the imputation model; the level of protection offered when data are missing not at random; or the implications of dataset size and missingness levels.
Methods: We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (completely at random; at random; not at random); varying extents of missingness (20% to 80% missing data); and different sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and seven multiple imputation methods. These methods differed in whether they deleted cases with a missing outcome before imputation, after imputation or not at all; whether they included the outcome in the imputation models; and whether they included a secondary outcome in the imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario.
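The three missingness mechanisms named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's actual data-generating process: the variables z, x and y, the logistic missingness model, the helper missing_mask and the 40% missingness rate.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 10_000                      # one of the two sample sizes in the study

# Hypothetical data-generating model (not the paper's): a fully observed
# covariate z, a covariate x that will receive missing values, an outcome y.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 1.0 * x + 0.5 * z + rng.normal(size=n)

def missing_mask(driver, target_rate, strength, rng):
    """Return a boolean mask with roughly `target_rate` of entries missing.

    The missingness probability follows a logistic model in `driver`;
    strength=0 gives a constant probability, i.e. MCAR.
    """
    driver = (driver - driver.mean()) / driver.std()
    # Find the intercept by bisection so the mean probability hits the target
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        p = 1 / (1 + np.exp(-(mid + strength * driver)))
        if p.mean() < target_rate:
            lo = mid
        else:
            hi = mid
    return rng.random(driver.size) < p

rate = 0.4  # illustrative value inside the 20% to 80% range
masks = {
    "MCAR": missing_mask(z, rate, 0.0, rng),  # independent of all data
    "MAR": missing_mask(z, rate, 1.0, rng),   # depends on the observed z only
    "MNAR": missing_mask(x, rate, 1.0, rng),  # depends on the unobserved x itself
}

for name, m in masks.items():
    # Proportion missing, and mean of x among the cases that remain observed
    print(name, round(m.mean(), 3), round(x[~m].mean(), 3))
```

Under this sketch the observed mean of x stays close to zero under MCAR and shifts downward under MNAR, which is the kind of distortion the compared methods have to cope with.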
Results: Overall, there was very little to separate multiple imputation methods which included the outcome in the imputation model. Even when missingness was quite extensive, all multiple imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. The dataset size and the extent of missingness affected performance, as expected. Multiple imputation methods protected less well against missingness not at random, but did offer some protection.
Conclusions: As long as the outcome is included in the imputation model, there are very small performance differences between the possible multiple imputation approaches: no outcome imputation, outcome imputation, or outcome imputation followed by deletion of cases with a missing outcome. All informative covariates, even with very high levels of missingness, should be included in the multiple imputation model. Multiple imputation offers some protection against a simple missing not at random mechanism.
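Why the outcome belongs in the imputation model can be seen in a small sketch. It is not one of the seven methods evaluated in the study: it assumes a single covariate x, a linear outcome y, 50% of x missing completely at random, and a simplified stochastic regression imputation that does not draw the imputation-model parameters (so it is not fully proper multiple imputation). The helper names slope and impute_x are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_slope = 10_000, 1.0

x = rng.normal(size=n)
y = true_slope * x + rng.normal(size=n)
miss = rng.random(n) < 0.5  # 50% of x missing completely at random
obs = ~miss

def slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

def impute_x(use_outcome, rng):
    """One stochastic imputation of x, with or without y as a predictor."""
    x_imp = x.copy()
    if use_outcome:
        # Regress x on y in the observed cases, then draw from the fitted model
        b = slope(y[obs], x[obs])
        a = x[obs].mean() - b * y[obs].mean()
        resid_sd = np.std(x[obs] - (a + b * y[obs]), ddof=2)
        x_imp[miss] = a + b * y[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    else:
        # Draw from the marginal distribution of observed x, ignoring y
        x_imp[miss] = rng.normal(x[obs].mean(), x[obs].std(ddof=1), size=miss.sum())
    return x_imp

m = 20  # number of imputed datasets
# Pooled point estimate: the mean of the per-dataset slope estimates
est = {
    flag: np.mean([slope(impute_x(flag, rng), y) for _ in range(m)])
    for flag in (True, False)
}
print("slope, outcome in imputation model:    ", round(est[True], 2))
print("slope, outcome not in imputation model:", round(est[False], 2))
```

In this setup the estimate that uses the outcome stays near the true slope, while the one that ignores it is pulled toward zero, because the imputed x values carry no information about y.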
Keywords: Imputed outcome; Missing data; Missingness; Multiple imputation.