Accounting for missing data in statistical analyses: multiple imputation is not always the answer
- PMID: 30879056
- PMCID: PMC6693809
- DOI: 10.1093/ije/dyz032
Abstract
Background: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.
Methods: We provide guidance on choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate selection of an appropriate method in practice.
Results: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration, which includes situations where data are missing not at random (MNAR). Consequently, there are situations in which CCA is unbiased while MI, which assumes missing at random (MAR), is biased. By contrast, MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example.
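The condition for unbiased CCA stated in the Results can be illustrated with a small simulation. The sketch below is not taken from the paper: the linear model (intercept 1, slope 2), the logistic missingness models, the sample size and the seed are all illustrative assumptions. It compares the complete-case slope estimate when the chance of being a complete case depends only on the covariate with the estimate when that chance depends on the outcome.

```python
import math
import random


def ols_slope(xs, ys):
    """Closed-form least-squares slope for a single covariate."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n=20000, seed=2019):
    """Simulate Y = 1 + 2*X + noise (assumed model) and fit CCA under two
    hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    full = ([], [])
    by_covariate = ([], [])  # complete-case status depends on X only
    by_outcome = ([], [])    # complete-case status depends on Y
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        full[0].append(x)
        full[1].append(y)
        # Mechanism A: larger X values are more likely to be missing.
        # Missingness depends on the unobserved value itself, yet not on Y
        # once X is taken into account.
        if rng.random() < sigmoid(-x):
            by_covariate[0].append(x)
            by_covariate[1].append(y)
        # Mechanism B: larger Y values are more likely to make the record
        # incomplete, so complete-case status depends on the outcome.
        if rng.random() < sigmoid(-(y - 1.0)):
            by_outcome[0].append(x)
            by_outcome[1].append(y)
    return {
        "full_data": ols_slope(*full),
        "cca_depends_on_covariate": ols_slope(*by_covariate),
        "cca_depends_on_outcome": ols_slope(*by_outcome),
    }


slopes = simulate()
for name, value in slopes.items():
    print(f"{name}: slope = {value:.3f} (true slope 2)")
```

Under these assumptions the complete-case slope should stay close to the true value under mechanism A and be pulled towards zero under mechanism B. The sketch covers only the bias of CCA; it does not implement MI or address efficiency.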
Conclusions: Choice of method for dealing with missing data is crucial for validity of conclusions, and should be based on careful consideration of the reasons for the missing data, missing data patterns and the availability of auxiliary information.
Keywords: Complete case analysis; inverse probability weighting; missing data; missing data mechanisms; missing data patterns; multiple imputation.
© The Author(s) 2019. Published by Oxford University Press on behalf of the International Epidemiological Association.