Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
- PMID: 20085642
- PMCID: PMC2824146
- DOI: 10.1186/1471-2288-10-7
Abstract
Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.
Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases, ranging from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained.
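The three missingness mechanisms named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' simulation code (the study used R): the function names, the logistic form of the missingness model, the slope of 1.5, the 25% target and the lognormal toy data are all assumptions chosen for illustration. Under MCAR every case has the same probability of being missing; under MAR the probability depends on a fully observed covariate `z`; under MNAR it depends on the incomplete covariate `x` itself.

```python
import math
import random
import statistics


def _logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def missing_probs(driver, target, slope=1.5):
    """Per-case missingness probabilities that rise with `driver`.

    The intercept is tuned by bisection so the mean probability equals
    `target`. The logistic form and the slope are illustrative assumptions.
    """
    mu = statistics.fmean(driver)
    sd = statistics.pstdev(driver)
    s = [(d - mu) / sd for d in driver]
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        mean_p = statistics.fmean([_logistic(mid + slope * v) for v in s])
        if mean_p < target:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0
    return [_logistic(a + slope * v) for v in s]


def impose_missing(x, z, mechanism, target=0.25, seed=0):
    """Return a copy of x with some values replaced by None.

    x: covariate to be made incomplete; z: fully observed covariate.
    """
    rng = random.Random(seed)
    if mechanism == "MCAR":
        probs = [target] * len(x)        # unrelated to any data
    elif mechanism == "MAR":
        probs = missing_probs(z, target)  # depends on observed data only
    elif mechanism == "MNAR":
        probs = missing_probs(x, target)  # depends on the value that goes missing
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
    return [None if rng.random() < p else v for v, p in zip(x, probs)]


def simulate(n=5000, seed=1):
    """Toy data: z is normal, x is right-skewed (lognormal) and related to z."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [math.exp(0.5 * zi + rng.gauss(0.0, 0.5)) for zi in z]
    return x, z
```

Calling `impose_missing(x, z, "MNAR")` on the toy data removes large values of `x` preferentially, so the observed cases no longer represent the full distribution. That is the situation in which the abstract reports bias for all MI approaches.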
Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates, better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches.
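One way to see why SI tends to understate variability while MI does not is the pooling step. The abstract does not state how the MI estimates were combined; the sketch below assumes Rubin's rules, the standard approach, and uses made-up coefficient estimates. The function name `pool_rubin` is hypothetical. The total variance is the average within-imputation variance plus an inflated between-imputation variance, a term that a single imputed dataset cannot supply.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates:  coefficient estimate from each imputed dataset
    std_errors: its standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = statistics.fmean(estimates)                    # pooled estimate
    within = statistics.fmean([se ** 2 for se in std_errors])
    between = statistics.variance(estimates)               # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Illustrative numbers only, not results from the study.
est, se = pool_rubin([0.40, 0.50, 0.60], [0.10, 0.10, 0.10])
```

In this toy example each imputed dataset reports a standard error of 0.10, but the pooled standard error is larger because the estimates disagree across imputations. An analysis of one imputed dataset would report 0.10 and so overstate precision.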
Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.