Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review
- PMID: 39198744
- PMCID: PMC11351057
- DOI: 10.1186/s12874-024-02310-6
Abstract
Background and objectives: Understanding the research dataset is crucial for obtaining reliable and valid outcomes. Health analysts need a deep understanding of the data they analyze, which allows them to suggest practical solutions for handling missing data in a clinical data source. Accurate handling of missing values is critical for producing precise estimates and making informed decisions, especially in areas such as clinical research. As data have grown more diverse and complex, researchers have developed a wide range of imputation techniques. To help choose among them, we conducted a systematic review that organizes imputation techniques by tabular dataset characteristics, including the mechanism, pattern, and ratio of missingness, to identify the most appropriate imputation methods in the healthcare field.
Materials and methods: We searched four databases (PubMed, Web of Science, Scopus, and IEEE Xplore) for articles published up to September 20, 2023, that discussed imputation methods for addressing missing values in structured clinical datasets. Our analysis of the selected articles focused on four key aspects: the mechanism, pattern, and ratio of missingness, and the imputation strategies used. By synthesizing insights from these perspectives, we constructed an evidence map to recommend suitable imputation methods for handling missing values in a tabular dataset.
Results: Out of 2955 articles, 58 were included in the analysis. The evidence map, built from the structure of the missing values and the types of imputation methods extracted from these studies, showed that 45% of the studies employed conventional statistical methods, 31% used machine learning and deep learning methods, and 24% applied hybrid imputation techniques for handling missing values.
Conclusion: Considering the structure and characteristics of missing values in a clinical dataset is essential for choosing the most appropriate data imputation technique, especially within conventional statistical methods. Accurately estimating missing values to reflect reality enhances the likelihood of obtaining high-quality and reusable data, contributing significantly to precise medical decision-making. This review provides a guideline for choosing the most appropriate imputation methods at the data preprocessing stage, before analytical processes are run on structured clinical datasets.
Keywords: Clinical dataset; Imputation methods; Mechanism of missingness; Missing ratio; Missing values; Pattern of missingness; Simulation study.
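As an illustration of the dataset characteristics named in the abstract, the following minimal sketch computes the ratio and pattern of missingness for a small tabular dataset and then applies mean imputation as one simple conventional statistical baseline. The records, column names, and function names are hypothetical and are not taken from the review; mean imputation is shown only as an example and is not the method the review recommends. The mechanism of missingness generally cannot be read off the observed data alone, so it is not computed here.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records (not from the review); None marks a missing value.
records = [
    {"age": 54, "sbp": 130, "hba1c": 6.1},
    {"age": 61, "sbp": None, "hba1c": 7.4},
    {"age": None, "sbp": 118, "hba1c": None},
    {"age": 47, "sbp": 125, "hba1c": None},
]


def missing_ratio(rows):
    """Return the fraction of missing entries for each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def missing_patterns(rows):
    """Count rows by the set of columns that are missing together."""
    return Counter(
        tuple(sorted(c for c, v in r.items() if v is None)) for r in rows
    )


def mean_impute(rows):
    """Fill each missing value with the mean of that column's observed values.

    Assumes every column has at least one observed value.
    """
    columns = rows[0].keys()
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (col_means[c] if r[c] is None else r[c]) for c in columns}
        for r in rows
    ]


ratios = missing_ratio(records)
patterns = missing_patterns(records)
imputed = mean_impute(records)

print(ratios)
print(patterns)
print(imputed[2])
```

In practice, the ratio and pattern summaries would be inspected first, and the choice of imputation method would then depend on them together with an assumption about the mechanism of missingness.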
© 2024. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.