Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques
- PMID: 37316097
- DOI: 10.1016/j.artmed.2023.102587
Abstract
Objective: The proper handling of missing values is critical to delivering reliable estimates and decisions, especially in high-stakes fields such as clinical research. In response to the increasing diversity and complexity of data, many researchers have developed deep learning (DL)-based imputation techniques. We conducted a systematic review to evaluate the use of these techniques, with a particular focus on the types of data, intending to assist healthcare researchers from various disciplines in dealing with missing data.
Materials and methods: We searched five databases (MEDLINE, Web of Science, Embase, CINAHL, and Scopus) for articles published prior to February 8, 2023 that described the use of DL-based models for imputation. We examined selected articles from four perspectives: data types, model backbones (i.e., main architectures), imputation strategies, and comparisons with non-DL-based methods. Based on data types, we created an evidence map to illustrate the adoption of DL models.
Results: Out of 1822 articles, 111 were included, of which tabular static data (29%, 32/111) and tabular temporal data (40%, 44/111) were the most frequently investigated. Our findings revealed a discernible pattern linking the choice of model backbone to data type, for example, the dominance of autoencoders and recurrent neural networks for tabular temporal data. Imputation strategy usage also differed among data types. The "integrated" imputation strategy, which solves the imputation task simultaneously with downstream tasks, was most popular for tabular temporal data (52%, 23/44) and multi-modal data (56%, 5/9). Moreover, DL-based imputation methods yielded higher imputation accuracy than non-DL methods in most studies.
Conclusion: DL-based imputation models are a family of techniques with diverse network structures. Their design in healthcare is usually tailored to data types with different characteristics. Although DL-based imputation models may not be superior to conventional approaches across all datasets, they are well capable of achieving satisfactory results for a particular data type or dataset. There are, however, still issues with the portability, interpretability, and fairness of current DL-based imputation models.
Keywords: Deep learning; Healthcare; Imputation; Missing value; Neural networks.
Copyright © 2023 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: None.