BMC Med Res Methodol. 2024 Aug 28;24(1):188.
doi: 10.1186/s12874-024-02310-6.

Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review


Marziyeh Afkanpour et al. BMC Med Res Methodol. 2024.

Abstract

Background and objectives: Reliable and valid results depend on a thorough understanding of the research dataset. Health analysts need to understand the data they analyze well enough to propose practical solutions for handling missing data in a clinical data source. Accurate handling of missing values is critical for producing precise estimates and making informed decisions, especially in clinical research. As data have grown more diverse and complex, researchers have developed a wide range of imputation techniques. To help analysts choose among them, we conducted a systematic review that maps imputation techniques to tabular dataset characteristics, namely the mechanism, pattern, and ratio of missingness, in order to identify the most appropriate imputation methods in the healthcare field.

Materials and methods: We searched four databases (PubMed, Web of Science, Scopus, and IEEE Xplore) for articles published up to September 20, 2023, that discussed imputation methods for addressing missing values in structured clinical datasets. Our analysis of the selected articles focused on four key aspects: the mechanism, pattern, and ratio of missingness, and the imputation strategy used. By synthesizing insights from these perspectives, we constructed an evidence map to recommend suitable imputation methods for handling missing values in a tabular dataset.
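
Two of the dataset characteristics named above, the ratio and the pattern of missingness, can be computed directly from a table. The following is a minimal sketch under stated assumptions: the records, variable names, and helper functions are hypothetical and are not taken from the review. The mechanism of missingness generally cannot be read off the data in this way and usually has to be argued from how the data were collected.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 63, "sbp": 140, "hba1c": None},
    {"age": 51, "sbp": None, "hba1c": None},
    {"age": 47, "sbp": 128, "hba1c": 6.1},
    {"age": None, "sbp": 135, "hba1c": 7.4},
]

def missing_ratio(rows):
    """Return the fraction of missing entries for each variable."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

def missing_patterns(rows):
    """Count distinct missingness patterns (1 = missing, 0 = observed)."""
    columns = list(rows[0].keys())
    return Counter(tuple(int(r[c] is None) for c in columns) for r in rows)

ratios = missing_ratio(records)
patterns = missing_patterns(records)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%} missing")
for pattern, count in patterns.items():
    print(pattern, count)
```

Each pattern tuple follows the column order (age, sbp, hba1c), so `(0, 1, 1)` denotes a record with age observed and both laboratory values missing.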

Results: Of 2955 articles, 58 were included in the analysis. The evidence map, built from the structure of the missing values and the types of imputation methods reported in these studies, showed that 45% of the studies employed conventional statistical methods, 31% used machine learning and deep learning methods, and 24% applied hybrid imputation techniques for handling missing values.
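
To make the distinction between method families concrete, the sketch below contrasts one simple statistical approach (mean imputation) with one simple neighbour-based approach of the kind often grouped with machine learning methods. Both choices are illustrative assumptions; the abstract does not say which specific methods the review recommends, and the data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: rows are (age, systolic BP); None marks a missing value.
rows = [(40, 120.0), (42, None), (65, 150.0), (67, 152.0)]

def mean_impute(data, col):
    """Replace missing values in `col` with the mean of its observed values."""
    fill = mean(r[col] for r in data if r[col] is not None)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(r))
        for r in data
    ]

def nearest_neighbour_impute(data, col, by):
    """Fill missing values in `col` from the donor row closest in column `by`.

    Assumes `by` is observed for every row that needs imputation.
    """
    donors = [r for r in data if r[col] is not None and r[by] is not None]
    out = []
    for r in data:
        if r[col] is None:
            donor = min(donors, key=lambda d: abs(d[by] - r[by]))
            r = tuple(donor[col] if i == col else v for i, v in enumerate(r))
        out.append(r)
    return out

mean_filled = mean_impute(rows, col=1)
nn_filled = nearest_neighbour_impute(rows, col=1, by=0)
print(mean_filled[1], nn_filled[1])
```

On this toy table the mean fill pulls the 42-year-old's blood pressure toward the older patients' values, while the neighbour-based fill borrows the value from the patient closest in age. This is one reason the structure of the missing data can matter when choosing a method.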

Conclusion: Considering the structure and characteristics of missing values in a clinical dataset is essential for choosing the most appropriate data imputation technique, especially within conventional statistical methods. Estimating missing values so that they reflect reality increases the likelihood of obtaining high-quality, reusable data and contributes significantly to precise medical decision-making. This review provides a guideline for choosing the most appropriate imputation methods during data preprocessing, before analysis of structured clinical datasets.

Keywords: Clinical dataset; Imputation methods; Mechanism of missingness; Missing ratio; Missing values; Pattern of missingness; Simulation study.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. The flow diagram of Preferred Reporting Items for Systematic Review (PRISMA)
Fig. 2. Trend of primary studies over the last 8 years
Fig. 3. Part (a) shows the percentage of primary studies based on their data sources. Part (b) displays the distribution of clinical contexts, emphasizing the importance of managing missing data in medical decision-making processes. Part (c) illustrates the software used for implementing data imputation techniques. Part (d) presents the types of missing variables extracted from each study. In part (e) distribution of missing value patterns is shown. Part (f) depicts the frequency of missing value mechanisms mentioned in the primary studies
Fig. 4. Main categorization of imputation methods
Fig. 5. Evidence map: The evidence map illustrates the main categorized imputation methods, which are classified into four groups, and various types of structures for missing values assumed in seven cases ((mechanism, pattern, ratio of missing values), (mechanism, pattern), (mechanism, ratio of missing values), (pattern, ratio of missing values), (mechanism), (pattern), (ratio of missing values))


