Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2015 Sep 25;10(9):e0139210.
doi: 10.1371/journal.pone.0139210. eCollection 2015.

Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa

Affiliations

Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa

Katya L Masconi et al. PLoS One. .

Abstract

Background: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods, however studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation.

Methods: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment.

Results: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which were matched by simpler imputation methods.

Conclusions: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Histogram showing the proportion of missing for each variable.
*BMI, Body Mass Index; WC, Waist Circumference; SBP, Systolic Blood Pressure; DBP, Diastolic Blood Pressure; FH, Family History; Cort, Corticosteroids; med, medication; Hpt, Hypertensive.
Fig 2
Fig 2. Aggregation plot showing all combinations of missing (red) and non-missing (blue) values in the variables, from the highest to lowest frequency.
*BMI, Body Mass Index; WC, Waist Circumference; SBP, Systolic Blood Pressure; DBP, Diastolic Blood Pressure; FH, Family History; Cort, Corticosteroids; med, medication; Hpt, Hypertensive.

References

    1. Donders ART, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087–91. - PubMed
    1. Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychological methods. 2002;7(2):147 - PubMed
    1. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–92.
    1. Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142(12):1255–64. - PubMed
    1. Little RJ. Regression with missing X's: a review. Journal of the American Statistical Association. 1992;87(420):1227–37.

Publication types