Iran Red Crescent Med J. 2012 Jan;14(1):31-6.
Epub 2012 Jan 1.

Does the missing data imputation method affect the composition and performance of prognostic models?

M R Baneshi et al. Iran Red Crescent Med J. 2012 Jan.

Abstract

Background: We have previously shown that imputing missing data with the Multivariable Imputation via Chained Equations (MICE) method is superior to excluding incomplete records; however, MICE is methodologically complex, and simpler imputation methods are available. The aim of this study was to compare these methods in terms of the composition and performance of the resulting prognostic models.

Methods: A total of 310 breast cancer patients were recruited, and four approaches were applied to impute missing data. First, we adopted an ad hoc method in which the missing values of each variable were replaced by the median of its observed values. Three likelihood-based approaches were then used. In regression imputation, the variable with missing data was regressed on the remaining variables, and the fitted regression equation was used to fill in the missing values. In the Expectation-Maximization (E-M) algorithm, the missing data and the regression parameters were estimated iteratively until the regression parameters converged. Finally, the MICE method was applied. The resulting models were compared in terms of which variables contributed significantly to the multifactorial analysis, and in terms of sensitivity and specificity.
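The first three imputation approaches can be illustrated with a small pure-Python sketch. It is not the authors' code and makes several assumptions: a toy table of two numeric columns stored as lists, with `None` marking a missing value; hypothetical function names (`median_impute`, `regression_impute`, `em_bivariate`, `em_fill`); a single fully observed predictor for regression imputation; and, for E-M, a bivariate normal model with at most one missing value per row. The E-M sketch estimates means and covariances, from which the regression of one column on the other follows. MICE is not shown.

```python
from statistics import median


def median_impute(rows):
    """Ad hoc method: replace each None with the median of that column's observed values."""
    ncol = len(rows[0])
    medians = [median([r[j] for r in rows if r[j] is not None]) for j in range(ncol)]
    return [[medians[j] if r[j] is None else r[j] for j in range(ncol)] for r in rows]


def regression_impute(rows, target, predictor):
    """Fill missing `target` values from a least-squares line on `predictor`.

    Assumes `predictor` is fully observed; the line is fitted on rows where `target` is observed.
    """
    pairs = [(r[predictor], r[target]) for r in rows if r[target] is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    out = [list(r) for r in rows]
    for r in out:
        if r[target] is None:
            r[target] = a + b * r[predictor]
    return out


def em_bivariate(rows, tol=1e-8, max_iter=500):
    """E-M estimates of the mean vector and covariance matrix of two columns.

    Assumes a bivariate normal model and at most one missing value per row.
    """
    obs = [[r[j] for r in rows if r[j] is not None] for j in (0, 1)]
    mu = [sum(c) / len(c) for c in obs]
    # Start from the observed variances with zero covariance (a valid covariance matrix).
    s = [[sum((v - mu[i]) ** 2 for v in obs[i]) / len(obs[i]) if i == j else 0.0
          for j in (0, 1)] for i in (0, 1)]
    n = len(rows)
    for _ in range(max_iter):
        sums, cross = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            e, extra = list(r), [0.0, 0.0]
            # E-step: conditional mean and variance of a missing value given the observed one.
            for m, o in ((0, 1), (1, 0)):
                if r[m] is None:
                    beta = s[m][o] / s[o][o]
                    e[m] = mu[m] + beta * (r[o] - mu[o])
                    extra[m] = s[m][m] - beta * s[m][o]
            for i in (0, 1):
                sums[i] += e[i]
                for j in (0, 1):
                    cross[i][j] += e[i] * e[j] + (extra[i] if i == j else 0.0)
        # M-step: update means and covariances from the expected sufficient statistics.
        new_mu = [sums[i] / n for i in (0, 1)]
        new_s = [[cross[i][j] / n - new_mu[i] * new_mu[j] for j in (0, 1)] for i in (0, 1)]
        delta = max(abs(a - b) for a, b in zip(new_mu + new_s[0] + new_s[1], mu + s[0] + s[1]))
        mu, s = new_mu, new_s
        if delta < tol:
            break
    return mu, s


def em_fill(rows, mu, s):
    """Replace each missing value with its conditional mean under the fitted model."""
    out = [list(r) for r in rows]
    for r in out:
        for m, o in ((0, 1), (1, 0)):
            if r[m] is None:
                r[m] = mu[m] + s[m][o] / s[o][o] * (r[o] - mu[o])
    return out


# Hypothetical toy data: the observed pairs lie exactly on y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
print(median_impute(rows)[3])            # [4.0, 4.0]
print(regression_impute(rows, 1, 0)[3])  # [4.0, 8.0]
mu, s = em_bivariate(rows)
print(em_fill(rows, mu, s)[3])           # close to [4.0, 8.0]
```

On this toy table the median method ignores the relationship between the columns, while the regression and E-M sketches both recover the value implied by it; on real data with residual scatter, single regression or conditional-mean fills understate variability, which is one motivation for multiple imputation methods such as MICE.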

Results: All candidate variables contributed significantly to the MICE model, whereas grade of disease lost its effect in the other three models. The MICE model showed the best performance, followed by the E-M model.
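Model performance in this study is measured by sensitivity and specificity. A minimal sketch of how these are computed from a model's predictions follows; the abstract does not state how predicted risks were dichotomized, so the 0.5 cutoff, the 1/0 outcome coding, the function name, and the example values are all hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity) for predicted probabilities dichotomized at `threshold`.

    y_true holds observed outcomes coded 1 (event) or 0 (no event); the cutoff is an assumption.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of events flagged; specificity: share of non-events cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes and model probabilities, for illustration only.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [0.9, 0.3, 0.2, 0.7, 0.6])
print(sens, spec)  # sensitivity 2/3, specificity 0.5
```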

Conclusion: The final models differed across imputation methods in both composition and performance. Modern imputation methods are therefore recommended to recover the information contained in incomplete records.

Keywords: Breast cancer; Data; Expectation-maximization algorithm; Multivariable imputation via chained equations.


Conflict of interest statement

Conflict of interest: None declared.