The impact of misclassifications and outliers on imputation methods
- PMID: 39450101
- PMCID: PMC11500630
- DOI: 10.1080/02664763.2024.2325969
Abstract
Many imputation methods have been developed over the years and tested mostly under ideal settings. Surprisingly, there is no detailed research on how imputation methods perform when the idealized assumptions about the data distribution and/or the model are only partly fulfilled. This research examines the susceptibility of imputation techniques to outliers, misclassifications, and incorrect model specifications. Such knowledge is crucial for judging how well the methods perform in practice, because real-world conditions are usually not ideal: model assumptions may not hold, and the data may not fit the specified models well. Outliers distort the estimates, and misclassifications reduce the quality of most imputation methods. Several evaluation measures are discussed, ranging from comparisons of imputed values with true values and comparisons of certain statistics to the performance of classifiers and the variance of estimated parameters. Some well-known imputation methods are compared on real data and in simulations. Robust conditional imputation methods turn out to outperform the other methods in both the real-data and the simulation settings.
Keywords: 62-08; Missing values; imputation; misclassifications; outliers; robust methods; simulation.
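The claim that outliers distort estimates, and the idea of evaluating an imputation by comparing imputed values with true values, can be illustrated with a minimal sketch. It is not the authors' method: the toy data, the helper names `impute` and `mean_abs_error`, and the use of simple mean versus median imputation (instead of the robust conditional methods studied in the paper) are all assumptions made purely for illustration.

```python
import statistics

def impute(values, fill):
    """Replace missing entries (None) with a single fill value."""
    return [fill if v is None else v for v in values]

def mean_abs_error(imputed, truth, missing_idx):
    """Mean absolute error between imputed and true values at the missing positions."""
    return sum(abs(imputed[i] - truth[i]) for i in missing_idx) / len(missing_idx)

# Hypothetical toy data: the last observed value is a gross outlier.
truth = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 500.0]
missing_idx = [1, 3]  # positions treated as missing (assumption for this sketch)
observed = [None if i in missing_idx else v for i, v in enumerate(truth)]

# Fill values are estimated from the observed entries only.
present = [v for v in observed if v is not None]
mean_fill = statistics.mean(present)      # non-robust: pulled toward the outlier
median_fill = statistics.median(present)  # robust: barely affected by the outlier

mae_mean = mean_abs_error(impute(observed, mean_fill), truth, missing_idx)
mae_median = mean_abs_error(impute(observed, median_fill), truth, missing_idx)

print(f"mean fill = {mean_fill}, MAE = {mae_mean}")
print(f"median fill = {median_fill}, MAE = {mae_median}")
```

In this toy setting a single outlier drags the mean-based fill value far from the bulk of the data, while the median-based fill stays close to it, so the imputation error at the missing positions is much smaller for the robust estimate.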
© 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
Conflict of interest statement
No potential conflict of interest was reported by the author(s).