Multiple imputation using an iterative hot-deck with distance-based donor selection
- PMID: 17634973
- DOI: 10.1002/sim.3001
Abstract
Hot-deck imputation offers advantages in reflecting salient features of data distributions in missing-data problems, but previous implementations have lacked the appeal associated with modern Bayesian statistical-computing techniques. We outline a strategy of iterative hot-deck multiple imputation with distance-based donor selection. With distance defined as a monotonic function of the difference in predictive means between cases, donors are chosen with probability inversely proportional to their distance from the donee. This method retains the implementation ease of ad hoc techniques, while incorporating the desirable features of Bayesian approaches. Special cases of our method include nearest-neighbor imputation and a simple random hot-deck. Iterating the procedure provides an analogy to Markov Chain Monte Carlo methods and is intended to mitigate dependence on starting values. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented. We evaluate how different definitions of distance, choices of starting values, the order in which variables are chosen for imputation, and the number of iterations impact inferences. We show that our measure of distance controls the tradeoff between bias and variance of our estimates. We find that inferences from the depression treatment trial are not sensitive to most definitions of distance. In addition, while differences exist between 1 iteration and 10 iterations, there are no meaningful differences between inferences based on 10 iterations and those based on 500 iterations. The choice of starting value did not have an impact on inferences but the order in which the variables were chosen for imputation was significant even after iteration.
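The donor-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that predictive means have already been computed for the donee and for each candidate donor, and that distance takes the form (|difference in predictive means| + offset) raised to a power k. The function names, the `offset` term, and the exponent `k` are hypothetical choices made for illustration; the abstract states only that distance is a monotonic function of the difference in predictive means.

```python
import random


def donor_probabilities(donee_mean, donor_means, k=1.0, offset=1e-6):
    """Return selection probabilities inversely proportional to distance.

    Assumed distance: (|donee_mean - donor_mean| + offset) ** k.
    The offset (an assumption) avoids division by zero for exact matches.
    """
    distances = [(abs(donee_mean - m) + offset) ** k for m in donor_means]
    weights = [1.0 / d for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def hot_deck_draw(donee_mean, donor_means, donor_values, k=1.0, rng=None):
    """Draw one observed donor value to fill in the donee's missing value."""
    rng = rng or random.Random()
    probs = donor_probabilities(donee_mean, donor_means, k=k)
    return rng.choices(donor_values, weights=probs, k=1)[0]


# Hypothetical predictive means and observed values for three donors.
donor_means = [4.0, 5.5, 9.0]
donor_values = [12, 15, 30]

# k = 0: every donor is equally likely (simple random hot-deck).
p_random = donor_probabilities(5.0, donor_means, k=0)

# Large k: nearly all probability goes to the closest donor
# (approximately nearest-neighbor imputation).
p_nearest = donor_probabilities(5.0, donor_means, k=20)

# Repeating the draw with independent randomness gives multiple imputations.
rng = random.Random(0)
imputations = [
    hot_deck_draw(5.0, donor_means, donor_values, k=1, rng=rng)
    for _ in range(5)
]
```

Under this assumed form, the exponent `k` is the knob that moves between the two special cases named in the abstract: small values spread selection across many donors, while large values concentrate it on the closest ones. The iterative part of the procedure, in which predictive means are re-estimated after each pass over the variables, is not shown here.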
Copyright (c) 2007 John Wiley & Sons, Ltd.