Missing data imputation: focusing on single imputation
- PMID: 26855945
- PMCID: PMC4716933
- DOI: 10.3978/j.issn.2305-5839.2015.12.38
Abstract
Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Many imputation methods have therefore been developed to fill the gap. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation. Furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between the variable with missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
Keywords: Big-data clinical trial; R; longitudinal data; missing data; single imputation.
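The contrast drawn in the abstract between mean or median imputation and regression imputation can be sketched in base R. This is a minimal illustration under stated assumptions, not the article's own code: the data frame `dat`, its columns `map` and `lactate`, and all values are hypothetical.

```r
# Hypothetical toy data (not from the article): 'lactate' has two
# missing values, 'map' is fully observed.
dat <- data.frame(
  map     = c(60, 70, 80, 90, 100, 110),
  lactate = c(5.1, NA, 3.9, 3.6, NA, 2.4)
)

miss <- is.na(dat$lactate)  # logical index of the missing entries

# Mean imputation: every missing value receives the observed mean.
lac_mean <- ifelse(miss, mean(dat$lactate, na.rm = TRUE), dat$lactate)

# Median imputation: same idea, using the observed median.
lac_median <- ifelse(miss, median(dat$lactate, na.rm = TRUE), dat$lactate)

# Regression imputation: fit on the complete cases (lm() drops rows
# with NA by default), then predict the missing values from 'map'.
fit <- lm(lactate ~ map, data = dat)
lac_reg <- dat$lactate
lac_reg[miss] <- predict(fit, newdata = dat[miss, , drop = FALSE])

# Compare the spread of the observed and imputed versions.
sd_observed <- sd(dat$lactate, na.rm = TRUE)
sd_mean     <- sd(lac_mean)
sd_reg      <- sd(lac_reg)
```

In this sketch the mean-imputed vector has a smaller standard deviation than the observed values, because every imputed point sits exactly at the mean. The regression-imputed values follow the fitted relationship with `map` instead, although they still carry no residual noise.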