Ann Transl Med. 2016 Jan;4(1):9.
doi: 10.3978/j.issn.2305-5839.2015.12.38.

Missing data imputation: focusing on single imputation

Zhongheng Zhang. Ann Transl Med. 2016 Jan.

Abstract

Complete case analysis is widely used for handling missing data and is the default method in many statistical packages. However, it may introduce bias and discards useful information. Many imputation methods have therefore been developed to fill this gap. The present article focuses on single imputation. Imputation with the mean, median, or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation. Furthermore, it ignores relationships with other variables. Regression imputation can preserve the relationship between the variable with missing values and other variables. Many sophisticated methods exist for handling missing values in longitudinal data. This article focuses primarily on how to implement single imputation in R, while avoiding complex mathematical calculations.
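The contrast between mean imputation and regression imputation can be illustrated with a short sketch. The article itself works in R; the example below uses Python with NumPy purely for illustration and is not the author's code. The variable names `lac` and `map_` echo the figure captions, while the synthetic values, the 30% missingness rate, and the helper names `mean_impute` and `regression_impute` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): lac tends to fall as map rises.
n = 200
map_ = rng.normal(70.0, 10.0, n)
lac = 8.0 - 0.08 * map_ + rng.normal(0.0, 0.8, n)

# Set roughly 30% of lac to missing, completely at random.
missing = rng.random(n) < 0.3
lac_obs = np.where(missing, np.nan, lac)


def mean_impute(y):
    """Replace each NaN with the mean of the observed values."""
    out = y.copy()
    out[np.isnan(out)] = np.nanmean(y)
    return out


def regression_impute(y, x):
    """Replace each NaN in y with the prediction from a least-squares line of y on x."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    out = y.copy()
    out[~obs] = intercept + slope * x[~obs]
    return out


def corr(a, b):
    """Pearson correlation between two complete vectors."""
    return np.corrcoef(a, b)[0, 1]


lac_mean = mean_impute(lac_obs)
lac_reg = regression_impute(lac_obs, map_)

print("SD, observed cases only:  ", np.nanstd(lac_obs, ddof=1))
print("SD, after mean imputation:", lac_mean.std(ddof=1))
print("r(lac, map), observed:    ", corr(lac_obs[~missing], map_[~missing]))
print("r(lac, map), mean imputed:", corr(lac_mean, map_))
print("r(lac, map), reg. imputed:", corr(lac_reg, map_))
```

Mean imputation leaves the mean unchanged but places every imputed point at the same value, so the standard deviation shrinks and the correlation with `map_` is weakened. Regression imputation places the imputed points exactly on the fitted line, which keeps the relationship but makes it look stronger than the observed data support.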

Keywords: Big-data clinical trial; R; longitudinal data; missing data; single imputation.

Conflict of interest statement

Conflicts of Interest: The author has no conflicts of interest to declare.

Figures

Zhongheng Zhang, MMed.
Figure 1. Scatter plot of lac vs. map; missing values of lac are denoted by red triangles.
Figure 2. Scatter plot of lac vs. map with missing values of lac replaced by the mean of the observed lac values.
Figure 3. Scatter plot of lac vs. map with missing values of lac replaced by values predicted by the fitted regression model.
Figure 4. Missing values are predicted by linear regression. Note that residual variance is added to reflect uncertainty in estimation.
Figure 5. Longitudinal imputations with different methods.
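The idea behind Figure 4, adding residual variance to regression predictions, can be sketched as follows. This is a minimal Python/NumPy illustration and not the article's R code. The synthetic data, the seed, and the helper name `stochastic_regression_impute` are assumptions, and drawing the noise from a normal distribution with the residual standard deviation of the fitted line is one common way to do this, not necessarily the author's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration), with about 30% of lac missing at random.
n = 200
map_ = rng.normal(70.0, 10.0, n)
lac = 8.0 - 0.08 * map_ + rng.normal(0.0, 0.8, n)
missing = rng.random(n) < 0.3
lac_obs = np.where(missing, np.nan, lac)


def stochastic_regression_impute(y, x, rng, add_noise=True):
    """Impute NaN in y from a least-squares line of y on x.

    With add_noise=True, normal noise with the residual standard
    deviation of the fit is added to each prediction.
    """
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)

    # Residual standard deviation, using n_obs - 2 degrees of freedom.
    resid = y[obs] - (intercept + slope * x[obs])
    sigma = np.sqrt(np.sum(resid ** 2) / (obs.sum() - 2))

    out = y.copy()
    pred = intercept + slope * x[~obs]
    if add_noise:
        pred = pred + rng.normal(0.0, sigma, pred.size)
    out[~obs] = pred
    return out


# Deterministic predictions (all on the fitted line) versus noisy predictions.
lac_det = stochastic_regression_impute(lac_obs, map_, rng, add_noise=False)
lac_stoch = stochastic_regression_impute(lac_obs, map_, rng)

print("SD, deterministic imputation:", lac_det.std(ddof=1))
print("SD, stochastic imputation:   ", lac_stoch.std(ddof=1))
```

Without the added noise, every imputed point sits exactly on the regression line, which understates the spread of lac. The noise term scatters the imputed values around the line in roughly the way the observed values are scattered.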
