J Med Internet Res. 2010 Dec 19;12(5):e54.
doi: 10.2196/jmir.1448.

Missing data approaches in eHealth research: simulation study and a tutorial for nonmathematically inclined researchers

Matthijs Blankers et al. J Med Internet Res.

Abstract

Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.

Objective: In this paper several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and strengths and weaknesses are discussed.
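The three basic approaches named above can be illustrated with a minimal sketch for a single variable. The toy series `drinks` and the function names are hypothetical, chosen only for illustration; they are not taken from the study.

```python
from statistics import mean

# Hypothetical toy data: daily drink counts for one participant; None marks a missing day.
drinks = [4, 6, None, 5, None, None, 3]

def complete_cases(values):
    """Complete case analysis (single variable): drop every missing observation."""
    return [v for v in values if v is not None]

def mean_imputation(values):
    """Replace each missing value with the mean of the observed values."""
    observed_mean = mean(complete_cases(values))
    return [observed_mean if v is None else v for v in values]

def locf(values):
    """Last observation carried forward: reuse the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

print(complete_cases(drinks))   # [4, 6, 5, 3]
print(mean_imputation(drinks))  # [4, 6, 4.5, 5, 4.5, 4.5, 3]
print(locf(drinks))             # [4, 6, 6, 5, 5, 5, 3]
```

Mean imputation and LOCF both fill gaps with a single fixed value, so they understate the variability of the data. That is one reason single-value approaches can distort estimates.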

Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.
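The abstract does not describe how MAR was induced, so the following is only a sketch of one way to do it. It deletes a target variable in exactly half of the cases, with the chance of deletion depending only on a fully observed covariate. The variable names (`baseline_drinks`, `followup_drinks`), the median split, and the 3:1 weighting are all assumptions made for illustration.

```python
import random
from statistics import median

def induce_mar(rows, target, covariate, fraction=0.5, seed=0):
    """Return copies of rows with `target` set to None in `fraction` of them.

    Deletion depends only on the always-observed `covariate` (rows above its
    median get three times the sampling weight), never on `target` itself,
    which is what makes the missingness "at random" (MAR) in this sketch.
    """
    rng = random.Random(seed)
    cutoff = median(r[covariate] for r in rows)
    keys = []
    for i, r in enumerate(rows):
        weight = 3.0 if r[covariate] > cutoff else 1.0
        # Weighted sampling without replacement: keep the k largest u**(1/w) keys.
        keys.append((rng.random() ** (1.0 / weight), i))
    k = round(fraction * len(rows))
    chosen = {i for _, i in sorted(keys, reverse=True)[:k]}
    return [dict(r, **{target: None}) if i in chosen else dict(r)
            for i, r in enumerate(rows)]

# Hypothetical simulated cohort of 200 cases.
data_rng = random.Random(1)
rows = [{"baseline_drinks": data_rng.randint(0, 12),
         "followup_drinks": data_rng.randint(0, 12)} for _ in range(200)]

masked = induce_mar(rows, target="followup_drinks", covariate="baseline_drinks")
n_missing = sum(r["followup_drinks"] is None for r in masked)
print(n_missing)  # 100, that is, 50% of the 200 cases
```

Because the original values are kept in `rows`, estimates computed after imputing `masked` can be compared against a reference value from the complete data, which is the general idea behind a simulation study of this kind.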

Results: In the simulation study, multiple imputation techniques produced accurate results. The 4 tested multiple imputation programs (NORM, MICE, Amelia II, and SPSS MI) differed in performance. Amelia II performed best: it showed the smallest deviation from the reference value (Cohen's d = 0.06) and the largest coverage percentage of the reference confidence interval (96%).
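The two performance measures reported here can be sketched as follows. The abstract does not give the exact formulas used, so this assumes the textbook pooled-standard-deviation form of Cohen's d and reads "coverage" as the percentage of estimates that fall inside the reference confidence interval. All numbers below are made-up toy values, not study results.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(estimates, reference):
    """Standardized mean difference between two samples, using the pooled SD."""
    n1, n2 = len(estimates), len(reference)
    s1, s2 = stdev(estimates), stdev(reference)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(estimates) - mean(reference)) / pooled

def coverage(estimates, ci_low, ci_high):
    """Percentage of estimates lying inside the interval [ci_low, ci_high]."""
    inside = sum(ci_low <= e <= ci_high for e in estimates)
    return 100.0 * inside / len(estimates)

# Toy values for illustration only.
d = cohens_d([2, 4, 6], [1, 3, 5])          # both SDs are 2, so d = (4 - 3) / 2
pct = coverage([2, 4, 6, 3], 1.5, 5.0)      # 3 of the 4 estimates fall inside
print(d, pct)
```

Under this reading, a small d means the imputed estimates sit close to the reference value, and a high coverage means they rarely fall outside the reference interval.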

Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (last observation carried forward [LOCF] and complete case analysis) did not perform well, and we therefore recommend against using them. More recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
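Analyzing multiply imputed datasets means running the same analysis on each imputed copy and then combining the results. The abstract does not state how the tested programs pool results; the sketch below assumes the standard combining rules (Rubin's rules), with a hypothetical function name and toy inputs.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their squared standard errors.

    Returns the pooled estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # variance across imputed datasets
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total

# Toy values: 5 imputed datasets, each giving an estimate and its variance.
pooled, total_var = pool_rubin([10, 12, 11, 13, 9], [4, 4, 4, 4, 4])
print(pooled, total_var)
```

The between-imputation term is what single imputation leaves out: it adds the uncertainty caused by not knowing the missing values to the ordinary sampling variance.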

Conflict of interest statement

None declared

Figures

Figure 1. Strip chart for 9 missing data approaches and the reference value
Figure 2. Repeated application of 9 missing data approaches
