Stat Med. 2018 Jun 30;37(14):2252-2266.
doi: 10.1002/sim.7654. Epub 2018 Apr 16.

Bootstrap inference when using multiple imputation

Michael Schomaker et al. Stat Med.

Abstract

Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. However, it remains unclear how to obtain valid bootstrap inference when multiple imputation is used to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no analytic standard errors are available.
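
To make the ordering of the two steps concrete, here is a minimal Python sketch of two ways of combining bootstrapping with multiple imputation: imputing first and then bootstrapping each imputed data set, or bootstrapping the incomplete data first and then imputing each bootstrap sample. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not spell out the 4 methods, so the function names, the toy data, the simple stochastic regression imputation (which ignores parameter uncertainty), and the pooled percentile interval are all assumptions made for this example.

```python
import numpy as np


def impute_once(x, y, rng):
    """Fill missing y values by stochastic regression on x.

    Hypothetical stand-in for a real imputation model; it does not
    propagate uncertainty in the regression coefficients.
    """
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (intercept + slope * x[obs]), ddof=2)
    y_imp = y.copy()
    n_mis = int((~obs).sum())
    y_imp[~obs] = intercept + slope * x[~obs] + rng.normal(0.0, resid_sd, n_mis)
    return y_imp


def estimate(x, y):
    """Parameter of interest in this toy example: the slope of y on x."""
    return np.polyfit(x, y, 1)[0]


def mi_then_boot(x, y, m, b, rng, alpha=0.05):
    """Impute m times, bootstrap each imputed data set b times,
    and take percentiles of the pooled m*b estimates (assumed pooling rule)."""
    n = len(y)
    estimates = []
    for _ in range(m):
        y_imp = impute_once(x, y, rng)
        for _ in range(b):
            idx = rng.integers(0, n, n)  # resample rows with replacement
            estimates.append(estimate(x[idx], y_imp[idx]))
    return tuple(np.quantile(estimates, [alpha / 2, 1 - alpha / 2]))


def boot_then_mi(x, y, m, b, rng, alpha=0.05):
    """Bootstrap the incomplete data b times, impute each bootstrap sample
    m times, and take percentiles of the pooled b*m estimates."""
    n = len(y)
    estimates = []
    for _ in range(b):
        idx = rng.integers(0, n, n)  # resample rows, missing values included
        xb, yb = x[idx], y[idx]
        for _ in range(m):
            estimates.append(estimate(xb, impute_once(xb, yb, rng)))
    return tuple(np.quantile(estimates, [alpha / 2, 1 - alpha / 2]))


# Toy data (assumed): y = 1 + 2x + noise, with 30% of y missing completely at random.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan

ci_mi_boot = mi_then_boot(x, y, m=5, b=200, rng=rng)
ci_boot_mi = boot_then_mi(x, y, m=5, b=200, rng=rng)
print("impute, then bootstrap:", ci_mi_boot)
print("bootstrap, then impute:", ci_boot_mi)
```

The two functions differ only in which loop is on the outside, which is the design choice the abstract is concerned with. Comparing how the resulting intervals behave as `m` and the missingness fraction change is a small-scale analogue of the simulation studies the abstract describes.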

Keywords: HIV; causal inference; g-methods; missing data; resampling.

Figures

Figure 1
Coverage probability of the interval estimates for β1 in the first simulation setting, as a function of the number of imputations. Results for the complete simulated data, i.e., before missing data are generated, are labelled “original data”.
Figure 2
Estimate of β1 in the first simulation setting, for a random simulation run: distribution of ‘MI Boot (pooled)’ for each imputed dataset (top) and distribution of ‘Boot MI (PS)’ for 50 random bootstrap samples (bottom). Point estimates are marked by the black tick marks on the x-axis.
Figure 3
Estimated cumulative mortality difference between the interventions ‘immediate ART’ and ‘350/15’ at 3 years: distributions and confidence intervals of different estimators.
Figure 4
Estimated cumulative mortality difference: distribution of ‘MI Boot (PS)’ for each imputed dataset (top) and distribution of ‘Boot MI (PS)’ for 25 random bootstrap samples (bottom). Point estimates are marked by the black tick marks on the x-axis.
