J Off Stat. 2016 Mar;32(1):231-256.
doi: 10.1515/JOS-2016-0011. Epub 2016 Mar 10.

Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

Hanzhi Zhou et al. J Off Stat. 2016 Mar.

Abstract

Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.
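To make the operational burden of the dummy-variable approach concrete, the following minimal sketch counts and builds the indicator columns an imputation model would need for strata and for PSUs nested within strata. It is an illustration only, not the authors' procedure: the function names, the reference-cell coding, and the assumption of an equal number of PSUs per stratum are all hypothetical choices made for this example.

```python
def design_dummy_columns(n_strata, psus_per_stratum):
    """Name the dummy columns for strata and PSUs nested within strata.

    Assumes reference-cell coding (hypothetical choice): stratum 1 is the
    baseline stratum, and PSU 1 is the baseline PSU within every stratum.
    """
    stratum_cols = [f"stratum_{h}" for h in range(2, n_strata + 1)]
    psu_cols = [
        f"stratum_{h}_psu_{j}"
        for h in range(1, n_strata + 1)
        for j in range(2, psus_per_stratum + 1)
    ]
    return stratum_cols + psu_cols


def encode_unit(stratum, psu, columns):
    """Return the 0/1 indicator row for one sampled unit."""
    active = {f"stratum_{stratum}", f"stratum_{stratum}_psu_{psu}"}
    return [1 if col in active else 0 for col in columns]


# Illustrative design: 50 strata with 2 PSUs each (numbers are assumptions).
columns = design_dummy_columns(50, 2)
n_design_terms = len(columns)  # (50 - 1) stratum terms + 50 * (2 - 1) PSU terms
row = encode_unit(stratum=3, psu=2, columns=columns)
```

Under these assumptions the number of design terms is one less than the total number of PSUs, so it grows linearly with the number of strata before any sampling weights or substantive covariates enter the imputation model.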

Keywords: Finite population Bayesian bootstrap; Haldane prior; clustered sample; sample weights; stratified sample.

Figures

Fig. 1. Correlation between variables in the simulated population (darker shades = higher correlation)
Fig. 2. Distribution of weights under the two subsampling schemes
Fig. 3. Comparison of methods for quantile estimation of BMI, by gender
