Surv Methodol. 2014 Jun;40(1):29-46.
Epub 2014 Jun 27.

A nonparametric method to generate synthetic populations to adjust for complex sampling design features

Qi Dong et al. Surv Methodol. 2014 Jun.

Abstract

Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without allowing for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), both of which use stratified, clustered, unequal-probability-of-selection sample designs.

Keywords: Bayesian bootstrap; Inverse sampling; Posterior predictive distribution; Synthetic populations.
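
The abstract does not spell out the resampling step, so the following is only a rough sketch of one idea from the finite population Bayesian bootstrap literature: a weighted Pólya urn that "undoes" unequal selection probabilities by drawing N − n further units from the n sampled ones, favoring units with large weights and units already re-drawn. The function name `weighted_polya_counts`, the exact selection probabilities, and the requirements that every weight is at least 1 and that the weights sum to the population size N are assumptions made for illustration. The sketch ignores stratification and clustering and should not be read as the authors' full procedure.

```python
import random


def weighted_polya_counts(weights, rng):
    """Return how many times each sampled unit appears in one synthetic
    population of size N = round(sum(weights)).

    Hypothetical helper: assumes each weight is >= 1 and the weights
    sum (approximately) to the population size N.
    """
    n = len(weights)
    N = round(sum(weights))
    if any(w < 1 for w in weights):
        raise ValueError("each weight is assumed to be at least 1")
    if N <= n:
        raise ValueError("weights must sum to more than the sample size")

    ratio = (N - n) / n
    redraws = [0] * n  # l_i: number of times unit i has been re-drawn so far

    for _ in range(N - n):
        # Unnormalized selection probability for unit i (assumed form):
        #   w_i - 1 + l_i * (N - n) / n
        urn = [w - 1 + l * ratio for w, l in zip(weights, redraws)]
        i = rng.choices(range(n), weights=urn, k=1)[0]
        redraws[i] += 1

    # Each sampled unit represents itself once, plus its re-draws.
    return [1 + l for l in redraws]


if __name__ == "__main__":
    rng = random.Random(2014)
    sample_weights = [1, 2, 3, 4]  # toy weights summing to N = 10
    print(weighted_polya_counts(sample_weights, rng))
```

Under these assumptions, a unit with weight 1 represents only itself and is never re-drawn, while heavily weighted units tend to be replicated more often. Repeating the draw with fresh random numbers would give several synthetic populations, each of which could then be analyzed as if it were a simple random sample.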

Figures

Figure 7.1: Scatter plot of the descriptive and analytic statistics from the actual and synthetic populations
