Surv Methodol. 2010 Jun;36(1):23-34.
Epub 2010 Jun 29.

Bayesian penalized spline model-based inference for finite population proportion in unequal probability sampling

Qixuan Chen et al. Surv Methodol. 2010 Jun.

Abstract

We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.
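
As a point of reference for the comparison above, the Hájek (HK) estimator of a proportion is the ratio of the inverse-probability-weighted total of the outcome to the sum of the weights. The following is a minimal sketch of that standard formula, not code from the paper; the function name and the toy data are hypothetical.

```python
def hajek_proportion(y, pi):
    """Hajek estimate of a finite population proportion.

    y  : binary outcomes (0 or 1) for the sampled units
    pi : inclusion probabilities of the same units, each in (0, 1]
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in pi]  # design weights 1 / pi_i
    weighted_total = sum(w * yi for w, yi in zip(weights, y))
    return weighted_total / sum(weights)  # ratio of weighted totals


# Hypothetical toy sample: units with smaller pi carry larger weights.
y = [1, 0, 0, 1]
pi = [0.5, 0.25, 0.25, 0.5]
estimate = hajek_proportion(y, pi)
```

Because this estimator uses the inclusion probabilities only as weights, it does not model how the outcome varies with them, which is the relationship the BPSP approach models with a penalized spline.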

Keywords: Bayesian analysis; Binary data; Penalized spline regression; Probability proportional to size; Survey samples.

Figures

Figure 1. Two simulated artificial populations (N = 2,000)
Figure 2. A random pps sample from the EXP case (n = 200, N = 2,000): (a) scatter plot of Z; the three grey lines are the superpopulation 10th, 50th, and 90th percentiles, respectively. (b) black circles are observed units of binary survey variable Y in the sample, defined as Y = I(Z ≤ 10th percentile); the grey solid and dashed curves are posterior means of Pr(Y_i = 1|π_i) and 95% credible intervals, respectively, simulated based on a probit p-spline model on π; and the black curve is the superpopulation Pr(Y_i = 1|π_i). (c) similar to (b), but with Y = I(Z ≤ 50th percentile). (d) similar to (b), but with Y = I(Z ≤ 90th percentile)
Figure 3. Box plots of the probabilities of inclusion for two sample sizes in the tax auditing example
Figure 4. Predictions based on pps samples only in the tax auditing example, X-axis: inclusion probabilities π, Y-axis: P(Y = 1|π); black dots are the true P(Y = 1|π) within each percentile of π; grey curves are ten realizations of the posterior means of P(Y = 1|π). The prediction models are (a) probit linear p-spline regression, (b) linear probit regression, (c) quadratic probit regression
Figure 5. Predictions based on the combined data of pps samples and the observations sampled with certainty in the tax auditing example, X-axis: inclusion probabilities π, Y-axis: P(Y = 1|π); black dots are the true P(Y = 1|π) within each percentile of π; grey curves are ten realizations of the posterior mean of P(Y = 1|π). The prediction models are (a) probit linear p-spline regression, (b) linear probit regression, (c) quadratic probit regression
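
The curves in Figures 4 and 5 plot P(Y = 1|π) under a probit link. The sketch below shows how such a curve could be evaluated for a linear p-spline, assuming a truncated linear basis with fixed knots; the basis form, the knot locations, and the coefficient values are illustrative assumptions and are not taken from the paper. The penalty on the knot coefficients and the Gibbs sampler that would supply posterior draws of the coefficients are omitted.

```python
import math


def truncated_linear_basis(pi, knots):
    """Design row [1, pi, (pi - k_1)_+, ..., (pi - k_m)_+] (assumed basis)."""
    return [1.0, pi] + [max(pi - k, 0.0) for k in knots]


def probit_mean(pi, beta, knots):
    """Pr(Y = 1 | pi) = Phi(x' beta), with Phi the standard normal CDF."""
    x = truncated_linear_basis(pi, knots)
    if len(beta) != len(x):
        raise ValueError("beta must have len(knots) + 2 entries")
    eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical knots and coefficients, chosen only to exercise the functions.
knots = [0.1, 0.2]
beta = [-1.0, 5.0, 0.0, 0.0]
curve = [probit_mean(p, beta, knots) for p in (0.05, 0.2, 0.35)]
```

In a Bayesian fit, each posterior draw of the coefficients would give one such curve, and averaging the curves over draws would give a posterior mean of P(Y = 1|π).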
