Stat Sci. 2025 May;40(2):272-288.
doi: 10.1214/24-sts932. Epub 2025 Jun 2.

On the Use of Auxiliary Variables in Multilevel Regression and Poststratification

Yajuan Si. Stat Sci. 2025 May.

Abstract

Multilevel regression and poststratification (MRP) is a popular method for addressing selection bias in subgroup estimation, with broad applications across fields from social sciences to public health. In this paper, we examine the inferential validity of MRP in finite populations, exploring the impact of poststratification and model specification. The success of MRP relies heavily on the availability of auxiliary information that is strongly related to the outcome. To enhance the fitting performance of the outcome model, we recommend modeling the inclusion probabilities conditionally on auxiliary variables and incorporating flexible functions of estimated inclusion probabilities as predictors in the mean structure. We present a statistical data integration framework that offers robust inferences for probability and nonprobability surveys, addressing various challenges in practical applications. Our simulation studies indicate the statistical validity of MRP, which involves a tradeoff between bias and variance, with greater benefits for subgroup estimates with small sample sizes, compared to alternative methods. We have applied our methods to the Adolescent Brain Cognitive Development (ABCD) Study, which collected information on children across 21 geographic locations in the U.S. to provide national representation, but is subject to selection bias as a nonprobability sample. We focus on the cognition measure of diverse groups of children in the ABCD study and show that the use of auxiliary variables affects the findings on cognitive performance.

Keywords: data integration; model-based; nonprobability sample; robust inference; selection/nonresponse bias.
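
The poststratification step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `shrunken_cell_means` and `poststratify`, the `prior_strength` parameter, and the toy data are all assumptions made for illustration, and the simple shrinkage toward the grand mean is only a crude stand-in for a fitted multilevel regression. The sketch shows how cell-level estimates are combined using known population cell sizes rather than sample cell sizes.

```python
import numpy as np


def shrunken_cell_means(y, cell, n_cells, prior_strength=5.0):
    """Crude partial pooling: shrink each cell mean toward the grand mean.

    Hypothetical stand-in for a multilevel model. Cells with few sampled
    units are pulled more strongly toward the overall sample mean; an empty
    cell receives the grand mean when prior_strength > 0.
    """
    y = np.asarray(y, dtype=float)
    cell = np.asarray(cell, dtype=int)
    grand_mean = y.mean()
    counts = np.bincount(cell, minlength=n_cells)
    sums = np.bincount(cell, weights=y, minlength=n_cells)
    return (sums + prior_strength * grand_mean) / (counts + prior_strength)


def poststratify(cell_estimates, cell_sizes):
    """Weight cell-level estimates by known population cell sizes."""
    cell_estimates = np.asarray(cell_estimates, dtype=float)
    cell_sizes = np.asarray(cell_sizes, dtype=float)
    return float(np.sum(cell_sizes * cell_estimates) / np.sum(cell_sizes))


# Toy sample (assumed data): cell 1 is over-represented relative to a
# population in which both cells are equally large.
y = [1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
cell = [0, 0, 1, 1, 1, 1]
population_sizes = [50, 50]

cell_estimates = shrunken_cell_means(y, cell, n_cells=2, prior_strength=2.0)
print("unweighted sample mean:", np.mean(y))
print("poststratified estimate:", poststratify(cell_estimates, population_sizes))
```

Setting `prior_strength=0.0` recovers the raw cell means, which is the purely design-adjusted end of the bias and variance tradeoff the abstract mentions; larger values pool more strongly, which tends to help cells with small sample sizes at the cost of some bias.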

Figures

Figure 1:
Outputs of overall and subgroup mean estimates when the outcome models and adjustment factors of all methods are correctly specified. RMSE: root mean squared error, DR: doubly robust estimator, IPW: inverse propensity weighting estimator, GREG: generalized regression estimator, MRP: multilevel regression and poststratification, MRP-INT: an integrated MRP by adding estimated inclusion probabilities as predictors in the outcome models and predicting all population cells with known population cell sizes, MRP-P: MRP that uses all population cells with known population cell sizes, and MRP-R: MRP that uses available cells in the reference probability sample with unknown population cell sizes.
Figure 2:
Outputs of overall and subgroup mean estimates when the outcome models and adjustment factors of all methods are incorrectly specified. RMSE: root mean squared error, DR: doubly robust estimator, IPW: inverse propensity weighting estimator, GREG: generalized regression estimator, MRP: multilevel regression and poststratification, MRP-INT: an integrated MRP by adding estimated inclusion probabilities as predictors in the outcome models and predicting all population cells with known population cell sizes, MRP-P: MRP that uses all population cells with known population cell sizes, and MRP-R: MRP that uses available cells in the reference probability sample with unknown population cell sizes.
Figure 3:
Finite population inferences of average cognition test scores by groups, with seven auxiliary variables. The error bars are 95% confidence intervals. HH: household, IPW: inverse propensity weighting estimator, MRP: multilevel regression and poststratification, MRP-INT: an integrated MRP by adding estimated inclusion probabilities as predictors in the outcome models and predicting all population cells with known population cell sizes, MRP-P: MRP that uses all population cells with known population cell sizes, and MRP-R: MRP that uses available cells in the reference probability sample with unknown population cell sizes.
Figure 4:
Finite population inferences of average cognition test scores by groups. The outcome model has five auxiliary variables. The error bars are 95% confidence intervals. HH: household, IPW: inverse propensity weighting estimator, MRP: multilevel regression and poststratification, MRP-INT: an integrated MRP by adding estimated inclusion probabilities as predictors in the outcome models and predicting all population cells with known population cell sizes, MRP-P: MRP that uses all population cells with known population cell sizes, and MRP-R: MRP that uses available cells in the reference probability sample with unknown population cell sizes.
