Tree-based Machine Learning Methods for Survey Research

Christoph Kern et al. Surv Res Methods. 2019 Apr 11;13(1):73-93.

Abstract

Predictive modeling methods from the field of machine learning have become a popular tool across various disciplines for exploring and analyzing diverse data. These methods often do not require specific prior knowledge about the functional form of the relationship under study and are able to adapt to complex non-linear and non-additive interrelations between the outcome and its predictors while focusing specifically on prediction performance. This modeling perspective is beginning to be adopted by survey researchers in order to adjust or improve various aspects of data collection and/or survey management. To facilitate this strand of research, this paper (1) provides an introduction to prominent tree-based machine learning methods, (2) reviews and discusses previous and (potential) prospective applications of tree-based supervised learning in survey research, and (3) exemplifies the usage of these techniques in the context of modeling and predicting nonresponse in panel surveys.

Keywords: adaptive design; machine learning; nonresponse; panel attrition; predictive models.
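
As a rough illustration of why tree-based methods need no prior assumption about functional form, the following minimal sketch searches for the single best binary split of one predictor using Gini impurity, the basic step of a CART-style tree. It is not the paper's analysis: the function names (`gini`, `best_split`), the variable names, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: List[float], y: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best split of the form x <= t.

    Candidate thresholds are midpoints between adjacent distinct values of x.
    Returns (None, inf) if x has fewer than two distinct values.
    """
    pairs = list(zip(x, y))
    values = sorted(set(x))
    best_t: Optional[float] = None
    best_imp = float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [yi for xi, yi in pairs if xi <= t]
        right = [yi for xi, yi in pairs if xi > t]
        # Impurity of the two child nodes, weighted by their share of cases.
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Hypothetical toy data (not from GSOEP): a numeric predictor and a
# binary outcome coded 1 = refusal, 0 = no refusal.
contacts = [1, 1, 2, 2, 3, 5, 6, 7, 8, 9]
refused = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

threshold, impurity = best_split(contacts, refused)
print(threshold, impurity)
```

A full tree repeats this search recursively within each child node and across all predictors, and ensemble methods such as random forests average many such trees. The sketch only shows the split search itself.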


Figures

Figure 1: Coefficient Plots of Terminal Node Models of MOB Tree (y = Refusal in GSOEP Wave 2014)
Figure 2: Conditional Inference Tree (y = Refusal in GSOEP Wave 2014)
Figure 3: Performance Curves in Test Set (y = Refusal in GSOEP Wave 2014)
Figure 4: Top-20 Variable Importance (y = Refusal in GSOEP Wave 2014)
Figure 5: Partial Dependence Plots Based on Random Forest Result (y = Refusal in GSOEP Wave 2014)

