SSM Popul Health. 2022 Jan 22:17:101032.
doi: 10.1016/j.ssmph.2022.101032. eCollection 2022 Mar.

Quantitative methods for descriptive intersectional analysis with binary health outcomes

Mayuri Mahendran et al. SSM Popul Health. 2022.

Abstract

Intersectionality recognizes that in the context of sociohistorically shaped structural power relations, an individual's multiple social positions or identities (e.g., gender, ethnicity) can interact to affect health-related outcomes. Despite limited methodological guidance, intersectionality frameworks have increasingly been incorporated into epidemiological studies, both to describe health disparities and to examine their causes. This study aimed to advance methods in intersectional estimation of binary outcomes in descriptive health disparities research through evaluation of seven potentially intersectional data analysis methods: cross-classification, regression with interactions, multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA), and decision trees (CART, CTree, CHAID, random forest). Accuracy of estimated intersection-specific outcome prevalence was evaluated across 192 intersections using simulated data scenarios. For comparison, we included a non-intersectional main effects regression. We additionally assessed variable selection performance among the decision tree methods. Example analyses using National Health and Nutrition Examination Study data illustrated differences in results between methods. At larger sample sizes, all methods except for CART performed better than non-intersectional main effects regression. In smaller samples, MAIHDA was the most accurate method but showed no advantage over main effects regression, while random forest, cross-classification, and saturated regression were the least accurate, and CTree and CHAID performed moderately well. CART performed poorly for estimation and variable selection. Sensitivity analyses examining the bias-variance tradeoff suggest MAIHDA as the preferred unbiased method for accurate estimation of high-dimensional intersections at smaller sample sizes. The other methods depend more heavily on larger sample sizes. Results support the adoption of an intersectional approach to descriptive epidemiology.
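
To make the evaluation idea concrete, the following is a minimal sketch of the simplest of the methods named above, cross-classification, scored with a mean absolute deviation (MAD) between estimated and true intersection-specific prevalence. It is not the authors' code. The three binary social positions, the true prevalence values, the sample sizes, and all function names are hypothetical, and the MAD here is assumed to be the average absolute difference across intersections.

```python
import random
from collections import defaultdict
from itertools import product


def cross_classify(records):
    """Estimate outcome prevalence within each intersection (stratum)."""
    counts = defaultdict(lambda: [0, 0])  # intersection -> [cases, total]
    for intersection, outcome in records:
        counts[intersection][0] += outcome
        counts[intersection][1] += 1
    return {k: cases / total for k, (cases, total) in counts.items()}


def mean_absolute_deviation(estimated, true):
    """Average |estimate - truth| over intersections observed in the sample."""
    shared = [k for k in true if k in estimated]
    return sum(abs(estimated[k] - true[k]) for k in shared) / len(shared)


def simulate(n, true_prev, rng):
    """Draw n individuals, each from a random intersection, with a binary outcome."""
    keys = sorted(true_prev)
    records = []
    for _ in range(n):
        k = rng.choice(keys)
        records.append((k, int(rng.random() < true_prev[k])))
    return records


rng = random.Random(0)

# Hypothetical setup: three binary positions give 8 intersections
# (the study itself evaluated 192). One intersection carries an
# extra interaction effect on top of additive main effects.
intersections = list(product([0, 1], repeat=3))
true_prev = {
    k: 0.1 + 0.1 * sum(k) + (0.2 if k == (1, 1, 1) else 0.0)
    for k in intersections
}

mad_by_n = {}
for n in (200, 20000):
    estimates = cross_classify(simulate(n, true_prev, rng))
    mad_by_n[n] = mean_absolute_deviation(estimates, true_prev)
    print(n, round(mad_by_n[n], 4))
```

With few observations per intersection, the cross-classified estimates are noisy, which is consistent with the abstract's finding that this method needs larger samples; a model that pools information across intersections, such as MAIHDA, trades some of that variance for shrinkage.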

Keywords: Biostatistics; CART, classification and regression tree; CHAID, chi-square automatic interaction detector; CTree, conditional inference trees; Epidemiological studies; Health equity; Intersectionality; MAD, mean absolute deviation; MAIHDA, multilevel analysis of individual heterogeneity and discriminatory accuracy; NHANES, National Health and Nutrition Examination Study; Research design; SD, standard deviation; U.S., United States; VIM, variable importance measure.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Fig. 1.A to 1.D. Boxplots of the mean absolute deviation (MAD) of intersection estimations for four different sample sizes (graph excludes outliers). 1.A. Common outcome with categorical inputs. 1.B. Rare outcome with categorical inputs. 1.C. Common outcome with mixed inputs. 1.D. Rare outcome with mixed inputs. Abbreviations: CART = classification and regression tree; CHAID = chi-square automatic interaction detector; CTree = conditional inference trees; MAIHDA = multilevel analysis of individual heterogeneity and discriminatory accuracy.
Fig. 2.A to 2.C. Prevalence of high blood pressure by intersection. Abbreviations: CART = classification and regression tree; CHAID = chi-square automatic interaction detector; CTree = conditional inference trees; MAIHDA = multilevel analysis of individual heterogeneity and discriminatory accuracy.
Fig. 3.A to 3.D. Boxplots of the MAD of intersection-specific estimations for two different small sample sizes and a simulated outcome prevalence of 50% (graph excludes outliers). 3.A. Categorical inputs. 3.B. Categorical inputs with larger effect sizes only for the interaction effects. 3.C. Mixed inputs. 3.D. Mixed inputs with larger effect sizes only for the interaction effects. Abbreviations: CART = classification and regression tree; CHAID = chi-square automatic interaction detector; CTree = conditional inference trees; MAIHDA = multilevel analysis of individual heterogeneity and discriminatory accuracy.
