BMJ Health Care Inform. 2022 Apr;29(1):e100456.
doi: 10.1136/bmjhci-2021-100456.

Resampling to address inequities in predictive modeling of suicide deaths

Majerle Reeves et al. BMJ Health Care Inform. 2022 Apr.

Abstract

Objective: Improve methodology for equitable suicide death prediction when using sensitive predictors, such as race/ethnicity, for machine learning and statistical methods.

Methods: Train four predictive models (logistic regression, naive Bayes, gradient boosting (XGBoost) and random forests) using three resampling techniques (Blind, Separate, Equity) on emergency department (ED) administrative patient records. The Blind method resamples without considering racial/ethnic group. By comparison, the Separate method trains a separate model for each group, and the Equity method builds a training set that is balanced both by racial/ethnic group and by class.
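
The idea behind the Equity method, a training set balanced by both group and class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `equity_resample`, the field names `group` and `died`, the toy records and the choice to sample each cell with replacement are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def equity_resample(records, group_key, label_key, n_per_cell, seed=0):
    """Return a training set with n_per_cell records per (group, class) cell.

    Assumption: each cell is sampled with replacement, so small cells are
    oversampled and large cells are undersampled. The paper's exact
    procedure may differ.
    """
    rng = random.Random(seed)

    # Bucket the records by (group, class) cell.
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[group_key], rec[label_key])].append(rec)

    # Draw the same number of records from every cell.
    balanced = []
    for cell in sorted(cells):
        balanced.extend(rng.choices(cells[cell], k=n_per_cell))

    rng.shuffle(balanced)
    return balanced


# Toy, made-up records: group "A" is far larger than group "B",
# and the positive class (died == 1) is rare in both.
toy = (
    [{"group": "A", "died": 0}] * 90
    + [{"group": "A", "died": 1}] * 6
    + [{"group": "B", "died": 0}] * 12
    + [{"group": "B", "died": 1}] * 2
)

train = equity_resample(toy, "group", "died", n_per_cell=20)
counts = Counter((r["group"], r["died"]) for r in train)
print(counts)  # every (group, class) cell now holds 20 records
```

A Blind resampler, by contrast, would balance only on `died` and leave group "A" dominating the training set.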

Results: Using the Blind method, the range of the models' sensitivity for predicting suicide death across racial/ethnic groups (a measure of prediction inequity) was 0.47 for logistic regression, 0.37 for naive Bayes, 0.56 for XGBoost and 0.58 for random forest. Building separate models for each racial/ethnic group, or applying the Equity method to the training set, decreased this range to 0.16, 0.13, 0.19 and 0.20 with the Separate method and to 0.14, 0.12, 0.24 and 0.13 with the Equity method, respectively. XGBoost had the highest overall area under the curve (AUC), ranging from 0.69 to 0.79.
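
The inequity measure reported here, the range of sensitivity across groups, can be computed as in the sketch below. It assumes that "range" means the highest per-group sensitivity minus the lowest, and that sensitivity is true positives divided by all actual positives within a group. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def sensitivity_by_group(y_true, y_pred, groups):
    """Per-group sensitivity: true positives / actual positives."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Groups with no actual positives are skipped (sensitivity undefined).
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def sensitivity_range(y_true, y_pred, groups):
    """Assumed inequity measure: max minus min per-group sensitivity."""
    per_group = sensitivity_by_group(y_true, y_pred, groups)
    return max(per_group.values()) - min(per_group.values())


# Toy, made-up example: 4 actual positives in each of two groups.
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # A: 3 of 4 detected, B: 1 of 4 detected

per_group = sensitivity_by_group(y_true, y_pred, groups)
gap = sensitivity_range(y_true, y_pred, groups)
print(per_group)  # {'A': 0.75, 'B': 0.25}
print(gap)        # 0.5
```

A range of 0 would mean the model detects suicide deaths equally well in every group; larger values indicate greater inequity.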

Discussion: We increased performance equity between different racial/ethnic groups and showed that imbalanced training sets lead to models with poor predictive equity. These methods achieve AUC scores comparable to other work in the field while using only single ED administrative record data.

Conclusion: We propose two methods to improve equity of suicide death prediction among different racial/ethnic groups. These methods may be applied to other sensitive characteristics to improve equity in machine learning with healthcare applications.

Keywords: Data Science; Decision Trees; Machine Learning.

Conflict of interest statement

Competing interests: None declared.
