J Gen Intern Med. 2021 May;36(5):1181-1188.
doi: 10.1007/s11606-020-06438-1. Epub 2021 Feb 23.

Predicting Self-Rated Health Across the Life Course: Health Equity Insights from Machine Learning Models

Cheryl R Clark et al. J Gen Intern Med. 2021 May.

Abstract

Background: Self-rated health is a strong predictor of mortality and morbidity. Machine learning techniques may provide insights into which of the multifaceted contributors to self-rated health are key drivers in diverse groups.

Objective: We used machine learning algorithms to predict self-rated health in diverse groups in the Behavioral Risk Factor Surveillance System (BRFSS), in order to understand how such algorithms might be used explicitly to examine drivers of self-rated health in diverse populations.

Design: We applied three common machine learning algorithms to predict self-rated health in the 2017 BRFSS survey, stratified by age, race/ethnicity, and sex. We replicated our process in the 2016 BRFSS survey.

Participants: We analyzed data from 449,492 adult participants of the 2017 BRFSS survey.

Main measures: We used area under the curve (AUC) statistics to assess model fit within each group. We used traditional logistic regression to estimate the association between self-rated health and the features identified by the machine learning models.
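
As an illustration of the per-group AUC measure described above, the following is a minimal sketch and not the authors' code. It computes AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count as one half), separately within each stratum. The function names, the age-group labels, and the toy records are all assumptions made for illustration; none of them come from the BRFSS analysis.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score); returns {group: AUC}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data: (age group, 1 = excellent/very good health, model score).
records = [
    ("18-34", 1, 0.9), ("18-34", 0, 0.2), ("18-34", 1, 0.6), ("18-34", 0, 0.7),
    ("65+", 1, 0.8), ("65+", 0, 0.3), ("65+", 1, 0.4), ("65+", 0, 0.4),
]
result = auc_by_group(records)
print(result)
```

The pairwise comparison is quadratic in the number of cases, which is fine for a sketch; for a survey of this size a rank-based implementation from a statistics library would be the more practical choice.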

Key results: Each algorithm, regularized logistic regression (AUC: 0.81), random forest (AUC: 0.80), and support vector machine (AUC: 0.81), provided good model fit in the BRFSS. Predictors of self-rated health were similar by sex and race/ethnicity but differed by age. Socioeconomic features were prominent predictors of self-rated health in mid-life age groups. Income [OR: 1.70 (95% CI: 1.62-1.80)], education [OR: 2.02 (95% CI: 1.89-2.16)], physical activity [OR: 1.52 (95% CI: 1.46-1.58)], depression [OR: 0.66 (95% CI: 0.63-0.68)], difficulty concentrating [OR: 0.62 (95% CI: 0.58-0.66)], and hypertension [OR: 0.59 (95% CI: 0.57-0.61)] all predicted the odds of excellent or very good self-rated health.
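
For readers less familiar with how odds ratios and their intervals are derived from a logistic regression, here is a minimal sketch. It assumes the standard Wald construction, in which the odds ratio is the exponentiated coefficient and the 95% interval is the exponentiated coefficient plus or minus 1.96 standard errors; the abstract does not state which interval method the authors used. The coefficient and standard error below are hypothetical values chosen for illustration, not estimates from the study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error
    into (odds ratio, lower bound, upper bound) using a Wald interval."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return (
        math.exp(beta),           # point estimate
        math.exp(beta - z * se),  # lower bound
        math.exp(beta + z * se),  # upper bound
    )


# Hypothetical coefficient and standard error (not taken from the paper).
or_, lo, hi = odds_ratio_with_ci(beta=0.5, se=0.1)
print(f"OR: {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

An odds ratio above 1, as reported for income, education, and physical activity, indicates higher odds of excellent or very good self-rated health; a value below 1, as reported for depression, difficulty concentrating, and hypertension, indicates lower odds.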

Conclusions: Our analysis of BRFSS data shows that social determinants of health are prominent predictors of self-rated health in mid-life. Our work may demonstrate promising practices for using machine learning to advance health equity.

Keywords: healthcare disparities; machine learning; self-rated health; social determinants of health; socioeconomic factors.

Conflict of interest statement

The authors declare that they do not have a conflict of interest.

Figures

Figure 1
Top variables of importance across age groups, 2017 BRFSS. Notes: Seven domains include demographics: age, sex, race, geographic division, state of residence, number of adults in the respondent’s household, marriage status, veteran status, number of children, and language spoken; clinical conditions: a self-reported history of cancer, asthma, depression, diabetes, stroke, cardiovascular disease, kidney disease, arthritis, COPD, skin cancer, body mass index, angina, or hypertension; functional status: difficulty doing errands, difficulty dressing, difficulty walking, difficulty communicating, blindness, or deafness; access to clinical care: delayed care due to cost, having a primary care physician, insurance status, and having had a doctor visit in the previous year; health behavior: alcohol use, smoking status, e-cigarette use, use of chewing tobacco, exercise practices, drunk driving, seat belt use, Internet use in last 30 days, daily fruit consumption, and daily vegetable consumption; preventive care: having had an HIV test, having identified HIV risk factors, and having had a flu vaccine; socioeconomic status: education attainment, income category, homeownership, employment, and cell phone use.
Figure 2
Top variables of importance by domain and age group, 2017 BRFSS. Notes: Data analysis was performed with the regularized logistic regression machine learning algorithm.
