Health Aff Sch. 2024 Feb 14;2(3):qxae017.
doi: 10.1093/haschl/qxae017. eCollection 2024 Mar.

American clusters: using machine learning to understand health and health care disparities in the United States

Diana M Bowser et al. Health Aff Sch.

Abstract

Health and health care access in the United States are plagued by high inequality. While machine learning (ML) is increasingly used in clinical settings to inform health care delivery decisions and predict health care utilization, using ML as a research tool to understand health care disparities in the United States and how these are connected to health outcomes, access to health care, and health system organization is less common. We utilized over 650 variables from 24 different databases aggregated by the Agency for Healthcare Research and Quality in their Social Determinants of Health (SDOH) database. We used k-means, a non-hierarchical ML clustering method, to cluster county-level data. Principal factor analysis created county-level index values for each SDOH domain and 2 health care domains: health care infrastructure and health care access. Logistic regression classification was used to identify the primary drivers of cluster classification. The most efficient cluster classification consists of 3 distinct clusters in the United States; the cluster having the highest life expectancy comprised only 10% of counties. The most efficient ML clusters do not identify the clusters with the widest health care disparities. ML clustering, using county-level data, shows that health care infrastructure and access are the primary drivers of cluster composition.

Keywords: American clusters; health disparities; machine learning; social determinants of health.
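
The k-means step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two features stand in for hypothetical county-level index values, and the deterministic farthest-first initialization and the helper names (`sq_dist`, `farthest_first`, `make_counties`) are assumptions made for the example. The principal factor analysis and logistic regression steps are not shown.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no change in assignments: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def make_counties(seed=1):
    """Synthetic 'counties': three well-separated groups with two
    hypothetical index values each (not real SDOH data)."""
    rng = random.Random(seed)
    centers = [(-5.0, -5.0), (0.0, 5.0), (5.0, -5.0)]
    sizes = [50, 10, 35]
    data = []
    for (cx, cy), n in zip(centers, sizes):
        data += [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]
    return data

counties = make_counties()
centroids, labels = kmeans(counties, k=3)
cluster_sizes = sorted(labels.count(j) for j in range(3))
print(cluster_sizes)
```

On real data the groups are rarely this well separated, so the number of clusters and the initialization would need to be chosen and checked rather than fixed in advance as they are here.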

Conflict of interest statement

Please see ICMJE form(s) for author conflicts of interest. These have been provided as Supplementary materials.

Figures

Figure 1.
Life expectancy, in years. Panel A shows the average life expectancy, 2020. Panel B shows the average life expectancy over time from 2015 to 2020, by year and cluster. Life expectancy is the mean value for all of the counties included in the dataset for all years (n = 1673, cluster 1; n = 310, cluster 2; n = 1161, cluster 3).
Figure 2.
Map of US counties for the machine learning 3-clusters analysis. Note that the clusters, although showing some tendencies for geographic concentration, are not defined by geography. For example, some counties in the upper peninsula of Michigan and the panhandle of Texas are both assigned to cluster 1 based on the underlying patterns of the 650 county variables, not geography. Maps are created using data for all counties in the dataset (n = 1673, cluster 1; n = 310, cluster 2; n = 1161, cluster 3).
