Ann Epidemiol. 2026 Mar 3:117:110062.
doi: 10.1016/j.annepidem.2026.110062. Online ahead of print.

Population-based clustering of co-occurring social determinants: An application of unsupervised machine learning



Ingrid Giesinger et al. Ann Epidemiol.

Abstract

Purpose: This study aimed to develop a cluster-based measure of multiple co-occurring social determinants of health by applying unsupervised machine learning to a population-based cohort, offering a data-driven approach to organize complex social exposures.

Methods: Unsupervised clustering was applied to a population-based cohort of Ontario respondents to six cycles of the Canadian Community Health Survey (2001-2012), linked to Canadian census and vital statistics data. The optimal number of clusters was determined using internal metrics, visualization techniques, descriptive analysis, and theoretical considerations. Sensitivity analyses were integrated throughout the iterative clustering process. Premature mortality rates were generated to assess validity.
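The abstract does not name the clustering algorithm, the encoding of the social determinants, or which internal metrics were used. The sketch below assumes k-means on numerically encoded features, with the mean silhouette coefficient as the internal metric. It fits several candidate numbers of clusters and scores each one. This is the kind of comparison that could feed into choosing between, for example, a 4-cluster and a 6-cluster solution alongside visual and theoretical review. The toy data, function names, and seeds are hypothetical, and real survey data with categorical determinants may call for a different algorithm or distance measure.

```python
import math
import random

# Sketch only: k-means and the silhouette score are assumed stand-ins;
# the study's actual algorithm and metrics are not named in the abstract.


def dist(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the restart with lowest inertia."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, float("inf")
    for _ in range(n_init):
        # k-means++ seeding: each new centroid is drawn with probability
        # proportional to its squared distance from the nearest existing one
        centroids = [list(rng.choice(points))]
        while len(centroids) < k:
            d2 = [min(dist(p, c) ** 2 for c in centroids) for p in points]
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest centroid
            new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
            if new_labels == labels:
                break  # assignments unchanged, so the restart has converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette coefficient: near 1 = well separated, near 0 or below = poor."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count of distances from p to every other point, per cluster
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: silhouette taken as 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: three well-separated groups in a 2-D feature space,
# standing in for standardized social-determinant indicators
rng = random.Random(42)
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in [(0, 0), (6, 0), (0, 6)] for _ in range(20)]

# Score each candidate number of clusters with the internal metric
scores = {k: silhouette(data, kmeans(data, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the highest silhouette falls at three clusters. In practice, as the abstract describes, an internal metric like this would be only one input, weighed against visualization, descriptive profiles, and theory rather than used to pick k mechanically.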

Results: The optimal clustering solutions had 4 and 6 clusters. Both solutions revealed distinct social typologies. The 6-cluster solution offered greater granularity and theoretical interpretability, whereas the 4-cluster solution showed greater heterogeneity within certain marginalized groups. Premature mortality rates differed meaningfully across clusters, supporting the ability of the clustering approach to capture risk associated with social exposures.
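The abstract does not say how premature mortality was defined or whether rates were age-standardized. The sketch below shows a by-cluster rate comparison under stated assumptions. Rates are crude, per 100,000 person-years. The record layout is hypothetical: (cluster label, person-years of follow-up, premature-death flag). The threshold age that defines a premature death, commonly set at 75, is also an assumption.

```python
from collections import defaultdict


def premature_mortality_rates(records, per=100_000):
    """Crude premature mortality rate per `per` person-years, by cluster.

    Hypothetical record layout: (cluster_label, person_years, died_prematurely),
    where "prematurely" means death before an assumed threshold age such as 75.
    Rates are crude, not age-standardized.
    """
    deaths = defaultdict(int)
    person_years = defaultdict(float)
    for cluster, py, died in records:
        deaths[cluster] += int(died)
        person_years[cluster] += py
    # Multiply before dividing: deaths * per / person-years
    return {c: deaths[c] * per / person_years[c]
            for c in person_years if person_years[c] > 0}


# Toy cohort (invented numbers): cluster 1 has 1 death over 40 person-years,
# cluster 2 has no deaths over 20 person-years
cohort = [(1, 10.0, True), (1, 10.0, False), (1, 10.0, False), (1, 10.0, False),
          (2, 10.0, False), (2, 10.0, False)]
rates = premature_mortality_rates(cohort)
print(rates)  # {1: 2500.0, 2: 0.0}
```

Comparing crude rates across clusters that differ in age structure can be misleading. An age-standardized version would weight age-specific rates within each cluster by a common standard population.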

Conclusions: Unsupervised machine learning methods identified meaningful population subgroups reflecting complex patterns of social exposures. This approach offers a flexible, data-driven method for characterizing social exposures that can be considered alongside theoretical frameworks and used for equity monitoring, intervention planning and policy development.

Keywords: Artificial intelligence; Epidemiology; Population health; Social determinants of health; Unsupervised machine learning.

