NPJ Digit Med. 2025 May 17;8(1):287.
doi: 10.1038/s41746-025-01645-8.

Social determinants of health extraction from clinical notes across institutions using large language models

Vipina K Keloth et al. NPJ Digit Med.

Abstract

Detailed information on social determinants of health (SDoH) is often buried within clinical text in electronic health records (EHRs). Most current NLP efforts for SDoH are limited: they investigate a limited number of factors, derive data from a single institution, or use specific patient cohorts or note types, with reduced focus on generalizability. We aim to address these issues by creating cross-institutional corpora and by developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to the other models, an instruction-tuned LLM achieved the top performance, with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. Access to trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.
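
The micro-averaged F1 reported in the abstract pools true positives, false positives, and false negatives over all labels before computing a single score. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes a multi-label setup in which each sentence carries a set of SDoH labels; the function name `micro_f1` and the label names (`housing`, `employment`, `tobacco_use`, `alcohol_use`) are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-sentence label sets.

    Counts are pooled across all sentences and labels, so frequent
    labels contribute more to the score than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but absent from gold
        fn += len(g - p)  # gold labels that were missed

    denom = 2 * tp + fp + fn
    # Convention used here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotations for three sentences (illustrative labels only).
gold_labels = [{"housing"}, {"employment", "tobacco_use"}, set()]
pred_labels = [{"housing"}, {"employment"}, {"alcohol_use"}]

score = micro_f1(gold_labels, pred_labels)
print(f"micro-F1 = {score:.3f}")
```

In this toy example there are 2 true positives, 1 false positive, and 1 false negative, so the score is 2·2 / (2·2 + 1 + 1) = 2/3.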

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Distribution of SDoH factors.
Number of sentences documenting each SDoH factor for all four corpora annotated at level 1 (identifying SDoH factors only).
Fig. 2. Heatmap of cross-dataset performance evaluation (level 1).
The diagonal shows micro-averaged F1 scores when trained and tested on the same dataset for level 1 annotations. Other cells show F1 scores when trained on one dataset and tested on another dataset.
Fig. 3. Heatmap of cross-dataset performance evaluation (level 2).
The diagonal shows micro-averaged F1 scores when trained and tested on the same dataset for level 2 annotations. Other cells show F1 scores when trained on one dataset and tested on another dataset.
Fig. 4. A schematic representation of the workflow.
The figure shows some of the SDoH factors, the data sources, the models used, and the evaluation process.
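
The cross-dataset evaluation described in the captions of Figs. 2 and 3 (train on one dataset, test on every dataset, diagonal = same-dataset performance) can be sketched as a simple train-by-test matrix. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the dataset names `site_a` and `site_b`, the labels, the majority-label "model", and the use of accuracy as the score are all hypothetical stand-ins. A real run would plug in an actual classifier and a micro-averaged F1 function.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, label); hypothetical data layout
Splits = Dict[str, List[Example]]  # {"train": [...], "test": [...]}


def cross_dataset_matrix(
    datasets: Dict[str, Splits],
    train_fn: Callable[[List[Example]], object],
    score_fn: Callable[[object, List[Example]], float],
) -> Dict[str, Dict[str, float]]:
    """Return matrix[train_name][test_name] = score.

    Diagonal entries are same-dataset scores; off-diagonal entries
    measure how well a model transfers to another dataset.
    """
    names = list(datasets)
    matrix: Dict[str, Dict[str, float]] = {}
    for src in names:
        model = train_fn(datasets[src]["train"])
        matrix[src] = {
            tgt: score_fn(model, datasets[tgt]["test"]) for tgt in names
        }
    return matrix


def train_majority(train: List[Example]) -> str:
    """Toy stand-in model: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(model: object, test: List[Example]) -> float:
    """Stand-in score: fraction of test labels equal to the majority label."""
    return sum(1 for _, label in test if label == model) / len(test)


# Hypothetical toy corpora from two sites (illustrative only).
toy = {
    "site_a": {
        "train": [("lives alone", "living_status"),
                  ("lives with wife", "living_status"),
                  ("smokes daily", "tobacco")],
        "test": [("lives with son", "living_status"),
                 ("quit smoking", "tobacco")],
    },
    "site_b": {
        "train": [("smokes 1 ppd", "tobacco"),
                  ("former smoker", "tobacco"),
                  ("homeless", "housing")],
        "test": [("never smoker", "tobacco"),
                 ("current smoker", "tobacco")],
    },
}

result = cross_dataset_matrix(toy, train_majority, accuracy)
for src, row in result.items():
    print(src, row)
```

Keeping `train_fn` and `score_fn` as parameters means the same loop can be reused for each model and for both annotation levels without changing the matrix logic.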
