Social determinants of health extraction from clinical notes across institutions using large language models
- PMID: 40379919
- PMCID: PMC12084648
- DOI: 10.1038/s41746-025-01645-8
Abstract
Detailed social determinants of health (SDoH) information is often buried within clinical text in electronic health records (EHRs). Most current natural language processing (NLP) efforts for SDoH are limited: they investigate few factors, derive data from a single institution, use specific patient cohorts or note types, and pay little attention to generalizability. We aim to address these issues by creating cross-institutional corpora and by developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to other models, an instruction-tuned LLM achieved top performance, with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. Access to trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.
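For readers unfamiliar with the metric reported above, micro-averaged F1 pools true positives, false positives, and false negatives across all labels and notes before computing a single score. The following is a minimal sketch of that calculation for multi-label note classification; it is not the authors' evaluation code, and the function name `micro_f1` and the label strings are hypothetical placeholders rather than the 21 annotated SDoH factors.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-note label sets.

    Counts are pooled across every note and label, so frequent
    labels influence the score more than rare ones.
    """
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # predicted but not annotated
        fn += len(gold_labels - pred_labels)   # annotated but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is labeled
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold annotations and model predictions for three notes.
gold = [{"housing", "employment"}, {"tobacco_use"}, set()]
pred = [{"housing"}, {"tobacco_use", "alcohol_use"}, set()]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this toy example the pooled counts are 2 true positives, 1 false positive, and 1 false negative, giving a micro-averaged F1 of 4/6.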
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Update of
- Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions. medRxiv [Preprint]. 2024 May 22:2024.05.21.24307726. doi: 10.1101/2024.05.21.24307726. Update in: NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8. PMID: 38826441.
Grants and funding
- R01 AG084236/AG/NIA NIH HHS/United States
- R01 AG083039/AG/NIA NIH HHS/United States
- R01 AG080429/AG/NIA NIH HHS/United States
- RF1 AG072799/AG/NIA NIH HHS/United States
- RM1 HG011558/HG/NHGRI NIH HHS/United States
