Social determinants of health extraction from clinical notes across institutions using large language models
- PMID: 40379919
- PMCID: PMC12084648
- DOI: 10.1038/s41746-025-01645-8
Abstract
Detailed social determinants of health (SDoH) information is often buried within clinical text in electronic health records (EHRs). Most current natural language processing (NLP) efforts for SDoH are limited: they investigate few factors, derive data from a single institution, use specific patient cohorts or note types, and pay little attention to generalizability. We aim to address these issues by creating cross-institutional corpora and by developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to other models, the instruction-tuned LLM achieved top performance, with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. Access to trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.
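The micro-averaged F1 reported in the abstract can be illustrated with a minimal sketch. It assumes each note carries a set of labels, that a level 2 label is a (factor, value) pair, and that the level 1 label is the factor alone. The factor and value names below are hypothetical placeholders, not the study's annotation schema, and this is not the authors' evaluation code.

```python
from typing import Hashable, List, Set, Tuple


def micro_f1(gold: List[Set[Hashable]], pred: List[Set[Hashable]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all notes and labels, then score once."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and annotated
        fp += len(p - g)  # labels predicted but not annotated
        fn += len(g - p)  # labels annotated but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def to_level1(notes: List[Set[Tuple[str, str]]]) -> List[Set[str]]:
    """Collapse level 2 (factor, value) labels to level 1 factor-only labels."""
    return [{factor for factor, _ in labels} for labels in notes]


# Hypothetical annotations for three notes (illustrative names only).
gold_l2 = [
    {("housing", "homeless"), ("tobacco_use", "current")},
    {("employment", "unemployed")},
    set(),
]
pred_l2 = [
    {("housing", "homeless"), ("tobacco_use", "former")},  # right factor, wrong value
    {("employment", "unemployed")},
    {("alcohol_use", "current")},  # spurious prediction
]

f1_level1 = micro_f1(to_level1(gold_l2), to_level1(pred_l2))
f1_level2 = micro_f1(gold_l2, pred_l2)
print(f"level 1 micro-F1: {f1_level1:.3f}")
print(f"level 2 micro-F1: {f1_level2:.3f}")
```

In this toy data the prediction with the right factor but the wrong value counts as correct at level 1 and as both a false positive and a false negative at level 2, which is one reason a level 2 score can sit below the level 1 score for the same model.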
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Update of
- Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions. medRxiv [Preprint]. 2024 May 22:2024.05.21.24307726. doi: 10.1101/2024.05.21.24307726. Update in: NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8. PMID: 38826441. Free PMC article.
Similar articles
- Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform. 2024 May;153:104642. doi: 10.1016/j.jbi.2024.104642. Epub 2024 Apr 14. PMID: 38621641. Free PMC article.
- Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing. J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7. PMID: 35007754.
- Extracting social determinants of health from electronic health records using natural language processing: a systematic review. J Am Med Inform Assoc. 2021 Nov 25;28(12):2716-2727. doi: 10.1093/jamia/ocab170. PMID: 34613399. Free PMC article.
- Social determinants of health in electronic health records and their impact on analysis and risk prediction: A systematic review. J Am Med Inform Assoc. 2020 Nov 1;27(11):1764-1773. doi: 10.1093/jamia/ocaa143. PMID: 33202021. Free PMC article.
Cited by
- Unveiling social determinants of health impact on adverse pregnancy outcomes through natural language processing. Sci Rep. 2025 Aug 9;15(1):29183. doi: 10.1038/s41598-025-13542-x. PMID: 40783439. Free PMC article.
- Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors. ArXiv [Preprint]. 2025 Jul 22:arXiv:2507.17009v1. PMID: 40740509. Free PMC article.
Grants and funding
- R01 AG084236/AG/NIA NIH HHS/United States
- R01AG084236/U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RM1HG011558/U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R01AG080429/U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01 AG083039/AG/NIA NIH HHS/United States
- R01AG083039/U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RM1 HG011558/HG/NHGRI NIH HHS/United States
- R01 AG080429/AG/NIA NIH HHS/United States
- RF1 AG072799/AG/NIA NIH HHS/United States
- RF1AG072799/U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)