NPJ Digit Med. 2024 Jan 11;7(1):6. doi: 10.1038/s41746-023-00970-0.

Large language models to identify social determinants of health in electronic health records


Marco Guevara et al. NPJ Digit Med.

Abstract

Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The benefit of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in zero- and few-shot settings, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
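Macro-F1, the headline metric throughout, is the unweighted mean of per-class F1 scores, so rare SDoH categories count as much as common ones. A minimal pure-Python sketch (the class names below are illustrative, not the study's exact label set):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["housing", "employment", "none", "none", "housing"]
y_pred = ["housing", "none", "none", "none", "employment"]
print(round(macro_f1(y_true, y_pred), 3))
```

Because each class contributes equally to the average, a model that ignores a rare adverse-SDoH class is penalized heavily, which is why the paper reports macro- rather than micro-averaged F1 under class imbalance.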


Conflict of interest statement

M.G., S.C., S.T., T.L.C., I.F., B.H.K., S.M., J.M.Q., M.G., S.H.: none. H.J.W.L.A.: advisory and consulting, unrelated to this work (Onc.AI, Love Health Inc, Sphera, Editas, A.Z., and BMS). P.J.C. and G.K.S.: none. R.H.M.: advisory board (ViewRay, AstraZeneca), consulting (Varian Medical Systems, Sio Capital Management), honorarium (Novartis, Springer Nature). D.S.B.: Associate Editor of Radiation Oncology, HemOnc.org (no financial compensation, unrelated to this work); funding from American Association for Cancer Research (unrelated to this work).

Figures

Fig. 1
Fig. 1. Ablation studies.
Performance in macro-F1 of Flan-T5 XL models fine-tuned on gold data only (orange line) and on gold plus synthetic data (green line), as gold-labeled sentences are progressively undersampled from the training dataset, for a the adverse social determinants of health (SDoH) mention task and b the any SDoH mention task. The full gold-labeled training set comprises 29,869 sentences, augmented with 1800 synthetic SDoH sentences; models were tested on the in-domain RT test dataset. SDoH social determinants of health.
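The ablation above sweeps over progressively undersampled gold training sets, with and without the fixed synthetic augmentation. One way such a sweep could be set up (a sketch with sizes taken from the caption; this is not the study's code):

```python
import random

def undersample(sentences, keep_fraction, seed=0):
    """Randomly keep a fraction of the gold-labeled training sentences."""
    rng = random.Random(seed)
    k = max(1, int(len(sentences) * keep_fraction))
    return rng.sample(sentences, k)

gold = [f"gold-{i}" for i in range(29_869)]        # full gold set (caption)
synthetic = [f"syn-{i}" for i in range(1_800)]     # fixed synthetic set

for frac in (1.0, 0.5, 0.25, 0.1):
    gold_only = undersample(gold, frac)
    augmented = gold_only + synthetic              # gold + synthetic arm
    print(frac, len(gold_only), len(augmented))
```

Holding the synthetic set constant while shrinking the gold set isolates how much the augmentation compensates for scarce gold annotations, which is the comparison the two lines in the figure make.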
Fig. 2
Fig. 2. Fine-tuned LLMs versus ChatGPT-family models.
Comparison of our fine-tuned Flan-T5 models with zero- and 10-shot GPT. Macro-F1 was measured on our manually validated synthetic dataset. The GPT-turbo-0613 version of GPT3.5 and the GPT4-0613 version of GPT4 were used. Error bars indicate 95% confidence intervals. LLM large language model.
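The caption does not state how the 95% confidence intervals were computed; a percentile bootstrap over the test set is one common approach, sketched here as an assumption (the `accuracy` metric and data are illustrative placeholders):

```python
import random

def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for any metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int(alpha / 2 * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

accuracy = lambda t, p: sum(x == y for x, y in zip(t, p)) / len(t)
y_true = [1, 0, 1, 1, 0, 1, 0, 1] * 10
y_pred = [1, 0, 1, 0, 0, 1, 1, 1] * 10
lo, hi = bootstrap_ci(accuracy, y_true, y_pred)
print(f"accuracy 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The same routine works for macro-F1 by passing a macro-F1 function as `metric`.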
Fig. 3
Fig. 3. LLM bias evaluation.
Proportion of synthetic sentence pairs, with and without injected demographics, that led to a classification mismatch, meaning that the model predicted a different SDoH label for each sentence in the pair. Results are shown across race/ethnicity and gender for a the any SDoH mention task and b the adverse SDoH mention task. Asterisks indicate statistical significance (P ≤ 0.05) by chi-squared tests for multi-class comparisons and 2-proportion z tests for binary comparisons. LLM large language model, SDoH social determinants of health.
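The 2-proportion z test used for the binary comparisons has a simple closed form under the pooled-variance null; it can be computed with the standard library (the counts below are made up for illustration, not the study's data):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return z, p_value

# e.g. mismatches in 40/500 demographic-injected pairs vs 20/500 controls
z, p = two_proportion_z(40, 500, 20, 500)
print(round(z, 2), round(p, 4))
```

A significant result here would mean the model changes its SDoH label more often for one demographic variant than the other, which is the bias signal the figure reports.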
Fig. 4
Fig. 4. Prompting methods.
Example of prompt templates used in the SKLLM package for GPT-turbo-0301 (GPT3.5) and GPT4 with temperature 0 to classify our labeled synthetic data. {labels} and {training_data} were sampled from a separate synthetic dataset, which was not human-annotated. The final label output is highlighted in green.
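The {labels} and {training_data} placeholders in these templates are filled before each API call. A hypothetical few-shot template assembled with plain string formatting (the actual SKLLM templates and example sentences differ; this only illustrates the mechanism):

```python
TEMPLATE = (
    "Classify the sentence into one of the following SDoH categories: "
    "{labels}.\n\n"
    "Examples:\n{training_data}\n"
    "Sentence: {sentence}\n"
    "Label:"
)

labels = ["employment", "housing", "transportation",
          "parental status", "relationship", "social support", "none"]
shots = [("Patient lost his job last month.", "employment"),
         ("She lives with her daughter, who drives her to visits.", "social support")]

prompt = TEMPLATE.format(
    labels=", ".join(labels),
    training_data="\n".join(f"Sentence: {s}\nLabel: {l}" for s, l in shots),
    sentence="He is currently staying in a shelter.",
)
print(prompt)
```

Ending the prompt at "Label:" constrains a temperature-0 model to complete with a single category name, which is then parsed as the prediction.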
Fig. 5
Fig. 5. Demographic-injected SDoH language development.
Illustration of generating and comparing synthetic demographic-injected SDoH language pairs to assess how adding race/ethnicity and gender information into a sentence may impact model performance. FT fine-tuned, SDoH Social determinants of health.
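The pair-generation step above can be sketched as template substitution: each neutral sentence is paired with a variant carrying a race/ethnicity or gender descriptor, and the two model predictions are compared. The descriptors, sentence, and constant classifier below are illustrative assumptions, not the study's materials:

```python
def inject(sentence, descriptor):
    """Prepend a demographic descriptor to the patient mention."""
    return sentence.replace("The patient", f"The {descriptor} patient", 1)

neutral = "The patient was recently evicted and is couch-surfing."
descriptors = ["Black", "Hispanic", "white", "female", "male"]

pairs = [(neutral, inject(neutral, d)) for d in descriptors]

def mismatch_rate(pairs, predict):
    """Fraction of pairs where the predicted SDoH label changes."""
    return sum(predict(a) != predict(b) for a, b in pairs) / len(pairs)

# A classifier insensitive to demographics yields a mismatch rate of 0:
print(mismatch_rate(pairs, lambda s: "housing"))
```

A nonzero mismatch rate means the model's SDoH label depends on the injected demographic descriptor, which is what Fig. 3 quantifies per group.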
