J Am Med Inform Assoc. 2024 May 20;31(6):1367-1379.

doi: 10.1093/jamia/ocae052.

Leveraging large language models for generating responses to patient messages-a subjective analysis

Siru Liu¹, Allison B McCoy¹, Aileen P Wright¹,², Babatunde Carew³, Julian Z Genkins⁴, Sean S Huang¹,², Josh F Peterson¹,², Bryan Steitz¹, Adam Wright¹

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States.
² Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States.
³ Department of General Internal Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN 37212, United States.
⁴ Department of Medicine, Stanford University, Stanford, CA 94304, United States.

PMID: 38497958
PMCID: PMC11105129
DOI: 10.1093/jamia/ocae052

Siru Liu et al. J Am Med Inform Assoc. 2024.

Abstract

Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.

Materials and methods: Using a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset as informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining this dataset with the portal data, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used 10 representative primary care patient portal questions to generate responses. We asked primary care physicians to review the responses generated by our models and by ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.
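
The rating step described above could be tabulated along the following lines. This is a minimal sketch, not the authors' analysis code: the 1-5 Likert scale, the function name `summarize_ratings`, the record layout, and the sample scores are all assumptions made for illustration, and none of the numbers are data from the study.

```python
from collections import defaultdict
from statistics import median

# The four dimensions named in the abstract.
DIMENSIONS = ("empathy", "responsiveness", "accuracy", "usefulness")


def summarize_ratings(ratings):
    """Return {model: {dimension: median score}} from a list of rating records.

    Each record is a dict with keys "model", "dimension", and "score".
    A 1-5 Likert scale is assumed; the abstract does not state the scale.
    """
    grouped = defaultdict(list)
    for record in ratings:
        if record["dimension"] not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {record['dimension']}")
        grouped[(record["model"], record["dimension"])].append(record["score"])

    summary = defaultdict(dict)
    for (model, dimension), scores in grouped.items():
        summary[model][dimension] = median(scores)
    return {model: dict(dims) for model, dims in summary.items()}


# Hypothetical ratings for illustration only (NOT study data).
example_ratings = [
    {"model": "CLAIR-Long", "dimension": "empathy", "score": 4},
    {"model": "CLAIR-Long", "dimension": "empathy", "score": 5},
    {"model": "CLAIR-Long", "dimension": "usefulness", "score": 3},
    {"model": "CLAIR-Short", "dimension": "empathy", "score": 2},
]
summary = summarize_ratings(example_ratings)
```

The median is used here because Likert ratings are ordinal; a per-category count, as in a stacked bar chart, would be an equally reasonable summary.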

Results: The dataset consisted of 499 794 pairs of patient messages and corresponding responses from the patient portal, with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short generated concise responses similar to providers' responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy and a neutral rating for usefulness.

Conclusion: This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.

Keywords: artificial intelligence; clinical decision support; large language model; patient portal; primary care.

Conflict of interest statement

The authors do not have conflicts of interest related to this study.

Figures

**Figure 1.**
Overview of data collection, training process, and evaluation. The logos of CLAIR-Short and CLAIR-Long were generated by Midjourney.

**Figure 2.**
An example of updated response using OpenAI API (Turbo-3.5).

**Figure 3.**
Stacked bar charts of the ratings of empathy, responsiveness, accuracy, and usefulness.

**Figure 4.**
The boxplot comparing BERTScore values of generated responses from CLAIR-Short to actual provider responses.

**Figure 5.**
Boxplot of BERTScore of the generated responses from CLAIR-Long compared with the responses from actual providers, ChatGPT3.5, and ChatGPT4.

**Figure 6.**
A prototype of a potential implementation of an AI patient message editor in a patient portal interface.

Update of

Leveraging Large Language Models for Generating Responses to Patient Messages.
Liu S, McCoy AB, Wright AP, Carew B, Genkins JZ, Huang SS, Peterson JF, Steitz B, Wright A. medRxiv [Preprint]. 2023 Jul 16:2023.07.14.23292669. doi: 10.1101/2023.07.14.23292669. Update in: J Am Med Inform Assoc. 2024 May 20;31(6):1367-1379. doi: 10.1093/jamia/ocae052. PMID: 37503263.
