J Am Med Inform Assoc. 2024 May 20;31(6):1367-1379.
doi: 10.1093/jamia/ocae052.

Leveraging large language models for generating responses to patient messages-a subjective analysis

Siru Liu et al. J Am Med Inform Assoc.

Abstract

Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.

Materials and methods: Using a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset into informative paragraphs that offered patient education while emphasizing empathy and professionalism. By combining our data with this dataset, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used 10 representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the generated responses from our models and ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.

Results: The dataset consisted of 499 794 pairs of patient messages and corresponding responses from the patient portal, with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to providers' responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness.

Conclusion: This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.

Keywords: artificial intelligence; clinical decision support; large language model; patient portal; primary care.

Conflict of interest statement

The authors do not have conflicts of interest related to this study.

Figures

Figure 1. Overview of data collection, training process, and evaluation. The logos of CLAIR-Short and CLAIR-Long were generated by Midjourney.
Figure 2. An example of an updated response using the OpenAI API (Turbo-3.5).
Figure 3. Stacked bar charts of the ratings of empathy, responsiveness, accuracy, and usefulness.
Figure 4. Boxplot comparing BERTScore values of generated responses from CLAIR-Short to actual provider responses.
Figure 5. Boxplot of BERTScore values of the generated responses from CLAIR-Long compared with the responses from actual providers, ChatGPT3.5, and ChatGPT4.
Figure 6. A prototype of a potential implementation of an AI patient message editor in a patient portal interface.
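
Figures 4 and 5 report BERTScore values. As a rough illustration of the greedy-matching idea behind that metric, the following is a minimal sketch and not the authors' evaluation code. It assumes small hand-written token vectors in place of contextual BERT embeddings, and the function names (`cosine`, `greedy_match_score`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) using greedy best-match cosine similarity.

    Precision averages, over candidate tokens, the best similarity to any
    reference token; recall does the same from the reference side.
    Assumption: inputs are non-empty lists of toy token vectors; the real
    metric uses contextual embeddings from a pre-trained model.
    """
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: a two-token generated response against a one-token reference.
generated = [[1.0, 0.0], [0.0, 1.0]]
provider = [[1.0, 0.0]]
p, r, f1 = greedy_match_score(generated, provider)
print(round(p, 3), round(r, 3), round(f1, 3))  # prints: 0.5 1.0 0.667
```

In this toy case the extra generated token lowers precision while recall stays at 1.0, which loosely mirrors how a longer, more educational response can diverge from a concise provider reply.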
