This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Jul 16:2023.07.14.23292669.

doi: 10.1101/2023.07.14.23292669.

Leveraging Large Language Models for Generating Responses to Patient Messages

Siru Liu, Allison B McCoy, Aileen P Wright, Babatunde Carew, Julian Z Genkins, Sean S Huang, Josh F Peterson, Bryan Steitz, Adam Wright

PMID: 37503263
PMCID: PMC10370222
DOI: 10.1101/2023.07.14.23292669

Leveraging Large Language Models for Generating Responses to Patient Messages

Siru Liu et al. medRxiv. 2023.

[Preprint]. 2023 Jul 16:2023.07.14.23292669.

doi: 10.1101/2023.07.14.23292669.

Authors

Siru Liu, Allison B McCoy, Aileen P Wright, Babatunde Carew, Julian Z Genkins, Sean S Huang, Josh F Peterson, Bryan Steitz, Adam Wright

PMID: 37503263
PMCID: PMC10370222
DOI: 10.1101/2023.07.14.23292669

Update in

Leveraging large language models for generating responses to patient messages-a subjective analysis.
Liu S, McCoy AB, Wright AP, Carew B, Genkins JZ, Huang SS, Peterson JF, Steitz B, Wright A. Liu S, et al. J Am Med Inform Assoc. 2024 May 20;31(6):1367-1379. doi: 10.1093/jamia/ocae052. J Am Med Inform Assoc. 2024. PMID: 38497958 Free PMC article.

Abstract

Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.

Methods: Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. By combining with this dataset, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses. We asked primary care physicians to review generated responses from our models and ChatGPT and rated them for empathy, responsiveness, accuracy, and usefulness.

Results: The dataset consisted of a total of 499,794 pairs of patient messages and corresponding responses from the patient portal, with 5,000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to provider's responses. CLAIR-Long responses provided increased patient educational content compared to CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, while receiving a neutral rating for usefulness.

Conclusion: Leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and primary care providers.

PubMed Disclaimer

Publication types

Actions

LinkOut - more resources

Full Text Sources
- Cold Spring Harbor Laboratory
- PubMed Central

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

This is a preprint.

Leveraging Large Language Models for Generating Responses to Patient Messages

Leveraging Large Language Models for Generating Responses to Patient Messages

Authors

Update in

Abstract

Publication types

LinkOut - more resources

Full Text Sources