Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes
- PMID: 39625719
- PMCID: PMC11615705
- DOI: 10.1001/jamanetworkopen.2024.48723
Abstract
Importance: An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.
Objective: To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.
Design, setting, and participants: This cohort study used EM patient medical records with acute hospital admissions that occurred in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. A customized clinical LLM pipeline was trained, tested, and evaluated to generate templated EM-to-IP handoff notes. LLM-generated handoff notes were compared with physician-written notes using both conventional automated methods (ie, recall-oriented understudy for gisting evaluation [ROUGE], bidirectional encoder representations from transformers score [BERTScore], and source chunking approach for large-scale inconsistency evaluation [SCALE]) and a novel patient safety-focused framework. Data were analyzed from October 2023 to March 2024.
Exposure: LLM-generated EM handoff notes.
Main outcomes and measures: LLM-generated handoff notes were evaluated for (1) lexical similarity with respect to physician-written notes using ROUGE and BERTScore; (2) fidelity with respect to source notes using SCALE; and (3) readability, completeness, curation, correctness, usefulness, and implications for patient safety using a novel framework.
Results: In this study of 1600 EM patient records (832 [52%] female and mean [SD] age of 59.9 [18.9] years), LLM-generated handoff notes, compared with physician-written ones, had higher ROUGE (0.322 vs 0.088), BERTScore (0.859 vs 0.796), and SCALE scores (0.691 vs 0.456), indicating the LLM-generated summaries exhibited greater similarity and more detail. As reviewed by 3 board-certified EM physicians, a subsample of 50 LLM-generated summaries had a mean (SD) usefulness score of 4.04 (0.86) out of 5 (compared with 4.36 [0.71] for physician-written) and mean (SD) patient safety scores of 4.06 (0.86) out of 5 (compared with 4.50 [0.56] for physician-written). None of the LLM-generated summaries were classified as a critical patient safety risk.
Conclusions and relevance: In this cohort study of 1600 EM patient medical records, LLM-generated EM-to-IP handoff notes were rated superior to physician-written summaries by conventional automated evaluation methods but marginally inferior in usefulness and safety under a novel evaluation framework. This study suggests the importance of a physician-in-the-loop implementation design for this model and demonstrates an effective strategy for measuring the preimplementation patient safety of LLM models.
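To illustrate the lexical-similarity comparison described in the abstract, the following is a minimal sketch of ROUGE-1 F1 (unigram overlap between a candidate summary and a reference) in pure Python. This is an assumption-laden illustration of the metric family, not the study's actual evaluation pipeline, which used established ROUGE, BERTScore, and SCALE implementations.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Illustrative only; production work should use an established
    ROUGE implementation with stemming and tokenization options.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection clips each shared unigram to its
    # minimum frequency across the two texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical handoff-style snippets, not taken from the study data.
reference = "patient admitted for chest pain and shortness of breath"
candidate = "patient admitted with chest pain"
score = rouge1_f1(reference, candidate)  # 4 shared unigrams -> F1 = 4/7
```

Higher scores indicate greater word-level overlap with the reference text, which is why the LLM notes' higher ROUGE values in this study indicate closer lexical similarity to source material rather than clinical quality per se.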
Comment in
- doi: 10.1001/jamanetworkopen.2024.48729