JAMA Netw Open. 2024 Dec 2;7(12):e2448723.
doi: 10.1001/jamanetworkopen.2024.48723.

Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes

Vince Hartman et al. JAMA Netw Open.

Abstract

Importance: An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.

Objective: To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.

Design, setting, and participants: This cohort study used EM patient medical records with acute hospital admissions that occurred in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. A customized clinical LLM pipeline was trained, tested, and evaluated to generate templated EM-to-IP handoff notes. Using both conventional automated methods (ie, recall-oriented understudy for gisting evaluation [ROUGE], bidirectional encoder representations from transformers score [BERTScore], and source chunking approach for large-scale inconsistency evaluation [SCALE]) and a novel patient safety-focused framework, LLM-generated handoff notes vs physician-written notes were compared. Data were analyzed from October 2023 to March 2024.

Exposure: LLM-generated EM handoff notes.

Main outcomes and measures: LLM-generated handoff notes were evaluated for (1) lexical similarity with respect to physician-written notes using ROUGE and BERTScore; (2) fidelity with respect to source notes using SCALE; and (3) readability, completeness, curation, correctness, usefulness, and implications for patient safety using a novel framework.
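To make the lexical-similarity measure in (1) concrete, the following is a minimal sketch of two common ROUGE variants, ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence), each reported as an F1 score. The abstract does not state which ROUGE variant, tokenizer, or software the authors used, so the whitespace tokenizer, the function names, and the two example sentences are illustrative assumptions and not the study's pipeline.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # per-token minimum counts
    return f1(overlap / sum(cand.values()), overlap / sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    return f1(lcs / len(cand), lcs / len(ref))


if __name__ == "__main__":
    # Hypothetical sentences invented for illustration; not study data.
    generated = "patient admitted for chest pain"
    written = "patient admitted with chest pain"
    print(rouge_1_f1(generated, written), rouge_l_f1(generated, written))
```

Because both scores reward shared wording rather than clinical correctness, they measure surface similarity only, which is why the study pairs them with a fidelity measure and a physician-rated safety framework.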

Results: In this study of 1600 EM patient records (832 [52%] female and mean [SD] age of 59.9 [18.9] years), LLM-generated handoff notes, compared with physician-written ones, had higher ROUGE (0.322 vs 0.088), BERTScore (0.859 vs 0.796), and SCALE scores (0.691 vs 0.456), indicating the LLM-generated summaries exhibited greater similarity and more detail. As reviewed by 3 board-certified EM physicians, a subsample of 50 LLM-generated summaries had a mean (SD) usefulness score of 4.04 (0.86) out of 5 (compared with 4.36 [0.71] for physician-written) and mean (SD) patient safety scores of 4.06 (0.86) out of 5 (compared with 4.50 [0.56] for physician-written). None of the LLM-generated summaries were classified as a critical patient safety risk.

Conclusions and relevance: In this cohort study of 1600 EM patient medical records, LLM-generated EM-to-IP handoff notes were determined superior compared with physician-written summaries via conventional automated evaluation methods, but marginally inferior in usefulness and safety via a novel evaluation framework. This study suggests the importance of a physician-in-loop implementation design for this model and demonstrates an effective strategy to measure preimplementation patient safety of LLM models.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Hartman reported holding equity in Abstractive Health during the conduct of the study and holding a patent for automated summarization of a hospital stay using machine learning issued to Abstractive Health. No other disclosures were reported.

Figures

Figure. Data Flow of Generating Emergency Department (ED) Handoff Summary
CBC indicates complete blood count; CMP, comprehensive metabolic panel; CTH, computed tomography of the head; EHR, electronic health record; Hct, hematocrit; Hgb, hemoglobin; HPI, history of present illness; HR, heart rate; IP, inpatient; IVF, intravenous fluid; N/V/D, nausea, vomiting, and diarrhea; RR, respiratory rate; SDU, step down unit; SPO2, peripheral capillary oxygen saturation; WBC, white blood cell; WBG, whole blood glucose.

Comment in

  • doi: 10.1001/jamanetworkopen.2024.48729
