JAMA Netw Open. 2025 Aug 1;8(8):e2526339. doi: 10.1001/jamanetworkopen.2025.26339.

Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model


William R Small et al. JAMA Netw Open.

Abstract

Importance: Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown.

Objectives: To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC quality standard.

Design, setting, and participants: Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health.

Exposures: Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents, blinded to author type, edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists.

Main outcomes and measures: Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales).

Results: Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46).

Conclusions and relevance: Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Hochman reported receiving consulting fees from Gilead, Quality Matters, and The Ohio State University Wexler Medical Center. Dr Goodman reported receiving personal fees from Ambu, Iterative Health, and Boston Scientific outside the submitted work. No other disclosures were reported.

Figures

Figure 1. Density Plots Comparing the Distributions of Editing Metrics for Large Language Model (LLM) and Clinician Hospital Courses

Figure 2. Components of the 4Cs Hospital Course Quality Standard and Composite Scores
A score of −10 indicates the physician HC was rated significantly better; −5, the physician HC slightly better; 0, the HC pair equal; 5, the LLM HC slightly better; and 10, the LLM HC significantly better.
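The scale conversion described in the Main Outcomes and Measures section, combined with the mapping given in this caption, can be sketched as follows. This is a minimal illustration, not the study's actual analysis code; the function names are invented, and it assumes lower Likert values favor the physician HC, per the caption's scale direction.

```python
def likert_to_bidirectional(likert: int) -> int:
    """Map a 5-point A/B Likert rating (1 = physician HC significantly
    better ... 5 = LLM HC significantly better) onto the 10-point
    bidirectional scale: 1 -> -10, 2 -> -5, 3 -> 0, 4 -> 5, 5 -> 10."""
    if likert not in (1, 2, 3, 4, 5):
        raise ValueError("Likert rating must be an integer from 1 to 5")
    return (likert - 3) * 5


def composite_score(four_cs_ratings: list[int]) -> int:
    """Sum the four 4Cs component ratings (complete, concise, cohesive,
    confabulation-free) into the 40-point bidirectional composite,
    which ranges from -40 (physician favored on all) to +40 (LLM favored)."""
    if len(four_cs_ratings) != 4:
        raise ValueError("Expected exactly four 4Cs component ratings")
    return sum(likert_to_bidirectional(r) for r in four_cs_ratings)
```

Under this mapping, a pair rated equal on all four components yields a composite of 0, matching the near-zero composite difference reported in the Results.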

References

    1. Chatterton B, Chen J, Schwarz EB, Karlin J. Primary care physicians’ perspectives on high-quality discharge summaries. J Gen Intern Med. 2024;39(8):1438-1443. doi: 10.1007/s11606-023-08541-5 - DOI - PMC - PubMed
    1. Shivji FS, Ramoutar DN, Bailey C, Hunter JB. Improving communication with primary care to ensure patient safety post-hospital discharge. Br J Hosp Med (Lond). 2015;76(1):46-49. doi: 10.12968/hmed.2015.76.1.46 - DOI - PubMed
    1. Sorita A, Robelia PM, Kattel SB, et al. The ideal hospital discharge summary: a survey of U.S. physicians. J Patient Saf. 2021;17(7):e637-e644. doi: 10.1097/PTS.0000000000000421 - DOI - PubMed
    1. Bernal JL, DelBusto S, García-Mañoso MI, et al. Impact of the implementation of electronic health records on the quality of discharge summaries and on the coding of hospitalization episodes. Int J Qual Health Care. 2018;30(8):630-636. doi: 10.1093/intqhc/mzy075 - DOI - PubMed
    1. Adams G, Alsentzer E, Ketenci M, Zucker J, Elhadad N. What’s in a summary? Laying the groundwork for advances in hospital-course summarization. ACL Anthology. 2021. Accessed July 11, 2025. https://aclanthology.org/2021.naacl-main.382/ - PMC - PubMed