A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients
- PMID: 40544901
- DOI: 10.1016/j.jbi.2025.104867
Abstract
Objective: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and facilitating continuity of care. Recent advances in large language models (LLMs) have significantly improved their ability to understand and summarize complex medical texts. This research explores how LLMs can reduce the burden of manual summarization, improve workflow efficiency, and support informed decision-making in healthcare settings.
Materials and methods: Clinical notes from a cohort of 1,099 lung cancer patients were used. A subset of 50 patients was used for testing and 102 patients for model fine-tuning. This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8B, in generating discharge summaries. Evaluation metrics included token-level measures (BLEU, ROUGE-1, ROUGE-2, ROUGE-L), semantic similarity scores, and manual evaluation of clinical relevance, factual faithfulness, and completeness. An iterative method was further tested on LLaMA 3 8B using clinical notes of varying lengths to examine the stability of its performance.
Results: The study found notable variation in summarization capability among the LLMs. GPT-4o and fine-tuned LLaMA 3 achieved the strongest token-level metrics. Manual evaluation showed that GPT-4 achieved the highest scores in relevance (4.95 ± 0.22) and factual faithfulness (4.40 ± 0.50), while GPT-4o performed best in completeness (4.55 ± 0.69); the two models showed comparable overall quality. Semantic similarity scores identified GPT-4o and LLaMA 3 as the leading models in capturing the underlying meaning and context of clinical narratives.
Conclusion: This study provides insight into the efficacy of LLMs for generating discharge summaries. It highlights the potential of automated summarization tools to enhance documentation precision and efficiency, ultimately improving patient care and operational capability in healthcare settings.
Keywords: Continuity of care; Discharge Summary; EHR; GPT; LLaMA; Large language model; Lung cancer; Text summarization.
Copyright © 2025. Published by Elsevier Inc.
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation. JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476. PMID: 40705416. Free PMC article.
- A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312. PMID: 39786555.
- Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study. JMIR Form Res. 2024 Oct 24;8:e58418. doi: 10.2196/58418. PMID: 39447159. Free PMC article.
- Examining the Role of Large Language Models in Orthopedics: Systematic Review. J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607. PMID: 39546795. Free PMC article.
- Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3. PMID: 35593186. Free PMC article.
Cited by
- Comparing artificial intelligence- vs clinician-authored summaries of simulated primary care electronic health records. JAMIA Open. 2025 Jul 30;8(4):ooaf082. doi: 10.1093/jamiaopen/ooaf082. eCollection 2025 Aug. PMID: 40741008. Free PMC article.
- Enhancing Relation Extraction for COVID-19 Vaccine Shot-Adverse Event Associations with Large Language Models. Res Sq [Preprint]. 2025 Mar 17:rs.3.rs-6201919. doi: 10.21203/rs.3.rs-6201919/v1. PMID: 40166033. Free PMC article. Preprint.