A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients

Yiming Li¹, Fang Li², Na Hong³, Manqi Li⁴, Kirk Roberts¹, Licong Cui¹, Cui Tao², Hua Xu⁵

Affiliations

¹ McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
² Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA.
³ Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA.
⁴ McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
⁵ Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA. Electronic address: hua.xu@yale.edu.

PMID: 40544901
DOI: 10.1016/j.jbi.2025.104867

Comparative Study

Yiming Li et al. J Biomed Inform. 2025 Aug:168:104867. doi: 10.1016/j.jbi.2025.104867. Epub 2025 Jun 20.

Abstract

Objective: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and supporting continuity of care. Recent advances in large language models (LLMs) have significantly improved their ability to understand and summarize complex medical texts. This study explores how LLMs can reduce the burden of manual summarization, streamline clinical workflows, and support informed decision-making in healthcare settings.

Materials and methods: Clinical notes from a cohort of 1,099 lung cancer patients were used; a subset of 50 patients was used for testing and 102 patients for model fine-tuning. The study evaluated multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8B, on generating discharge summaries. Evaluation comprised token-level metrics (BLEU, ROUGE-1, ROUGE-2, ROUGE-L), semantic similarity scores, and manual evaluation of clinical relevance, factual faithfulness, and completeness. An iterative method was further tested on LLaMA 3 8B using clinical notes of varying lengths to examine the stability of its performance.
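For readers unfamiliar with the token-level metrics named above, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores can be computed between a generated summary and a reference summary. It is an illustration only, not the evaluation code used in the study: the function names (`rouge_n`, `rouge_l`), the whitespace tokenization, the lowercasing, and the example sentences are all assumptions made here, and published ROUGE implementations typically add stemming and more careful tokenization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's data.
    generated = "patient discharged home in stable condition"
    reference = "patient was discharged home in stable condition"
    print("ROUGE-1:", round(rouge_n(generated, reference, 1), 3))
    print("ROUGE-2:", round(rouge_n(generated, reference, 2), 3))
    print("ROUGE-L:", round(rouge_l(generated, reference), 3))
```

ROUGE-1 and ROUGE-2 reward shared words and word pairs, while ROUGE-L rewards words that appear in the same order even when they are not adjacent. None of these measures meaning, which is presumably why the study pairs them with semantic similarity scores and manual review.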

Results: Summarization capabilities varied notably among the LLMs. GPT-4o and fine-tuned LLaMA 3 achieved the best token-level scores. In manual evaluation, GPT-4 scored highest on relevance (4.95 ± 0.22) and factual faithfulness (4.40 ± 0.50), whereas GPT-4o scored highest on completeness (4.55 ± 0.69); the two models showed comparable overall quality. Semantic similarity scores indicated that GPT-4o and LLaMA 3 best captured the underlying meaning and context of clinical narratives.

Conclusion: This study offers insights into the efficacy of LLMs for generating discharge summaries and highlights the potential of automated summarization tools to enhance documentation precision and efficiency, ultimately improving patient care and operational capability in healthcare settings.

Keywords: Continuity of care; Discharge Summary; EHR; GPT; LLaMA; Large language model; Lung cancer; Text summarization.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
