Automated generation of discharge summaries: leveraging large language models with clinical data
- PMID: 40355506
- PMCID: PMC12069548
- DOI: 10.1038/s41598-025-01618-7
Abstract
This study explores the use of open-source large language models (LLMs) to automate the generation of German discharge summaries from structured clinical data. The structured data used to produce the AI-generated summaries were manually extracted from electronic health records (EHRs) by a trained medical professional. By leveraging structured documentation collected for research and quality management, the goal is to assist physicians with editable draft summaries. After de-identifying 25 patient datasets, we optimized the output of the LLaMA3 model through prompt engineering and evaluated it using error analysis as well as quantitative and qualitative metrics. The LLM-generated summaries were rated by physicians on comprehensiveness, conciseness, correctness, and fluency. Key results include an error rate of 2.84 mistakes per summary and low-to-moderate alignment between generated and physician-written summaries (ROUGE-1: 0.25, BERTScore: 0.64). Medical professionals rated the summaries 3.72 ± 0.89 for comprehensiveness and 3.88 ± 0.97 for factual correctness on a 5-point Likert scale; however, only 60% rated the comprehensiveness as good (4 or 5 out of 5). Despite overall informativeness, essential details (such as patient history, lifestyle factors, and intraoperative findings) were frequently omitted, reflecting gaps in summary completeness. While the LLaMA3 model captured much of the clinical information, complex cases and temporal reasoning presented challenges, leading to factual inaccuracies such as incorrect age calculations. Limitations include the small dataset size, missing structured data elements, and the model's limited proficiency with German medical terminology, highlighting the need for larger, more complete datasets and potential model fine-tuning.
In conclusion, this work reports methods, findings, and practical experience from a focused real-world use case that may help guide future work on LLM-based generation of discharge summaries, particularly for German and possibly other non-English content.
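To make the ROUGE-1 figure reported above easier to interpret, the following is a minimal sketch of how a unigram-overlap score can be computed. It is not the authors' evaluation code: the abstract does not say which ROUGE implementation or tokenizer was used, so the function name `rouge1`, the simple regex tokenization, the absence of stemming, and the example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram overlap between a generated and a reference text.

    Assumption: lowercase, Unicode-aware word tokenization without stemming.
    Published ROUGE implementations may tokenize and normalize differently.
    """
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))

    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences, not taken from the study data.
generated = "der Patient wurde entlassen"
written = "der Patient wurde am Montag entlassen"
scores = rouge1(generated, written)
print(scores)
```

In this toy example every generated word also appears in the reference, so precision is high while recall is lower because the reference contains additional words. A low score such as the reported 0.25 indicates limited word-level overlap with the physician-written text, which does not by itself show that the generated summary is clinically wrong.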
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests. Ethical approval: Ethical review and approval were waived for generating a patient data sample from completed inpatient cases from the patient registry database of our pancreatic surgery center at Heidelberg University Hospital. No new patients were recruited, and all the obtained patient information was de-identified before data processing. Informed consent had been previously obtained from all subjects. The use of the database from our hospital was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the medical faculty at the University of Heidelberg (S301/2001; S708/2019; S083/2021).