Sci Rep. 2025 May 12;15(1):16466.
doi: 10.1038/s41598-025-01618-7.

Automated generation of discharge summaries: leveraging large language models with clinical data

Matthias Ganzinger et al. Sci Rep.

Abstract

This study explores the use of open-source large language models (LLMs) to automate the generation of German discharge summaries from structured clinical data. The structured data used to produce the AI-generated summaries were manually extracted from electronic health records (EHRs) by a trained medical professional. By leveraging structured documentation collected for research and quality management, the goal is to assist physicians with editable draft summaries. After de-identifying 25 patient datasets, we optimized the output of the LLaMA3 model through prompt engineering and evaluated it using error analysis as well as quantitative and qualitative metrics. The LLM-generated summaries were rated by physicians on comprehensiveness, conciseness, correctness, and fluency. Key results include an error rate of 2.84 mistakes per summary and low-to-moderate alignment between generated and physician-written summaries (ROUGE-1: 0.25, BERTScore: 0.64). Medical professionals rated the summaries 3.72 ± 0.89 for comprehensiveness and 3.88 ± 0.97 for factual correctness on a 5-point Likert scale; however, only 60% rated the comprehensiveness as good (4 or 5 out of 5). Despite overall informativeness, essential details (such as patient history, lifestyle factors, and intraoperative findings) were frequently omitted, reflecting gaps in summary completeness. While the LLaMA3 model captured much of the clinical information, complex cases and temporal reasoning presented challenges, leading to factual inaccuracies such as incorrect age calculations. Limitations include the small dataset size, missing structured data elements, and the model's limited proficiency with German medical terminology, highlighting the need for larger, more complete datasets and potential model fine-tuning.
In conclusion, this work provides a set of real-world methods, findings, experiences, insights, and descriptive results for a focused use case that may help guide future work on LLM-based generation of discharge summaries, especially for those working with German and possibly other non-English content.
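
To make the ROUGE-1 figure reported above easier to interpret, the following is a minimal sketch of unigram-overlap scoring between a reference summary and a generated one. It is an illustration only, not the authors' evaluation pipeline: the function name `rouge1`, the whitespace tokenization, and the English example sentences are assumptions, and the abstract does not state which ROUGE-1 variant (precision, recall, or F1) or which tokenizer was used.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption; published ROUGE implementations may stem or normalize.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((ref_counts & cand_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example sentences, not data from the study.
scores = rouge1(
    "the patient was discharged in good condition",
    "patient discharged in stable condition",
)
print(scores)
```

A score near 0.25 therefore means that only about a quarter of the unigrams are shared under the chosen variant, which is consistent with the abstract's description of low-to-moderate alignment.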

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical approval: Ethical review and approval were waived for generating a patient data sample from completed inpatient cases from the patient registry database of our pancreatic surgery center at Heidelberg University Hospital. No new patients were recruited, and all the obtained patient information was de-identified before data processing. Informed consent had been previously obtained from all subjects. The use of the database from our hospital was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the medical faculty at the University of Heidelberg (S301/2001; S708/2019; S083/2021).

Figures

Fig. 1
Qualitative evaluation per category. The width of each violin at a given rating value reflects the density of responses. Although most ratings clustered at 4 and 5, a minority of ratings at 2–3 created a visible distribution spread. This helps visualize variability beyond just reporting averages.
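
The point of the caption, that a mean alone can hide the spread of ratings, can be sketched with a small summary of Likert scores. This is a hedged illustration: the helper `summarize_likert` and the ratings list are hypothetical and are not data from the study, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def summarize_likert(ratings: list[int], good_threshold: int = 4) -> dict:
    """Mean, sample standard deviation, and share of 'good' ratings.

    'Good' is taken as a rating at or above good_threshold, mirroring
    the "4 or 5 out of 5" convention described in the abstract.
    """
    good = sum(1 for r in ratings if r >= good_threshold)
    return {
        "mean": mean(ratings),
        "sd": stdev(ratings),
        "share_good": good / len(ratings),
    }


# Hypothetical ratings on a 5-point scale (not study data).
example_ratings = [5, 4, 4, 3, 2, 5, 4, 3, 4, 5]
print(summarize_likert(example_ratings))
```

Reporting the share of ratings at 4 or 5 alongside the mean and standard deviation gives a numeric counterpart to what the violin widths show visually.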
