Development and evaluation of a clinical note summarization system using large language models
- PMID: 40877595
- PMCID: PMC12394402
- DOI: 10.1038/s43856-025-01091-3
Abstract
Background: Clinical notes are a vital and detailed source of information about patient hospitalizations. However, the sheer volume and complexity of these notes make evaluation and summarization challenging. Nonetheless, summarizing clinical notes is essential for accurate and efficient clinical decision-making in patient care. Generative language models, particularly large language models such as GPT-4, offer a promising solution by creating coherent, contextually relevant text based on patterns learned from large datasets.
Methods: This study describes the development of a discharge summary system using large language models. Through an online survey and interviews, we gather feedback from end users, including physicians and patients, to ensure that the system meets their practical needs and aligns with their experience. Additionally, we develop a rating system that evaluates prompt effectiveness by comparing model-generated outputs with human assessments, which serve as the benchmark for the automated model's performance.
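The abstract does not state which statistic the rating system uses to compare model-generated outputs with human assessments. As an illustration only, one common way to quantify agreement between two raters on categorical scores is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes both the human and the model assign a discrete score (for example, a hypothetical 1 to 5 scale) to the same set of items; the function name `cohens_kappa` and the example data are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(human, model):
    """Cohen's kappa between two equal-length lists of categorical ratings.

    Illustrative sketch only: assumes each list holds one rating per item,
    in the same item order, e.g. hypothetical 1-5 quality scores.
    """
    if len(human) != len(model) or not human:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(h == m for h, m in zip(human, model)) / n
    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over every label used by either rater.
    ch, cm = Counter(human), Counter(model)
    p_e = sum((ch[c] / n) * (cm[c] / n) for c in set(ch) | set(cm))
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, so by convention treat it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for six generated summaries.
human_scores = [5, 4, 4, 3, 5, 2]
model_scores = [5, 4, 3, 3, 5, 2]
print(round(cohens_kappa(human_scores, model_scores), 3))  # 0.778
```

In this made-up example the raters agree on five of six items (observed agreement 0.833), but chance agreement from the marginals is 0.25, so kappa is 7/9, about 0.778. A weighted kappa or a correlation coefficient would be alternatives for ordinal scales; which one suits a given rating scheme is a design choice the abstract leaves open.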
Results: Here we show that the model's ability to interpret diagnoses approaches human-level accuracy, demonstrating its potential to assist healthcare professionals in routine tasks such as generating discharge summaries.
Conclusions: This advancement underscores the potential of large language models in clinical settings and opens up possibilities for broader applications in healthcare documentation and decision-making support.
Plain language summary
This study developed a system to support physicians in writing hospital discharge summaries. Clinical notes often include essential patient information, but their length and complexity can make it challenging to summarize them efficiently. To address this, we applied artificial intelligence (AI) techniques to help generate clear and organized summaries based on patient data. We collected input from both physicians and patients through surveys and interviews to ensure the system aligned with their needs. We also evaluated the summaries created by the system by comparing them to those written by healthcare professionals. The results showed that the AI-generated summaries were comparable in accuracy to human-written versions. This suggests that such a system could assist physicians in their documentation tasks and contribute to clearer communication during care transitions. Future applications may include other types of clinical documentation.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.