Comparative Study

BMJ Open. 2025 Sep 4;15(9):e099301.

doi: 10.1136/bmjopen-2025-099301.

Quality and efficiency of integrating customised large language model-generated summaries versus physician-written summaries: a validation study

Collaborators

Applied Artificial Intelligence in Healthcare Consortium:
M Aalderink, R van den Berg, M T P Besouw, A V Biere, F A J A Bodewes, A L Boerboom, M A J Borgdorff, M H de Borst, M Bouhuys, B R Brandsema, G H Bultema, M J Crop, H P J van der Doef, J W J Donkers, J M Douwes, R A Feijen, F Fontanella, B Foreman, V Gracchi, I De Groot, G B Halmos, A A van Heerwaarde, F van den Heuvel, C Holzhauer, F F A IJpma, E Kersten, R J H Knoef, M C A Kramer, S Krishnapillai, M Labberté, J M Lammers, L B de Langen, E Lensen, W S Lexmond, E T Liem, E Loeffen, J Lorius, C Lubout, J Ludwig-Roukema, S Luiten, D Meijering, C Out, S Palthe, M T R Roofthooft, R Scheenstra, R S B H Schreuder, M L Schrijvers, P F Sinnige, W J van Veen, C A Te Velde-Keyzer, K T Verbruggen, M Verheijen, F P J Vernimmen, J de Vries, W de Weerd, C L Welsink, J E J Woolderink, A T Zwart

Affiliations

¹ Department of Otolaryngology - Head and Neck Surgery, University Medical Centre Groningen, Groningen, The Netherlands; r.c.schoonbeek@umcg.nl.
² Department of Medical Information Technology, University Medical Centre Groningen, Groningen, The Netherlands.
³ Department of Intensive Care, Elisabeth-TweeSteden Ziekenhuis, Tilburg, The Netherlands.
⁴ Department of Adult Intensive Care, Erasmus MC University Medical Center, Erasmus Universiteit Rotterdam, Rotterdam, The Netherlands.
⁵ Board of Directors, University Medical Center, University Medical Centre Groningen, Groningen, The Netherlands.
⁶ Orthopaedic Surgery, University Medical Centre Groningen, Groningen, The Netherlands.
⁷ Universitair Medisch Centrum Groningen, Groningen, The Netherlands.
⁸ Department of Pediatrics, University Medical Centre Groningen, Groningen, The Netherlands.

PMID: 40908007
PMCID: PMC12414186
DOI: 10.1136/bmjopen-2025-099301

Rosanne C Schoonbeek et al. BMJ Open. 2025.

Abstract

Objectives: To compare the quality and time efficiency of physician-written summaries with customised large language model (LLM)-generated medical summaries integrated into the electronic health record (EHR) in a non-English clinical environment.

Design: Cross-sectional non-inferiority validation study.

Setting: Tertiary academic hospital.

Participants: 52 physicians from 8 specialties at a large Dutch academic hospital participated, either in writing summaries (n=42) or evaluating them (n=10).

Interventions: Physician writers summarised 50 patient records, and an EHR-integrated LLM generated summaries of the same records. An independent, blinded panel of physician evaluators compared the physician-written and LLM-generated summaries.

Primary and secondary outcome measures: Primary outcome measures were completeness, correctness and conciseness, each rated on a 5-point Likert scale. Secondary outcomes were preference, trust and the time needed to produce each physician-written or LLM-generated summary.

Results: The completeness and correctness of LLM-generated summaries did not differ significantly from those of physician-written summaries. However, LLM summaries were less concise (3.0 vs 3.5, p=0.001). Overall evaluation scores were similar (3.4 vs 3.3, p=0.373), and 57% of evaluators preferred the LLM-generated summaries. Trust in both summary types was comparable, and interobserver reliability was excellent (intraclass correlation coefficient 0.975). Physicians took an average of 7 min per summary, whereas the LLM produced one in 15.7 s.
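The abstract reports an intraclass correlation coefficient (ICC) of 0.975 but does not state which ICC form was used. As a hedged illustration of what such a coefficient measures, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a complete items-by-raters matrix. The choice of ICC form, the function name `icc_2_1` and the toy ratings are assumptions for illustration only, not the study's data or analysis.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Hypothetical helper. `ratings` is an n x k matrix with one row per rated
    item (e.g. a summary) and one column per rater; no missing scores allowed.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater offsets (ms_cols) lower the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy example: two raters scoring three items.
print(round(icc_2_1([[1, 1], [2, 2], [3, 3]]), 3))  # identical raters -> 1.0
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # constant offset -> 0.667
```

The second example shows why the form matters: the raters rank the items identically, yet an absolute-agreement ICC penalises the one-point offset, whereas a consistency-type ICC would not.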

Conclusions: LLM-generated summaries are comparable to physician-written summaries in completeness and correctness, although slightly less concise. With a clear time-saving benefit, LLMs could help reduce clinicians' administrative burden without compromising summary quality.
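The size of the reported time saving can be checked with simple arithmetic. The sketch below uses only the two averages given in the abstract (7 min per physician-written summary, 15.7 s per LLM-generated summary), takes the rounded "7 min" at face value, and covers generation time only; any time a physician spends checking the LLM output is not reflected in these figures. The function name is hypothetical.

```python
# Averages as reported in the abstract; "7 min" is rounded in the source.
PHYSICIAN_SECONDS = 7 * 60  # 420 s per physician-written summary
LLM_SECONDS = 15.7          # per LLM-generated summary


def time_comparison(physician_s: float, llm_s: float) -> tuple[float, float]:
    """Hypothetical helper: return (seconds saved per summary, speed-up factor)."""
    return physician_s - llm_s, physician_s / llm_s


saved, speedup = time_comparison(PHYSICIAN_SECONDS, LLM_SECONDS)
print(f"saved per summary: {saved:.1f} s, speed-up: {speedup:.1f}x")
# prints: saved per summary: 404.3 s, speed-up: 26.8x
```

On these averages the LLM is roughly 27 times faster per summary, saving about 6.7 min each.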

Keywords: Artificial Intelligence; Electronic Health Records; Physicians.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1. The non-inferiority study design. The online supplemental material explains the numbers further. LLM, large language model.

Figure 2. Example summaries (left panel) and their corresponding evaluation by the physician evaluators (right panel). Red: error by the physician; green: additional valuable information in the physician summary. Translated from Dutch into English for illustration; the original Dutch text is available in the online supplemental material. AI, artificial intelligence.

Figure 3. Recognition, preference and trust infographic. AI, artificial intelligence.
