Randomized Controlled Trial

Evaluation of AI Summaries on Interdisciplinary Understanding of Ophthalmology Notes

Prashant D Tailor et al. JAMA Ophthalmol. 2025 May 1;143(5):410-419. doi: 10.1001/jamaophthalmol.2025.0351.

Abstract

Importance: Specialized ophthalmology terminology limits comprehension for nonophthalmology clinicians and professionals, hindering interdisciplinary communication and patient care. The clinical implementation of large language models (LLMs) has to date been relatively unexplored.

Objective: To evaluate whether LLM-generated plain language summaries (PLSs) integrated into standard ophthalmology notes (SONs) improve diagnostic understanding, satisfaction, and clarity.

Design, setting, and participants: Randomized quality improvement study conducted from February 1, 2024, to May 31, 2024, including data from inpatient and outpatient encounters in a single tertiary academic center. Participants were nonophthalmology clinicians and professionals and ophthalmologists. The single inclusion criterion was any encounter note generated by an ophthalmologist during the study dates. Exclusion criteria were (1) lack of established nonophthalmology clinicians and professionals for outpatient encounters and (2) procedure-only patient encounters.

Intervention: Addition of LLM-generated plain language summaries to ophthalmology notes.
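
As a rough illustration of the intervention, the sketch below generates a plain language summary from an ophthalmology note and appends it to the standard note, assuming an OpenAI-style chat API. The model name, prompt wording, and note handling shown here are assumptions for illustration; the abstract does not specify how the study's summaries were produced or integrated into the electronic health record.

```python
# Illustrative sketch only: generate a plain language summary (PLS) for an
# ophthalmology note and append it below the standard ophthalmology note (SON).
# Model choice, prompt, and integration are assumptions, not the study's method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_pls(note_text: str) -> str:
    """Return a plain language summary of an ophthalmology note."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the ophthalmology note below as a brief plain "
                    "language summary for nonophthalmology clinicians. "
                    "Expand abbreviations and do not add new clinical facts."
                ),
            },
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content

# Example usage with a made-up note.
son = "OD: NVG, IOP 38 mm Hg, s/p PRP. Plan: aqueous suppressants, urgent glaucoma follow-up."
note_with_pls = son + "\n\nPlain Language Summary:\n" + generate_pls(son)
```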

Main outcomes and measures: The primary outcome was survey responses from nonophthalmology clinicians and professionals assessing understanding, satisfaction, and clarity of ophthalmology notes. Secondary outcomes were survey responses from ophthalmologists evaluating PLS in terms of clinical workflow and accuracy, objective measures of semantic quality, and safety analysis.

Results: A total of 362 nonophthalmology clinicians and professionals (85%; 33.0% response rate) preferred the PLS to the SON. Demographic data on age, race and ethnicity, and sex were not collected. Nonophthalmology clinicians and professionals reported enhanced diagnostic understanding (percentage point increase, 9.0; 95% CI, 0.3-18.2; P = .01), increased satisfaction with note detail (percentage point increase, 21.5; 95% CI, 11.4-31.5; P < .001), and improved clarity of explanations (percentage point increase, 23.0; 95% CI, 12.0-33.1; P < .001) for notes containing a PLS. The addition of a PLS was associated with a reduced comprehension gap between clinicians who were comfortable and uncomfortable with ophthalmology terminology (from 26.1% [95% CI, 13.7%-38.6%; P < .001] to 14.4% [95% CI, 4.3%-24.6%; P > .06]). Semantic analysis found high meaning preservation by the PLS (bidirectional encoder representations from transformers [BERT] score mean F1: 0.85) and greater readability than the SON (Flesch Reading Ease: 51.8 vs 43.6; Flesch-Kincaid Grade Level: 10.7 vs 11.9). Ophthalmologists (n = 489; 84% response rate) reported high PLS accuracy (90% [320 of 355] rated it accurate "a great deal") with minimal review time burden (94.9% [464 of 489] spent ≤1 minute). The PLS error rate on ophthalmologist review was 26% (126 of 489). A total of 83.9% (104 of 126) of errors were deemed low risk for harm, and none carried a risk of severe harm or death.
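
The readability and meaning-preservation metrics reported above (Flesch Reading Ease, Flesch-Kincaid Grade Level, and BERTScore F1) can be computed with standard open-source tools. The sketch below is illustrative only, assuming the textstat and bert-score Python packages and made-up note text; the study's actual tooling and preprocessing are not described in this abstract.

```python
# Illustrative sketch: compare readability of a SON vs a PLS and estimate
# meaning preservation with BERTScore. Requires: pip install textstat bert-score
import textstat
from bert_score import score

standard_note = (
    "OD: CSME with intraretinal exudates, s/p PRP OU. "
    "Plan: anti-VEGF injection OD, follow-up in 4 weeks."
)
plain_language_summary = (
    "The right eye has swelling in the central retina caused by diabetes. "
    "The patient will get an injection in the right eye and return in four weeks."
)

# Readability: higher Flesch Reading Ease and lower grade level mean easier text.
print("Flesch Reading Ease (SON):", textstat.flesch_reading_ease(standard_note))
print("Flesch Reading Ease (PLS):", textstat.flesch_reading_ease(plain_language_summary))
print("Flesch-Kincaid Grade (SON):", textstat.flesch_kincaid_grade(standard_note))
print("Flesch-Kincaid Grade (PLS):", textstat.flesch_kincaid_grade(plain_language_summary))

# Meaning preservation: BERTScore F1 between the summary and the source note.
P, R, F1 = score([plain_language_summary], [standard_note], lang="en")
print("BERTScore F1:", F1.mean().item())
```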

Conclusions and relevance: In this study, use of LLM-generated PLSs was associated with enhanced comprehension and satisfaction among nonophthalmology clinicians and professionals, which might aid interdisciplinary communication. Careful implementation and safety monitoring are recommended for clinical integration given the persistence of errors despite physician review.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Iezzi reported receiving personal fees from Johnson and Johnson as a consultant outside the submitted work. Dr Sit reported receiving personal fees from Globe Biomedical, Inc and Injectsense Inc, grants from Nicox Ophthalmics Inc, and personal fees from PolyActiva Pty, Qlaris Bio Inc, and Santen Pharmaceuticals Asia Pty outside the submitted work. Dr Starr reported receiving nonfinancial support from AbbVie, Gyroscope Therapeutics, Evolve Medical, and Alimera Sciences outside the submitted work. No other disclosures were reported.
