Cornea. 2024 Dec 3;44(6):788-794.
doi: 10.1097/ICO.0000000000003747.

Use of Online Large Language Model Chatbots in Cornea Clinics


Prem A H Nichani et al. Cornea. 2024.

Abstract

Purpose: Online large language model (LLM) chatbots have garnered attention for their potential to enhance efficiency, provide education, and advance research. This study evaluated the performance of LLM chatbots (Chat Generative Pre-Trained Transformer [ChatGPT], Writesonic, Google Bard, and Bing Chat) in responding to cornea-related scenarios.

Methods: Prompts covering clinic administration, patient counselling, treatment algorithms, surgical management, and research were devised. Responses from the LLMs were assessed by 3 fellowship-trained cornea specialists, blinded to the LLM used, using a standardized rubric evaluating accuracy, comprehension, compassion, professionalism, humanness, comprehensiveness, and overall quality. In addition, 12 readability metrics were used to further evaluate responses. Scores were averaged and ranked, and subgroup analyses were performed to identify the best-performing LLM for each rubric criterion.
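As a rough illustration of the scoring scheme described above, the sketch below averages per-criterion grader scores and ranks chatbots by overall mean. The 4-point scale is an inference (3.35 of 4 ≈ 83.8%, matching the percentage reported in the Results), and the placeholder scores and data layout are assumptions for illustration, not the study's actual data.

```python
# Minimal sketch: average blinded grader scores per rubric criterion,
# then rank chatbots by overall mean. Values here are placeholders.
from statistics import mean

CRITERIA = ["accuracy", "comprehension", "compassion", "professionalism",
            "humanness", "comprehensiveness", "overall quality"]

# scores[chatbot][criterion] -> list of scores from the 3 masked graders
scores = {
    "ChatGPT":    {c: [3, 4, 3] for c in CRITERIA},
    "Google Bard": {c: [3, 3, 3] for c in CRITERIA},
    "Bing Chat":  {c: [2, 3, 3] for c in CRITERIA},
    "Writesonic": {c: [2, 2, 3] for c in CRITERIA},
}

# Average across graders for each criterion, then across criteria.
summary = {
    bot: {c: mean(vals) for c, vals in per_criterion.items()}
    for bot, per_criterion in scores.items()
}
ranking = sorted(summary, key=lambda bot: mean(summary[bot].values()),
                 reverse=True)

for bot in ranking:
    overall = mean(summary[bot].values())
    print(f"{bot}: overall {overall:.2f} / 4 ({overall / 4:.1%})")
```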

Results: Sixty-six responses were generated from 11 prompts. ChatGPT outperformed the other LLMs across all rubric criteria, achieving an overall response score of 3.35 ± 0.42 (83.8%). However, Google Bard excelled in readability, leading in 75% of the metrics assessed. Importantly, no responses were judged to pose a risk to patients, supporting the safety and reliability of the information provided.
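For context on the readability comparison, one widely used formula is the Flesch Reading Ease score. The abstract does not list the 12 metrics used, so this is only a representative example; the syllable counter below is a crude heuristic, whereas validated readability tools count syllables more carefully.

```python
# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
# Higher scores indicate easier reading.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowel characters (crude heuristic).
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / max(1, len(words))))

print(round(flesch_reading_ease(
    "The cornea is the clear front window of the eye."), 1))
```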

Conclusions: ChatGPT demonstrated superior accuracy and comprehensiveness in responding to cornea-related prompts, whereas Google Bard stood out for its readability. The study highlights the potential of LLMs in streamlining various clinical, administrative, and research tasks in ophthalmology. Future research should incorporate patient feedback and ongoing data collection to monitor LLM performance over time. Despite their promise, LLMs should be used with caution, necessitating continuous oversight by medical professionals and standardized evaluations to ensure patient safety and maximize benefits.

Keywords: artificial intelligence; clinic efficiency; cornea; large language models; patient counselling.

Conflict of interest statement

S. Ong Tone: Labtician (research grant/financial support), Rx Renewal (consultant/consulting fees), Sun Pharma (consultant/consulting fees). J. C. Teichman: Aequus (consultant/consulting fees), Alcon (consultant/consulting fees, research grant/financial support), Allergan (consultant/consulting fees), Bausch & Lomb (consultant/consulting fees, research grant/financial support), Labtician-Théa (consultant/consulting fees), Novartis (consultant/consulting fees), Santen (consultant/consulting fees), Shire (consultant/consulting fees), Sun Pharma (consultant/consulting fees). C. C. Chan: Abbvie (honoraria, research grant/financial support), Aurion (consultant/consulting fees), Bausch & Lomb (honoraria, research grant/financial support), Corneat (research grant/financial support), Johnson & Johnson (honoraria, research grant/financial support), Labtician (honoraria, research grant/financial support), Thea (honoraria, research grant/financial support), Santen (honoraria, research grant/financial support), Shire (honoraria, research grant/financial support), Sun Pharma (consultant/consulting fees), Zeiss (honoraria). The remaining authors have no funding or conflicts of interest to disclose.

