Comparative Study

Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison

Matteo Mario Carlà et al.

Graefes Arch Clin Exp Ophthalmol. 2024 Sep;262(9):2945-2959. doi: 10.1007/s00417-024-06470-5. Epub 2024 Apr 4.

Abstract

Purpose: The aim of this study was to assess the capability of ChatGPT-4 and Google Gemini to analyze detailed glaucoma case descriptions and suggest an accurate surgical plan.

Methods: We retrospectively analyzed 60 medical records of surgical glaucoma cases, divided into “ordinary” (n = 40) and “challenging” (n = 20) scenarios. Case descriptions were entered into the ChatGPT and Gemini (formerly Bard) interfaces with the question “What kind of surgery would you perform?” and submitted three times each to assess the consistency of the answers. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. We also graded the quality of each response from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS) and compared the results.
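The querying protocol is simple enough to script. Note that the study itself used the public ChatGPT and Bard/Gemini web interfaces, so the following Python sketch is purely illustrative: it assumes OpenAI’s chat-completions API, a placeholder model name, and a hypothetical case description, none of which come from the paper.

```python
# Illustrative sketch only: the study used the public ChatGPT and Bard/Gemini
# web interfaces, not an API. This hypothetical script shows how the same
# protocol (one fixed question per case, repeated three times) could be
# automated. The model name and the example case are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "What kind of surgery would you perform?"
N_REPEATS = 3  # each case was submitted three times to check consistency


def query_case(case_description: str, model: str = "gpt-4") -> list[str]:
    """Return the chatbot's answers for one case, repeated N_REPEATS times."""
    answers = []
    for _ in range(N_REPEATS):
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": f"{case_description}\n\n{QUESTION}"},
            ],
        )
        answers.append(response.choices[0].message.content)
    return answers


# Example usage with a hypothetical case description:
# answers = query_case(
#     "72-year-old with pseudoexfoliative glaucoma, IOP 28 mmHg on maximal "
#     "therapy, prior failed trabeculectomy..."
# )
# print(answers)
```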

Results: ChatGPT’s surgical choice was consistent with that of the glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) for Gemini (p = 0.0001). Gemini was unable to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% for ChatGPT and 50% for Gemini). In “challenging” cases, ChatGPT agreed with the specialists in 9/20 choices (45%), outperforming Google Gemini (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 for ChatGPT and 2.1 ± 1.5 for Gemini (p = 0.002). This difference was even more marked when focusing only on “challenging” cases (3.0 ± 1.5 vs. 1.5 ± 1.4, p = 0.001).
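The abstract reports p-values but does not name the underlying statistical tests. As a rough illustration only, the agreement rates (35/60 vs. 19/60) can be compared with a chi-square test on the reported counts; the sketch below is an assumption about methodology, not the authors’ analysis, and its p-value need not match the one reported in the paper.

```python
# Hedged illustration: the abstract does not state which tests produced its
# p-values. Here we assume a chi-square test on the agreement counts from the
# Results (35/60 for ChatGPT, 19/60 for Gemini).
from scipy.stats import chi2_contingency

# Rows: chatbot (ChatGPT, Gemini); columns: (agreed with specialists, did not)
table = [
    [35, 25],  # ChatGPT: 35 of 60 cases in agreement
    [19, 41],  # Gemini: 19 of 60 cases in agreement
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# This assumed test gives p on the order of 0.006, not the paper's reported
# p = 0.0001, underscoring that the authors' exact method is unspecified here.
```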

Conclusion: ChatGPT-4 showed good analytical performance for glaucoma surgical cases, both ordinary and challenging. By contrast, Google Gemini showed strong limitations in this setting, with high rates of imprecise or missing answers.

Keywords: Artificial intelligence (AI); ChatGPT; Glaucoma; Glaucoma surgery; Google Bard; Google Gemini; Large language models (LLM).


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Screenshot of ChatGPT-4 responses in a “challenging” case. A, B Case description and ChatGPT’s answer analyzing the scenario and proposing several surgical treatments; C coherent answer when asked to choose only one of the proposed treatments

Fig. 2
Screenshot of Google Gemini responses in the same “challenging” case as Fig. 1. A Case description and Gemini’s analysis of the case; B when asked for surgical advice, Google Gemini provided more concise answers rich in web sources; C when asked to choose only one treatment, Gemini frequently answered “I can’t choose one treatment for this case.” It was, however, able to present a list of surgical options, although none of them was analyzed in detail

Fig. 3
Histograms showing A the level of agreement between ChatGPT’s and Google Gemini’s answers and those provided by glaucoma specialists, overall and in “ordinary” and “challenging” scenarios. Complete agreement was recorded when the chatbot’s final choice was consistent with the one provided by the specialists, while partial agreement included cases in which the correct answer was listed but not picked as the preferred choice; B the comparison between the Global Quality Scores assigned by ophthalmologists to the two chatbots’ performance and usability (shown as mean and standard deviation). One asterisk (*) indicates p < 0.05; two asterisks (**) indicate p < 0.01

