Front Med (Lausanne). 2025 Jun 25;12:1516442. doi: 10.3389/fmed.2025.1516442. eCollection 2025.

Evaluation and comparison of large language models' responses to questions related to optic neuritis

Han-Jie He et al.

Abstract

Objectives: Large language models (LLMs) show promise as clinical consultation tools and may assist patients with optic neuritis, though research on their performance in this area is limited. Our study aims to assess and compare the performance of four commonly used LLM chatbots (Claude-2, ChatGPT-3.5, ChatGPT-4.0, and Google Bard) in addressing questions related to optic neuritis.

Methods: We curated 24 questions related to optic neuritis and had three ophthalmologists rate each chatbot's responses on two three-point scales, one for accuracy and one for comprehensiveness. We also assessed the readability of the responses using four readability scales. The final results revealed performance differences among the four LLM chatbots.
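The abstract does not name the four readability scales used. As an illustration only, the sketch below computes the Flesch-Kincaid Grade Level, one commonly used readability metric; the choice of scale and the rough syllable heuristic are assumptions, not the authors' stated method.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Hypothetical sample sentence, not taken from the study's data.
sample = "Optic neuritis is an inflammatory demyelinating condition of the optic nerve."
print(f"FKGL: {flesch_kincaid_grade(sample):.1f}")
```

An FKGL of roughly 13 or above corresponds to university-level reading difficulty, which is the threshold the Results below refer to.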

Results: The average total accuracy scores (out of 9) were: ChatGPT-4.0 (7.62 ± 0.86), Google Bard (7.42 ± 1.20), ChatGPT-3.5 (7.21 ± 0.70), and Claude-2 (6.44 ± 1.07). ChatGPT-4.0 (p = 0.0006) and Google Bard (p = 0.0015) were significantly more accurate than Claude-2. In addition, 62.5% of ChatGPT-4.0's responses were rated "Excellent," followed by 58.3% for Google Bard, both higher than Claude-2's 29.2% (all p ≤ 0.042) and ChatGPT-3.5's 41.7%. Both Claude-2 and Google Bard had 8.3% "Deficient" responses. Comprehensiveness scores were similar among the four LLMs (p = 0.1531). Notably, all responses required at least university-level reading proficiency.
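The abstract reports pairwise p-values but does not state which statistical test was used. As a sketch only, assuming a Kruskal-Wallis omnibus test followed by pairwise Mann-Whitney U tests (an assumption, not the stated analysis) and simulated scores matching the reported means and standard deviations (not the study data), the comparison could look like this:

```python
import numpy as np
from scipy import stats

# Hypothetical per-question total accuracy scores (out of 9); not the study data.
rng = np.random.default_rng(0)
scores = {
    "ChatGPT-4.0": rng.normal(7.62, 0.86, 24),
    "Google Bard": rng.normal(7.42, 1.20, 24),
    "ChatGPT-3.5": rng.normal(7.21, 0.70, 24),
    "Claude-2": rng.normal(6.44, 1.07, 24),
}

# Omnibus comparison across the four chatbots.
h, p = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

# Pairwise contrasts against Claude-2, mirroring the abstract's comparisons.
for name in ("ChatGPT-4.0", "Google Bard", "ChatGPT-3.5"):
    u, p = stats.mannwhitneyu(scores[name], scores["Claude-2"])
    print(f"{name} vs Claude-2: U = {u:.0f}, p = {p:.4f}")
```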

Conclusion: LLM chatbots hold immense potential as clinical consultation tools for optic neuritis, but they require further refinement and proper evaluation strategies before deployment to ensure reliable and accurate performance.

Keywords: artificial intelligence; eye diseases; natural language processing; optic nerve diseases; optic neuritis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Average total readability scores of responses generated by large language model (LLM) chatbots and official website content. *P ≤ 0.05.

FIGURE 2. Average total accuracy scores of responses generated by LLM chatbots. **P ≤ 0.01; ***P ≤ 0.001.

FIGURE 3. Final rating of responses generated by LLM chatbots, determined by majority rule.
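Figure 3's final ratings aggregate the three ophthalmologists' ratings by majority rule. A minimal sketch of that aggregation follows; the "Excellent" and "Deficient" labels are taken from the abstract, while the handling of a three-way split is an assumption, since the abstract does not specify a tie-breaking rule.

```python
from collections import Counter

def majority_rating(ratings: list[str]) -> str | None:
    # Return the label assigned by at least two of the three raters.
    # With three raters on a three-point scale, a three-way split has no
    # majority; the study's handling of that case is not stated in the
    # abstract, so None is returned here as a placeholder.
    label, count = Counter(ratings).most_common(1)[0]
    return label if count >= 2 else None

# Example using the two labels named in the abstract.
print(majority_rating(["Excellent", "Excellent", "Deficient"]))  # -> Excellent
```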

References

    1. Li Z, Wang L, Wu X, Jiang J, Qiang W, Xie H, et al. Artificial intelligence in ophthalmology: The path to the real-world clinic. Cell Rep Med. (2023) 4:101095. 10.1016/j.xcrm.2023.101095 - DOI - PMC - PubMed
    1. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. (2019) 25:24–9. 10.1038/s41591-018-0316-z - DOI - PubMed
    1. OpenAI. Introducing ChatGPT. (2024). Available online at: https://openai.com/blog/chatgpt (accessed April 16, 2024).
    1. Tan S, Xin X, Wu D. ChatGPT in medicine: Prospects and challenges: A review article. Int J Surg. (2024) 110:3701–6. 10.1097/JS9.0000000000001312 - DOI - PMC - PubMed
    1. Antaki F, Touma S, Milad D, El-Khoury J, Duval R. Evaluating the performance of ChatGPT in ophthalmology: An analysis of its successes and shortcomings. Ophthalmol Sci. (2023) 3:100324. 10.1016/j.xops.2023.100324 - DOI - PMC - PubMed

LinkOut - more resources