J Med Educ Curric Dev. 2026 Jan 2;13:23821205251409499. doi: 10.1177/23821205251409499. eCollection 2026 Jan-Dec.

Comparative Performance Evaluation of Large Language Models and Human Teachers in Answering Optometry Questions from Medical Undergraduates


Zijing Huang et al. J Med Educ Curric Dev.

Abstract

Purpose: We aimed to evaluate the performance of 5 large language models (LLMs) and human teachers in answering optometry-related questions raised by medical undergraduates.

Methods: This prospective, comparative study collected 108 questions from 30 students. The questions were sent to the students' teachers for responses and were also entered into 5 LLMs, comprising 2 local models (Mistral-7B and Llama-2-13B) and 3 online models (Claude-3, Gemini-1.0 pro, and GPT-4.0), to generate corresponding answers. All answers were independently rated by 2 optometry experts, blinded to the answer source, for accuracy, completeness, comprehensibility, and overall quality on a 5-point scale. Students completed a 6-item questionnaire on their satisfaction with, and perspectives on, the integration of LLMs.

Results: LLMs responded more quickly and generated more extensive answers compared to humans (P < .001). In terms of overall performance, human teachers ranked fifth among the 6 participants, with scores significantly lower than GPT-4.0 (P < .001), Claude-3 (P < .001), and Gemini-1.0 pro (P < .001). GPT-4.0 received the highest scores for accuracy (3.87/5) and completeness (4.11/5), while Claude-3 excelled in comprehensibility (3.91/5) and overall quality (3.93/5); however, the differences between them were not statistically significant. Online LLMs outperformed both humans and locally deployed LLMs (P < .001). Students agreed that LLMs provided more comprehensive and detailed information (3.80/5), but found human answers easier to understand (4.17/5). They were less supportive of replacing teachers with LLMs for answering questions (2.93/5).
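As a rough illustration of the kind of group comparison behind these P values: the study's figures describe least-significant difference (LSD) post-hoc tests, which amount to unadjusted pairwise t-tests carried out after a significant one-way ANOVA. The sketch below uses entirely synthetic 5-point ratings (the group names and values are placeholders, not the study's data):

```python
# Sketch of an LSD-style post-hoc analysis: a one-way ANOVA across all
# groups, followed by plain (unadjusted) pairwise t-tests if significant.
# All rating values below are synthetic placeholders, NOT the study's data.
from itertools import combinations
from scipy import stats

ratings = {  # hypothetical 5-point expert ratings per responder
    "GPT-4.0":  [4, 4, 5, 3, 4, 4, 5, 4],
    "Claude-3": [4, 3, 4, 4, 5, 4, 3, 4],
    "Teacher":  [3, 2, 3, 3, 2, 4, 3, 3],
}

# Omnibus one-way ANOVA across all groups
f_stat, p_anova = stats.f_oneway(*ratings.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.4f}")

# LSD proceeds with unadjusted pairwise t-tests only when the ANOVA
# itself is significant.
if p_anova < 0.05:
    for a, b in combinations(ratings, 2):
        t, p = stats.ttest_ind(ratings[a], ratings[b])
        print(f"{a} vs {b}: t={t:.2f}, p={p:.4f}")
```

Because LSD applies no multiplicity correction, it is more liberal than alternatives such as Tukey's HSD; it controls the family-wise error rate only through the preceding ANOVA gate.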

Conclusion: Our findings demonstrate the potential of LLMs to serve as valuable tools in optometry education, particularly in addressing students' real-world questions.

Keywords: answering questions; artificial intelligence; large language models; medical education; optometry.


Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Flowchart of the study.

Figure 2. Answering performance of large language models (LLMs) and human teachers. (A-D) Optometry experts' ratings for accuracy (A), completeness (B), comprehensibility (C), and overall assessment (D) of the answers provided by the human teachers and 5 LLMs. Data are presented as mean ± standard deviation and were analyzed using the least-significant difference (LSD) post-hoc test for pairwise multiple comparisons. Differences among the LLMs are not marked in the figure.

Figure 3. Time spent answering questions (A) and word count of the answers (B) from human teachers and 5 LLMs. Data are presented as mean ± standard deviation and were analyzed using the least-significant difference (LSD) post-hoc test for pairwise multiple comparisons. Differences among the LLMs are not marked in the figure.

Figure 4. A 6-item questionnaire on satisfaction and comments regarding the integration of large language models in the optometry course. Scoring results are shown in the right panel, presented as mean ± SD.
