Comparative Performance Evaluation of Large Language Models and Human Teachers in Answering Optometry Questions from Medical Undergraduates
- PMID: 41487325
- PMCID: PMC12759117
- DOI: 10.1177/23821205251409499
Abstract
Purpose: We aim to evaluate the performance of 5 large language models (LLMs) and human teachers in answering optometry-related questions raised by medical undergraduate students.
Methods: This prospective, comparative study collected 108 questions from 30 students. The questions were sent to the students' teachers for responses and were also entered into 5 LLMs, including 2 local models (Mistral-7B and Llama-2-13B) and 3 online models (Claude-3, Gemini-1.0 pro, and GPT-4.0), to generate corresponding answers. All answers were independently evaluated in a blinded manner by 2 optometry experts for accuracy, completeness, comprehensibility, and overall quality on a 5-point scale. Students were asked to complete a 6-item questionnaire about their satisfaction and their perspectives on the integration of LLMs.
Results: LLMs responded more quickly and generated more extensive answers compared to humans (P < .001). In terms of overall performance, human teachers ranked fifth among the 6 participants, with scores significantly lower than GPT-4.0 (P < .001), Claude-3 (P < .001), and Gemini-1.0 pro (P < .001). GPT-4.0 received the highest scores for accuracy (3.87/5) and completeness (4.11/5), while Claude-3 excelled in comprehensibility (3.91/5) and overall quality (3.93/5); however, the differences between them were not statistically significant. Online LLMs outperformed both humans and locally deployed LLMs (P < .001). Students agreed that LLMs provided more comprehensive and detailed information (3.80/5), but found human answers easier to understand (4.17/5). They were less supportive of replacing teachers with LLMs for answering questions (2.93/5).
Conclusion: Our findings demonstrate the potential of LLMs to serve as valuable tools in optometry education, particularly in addressing students' real-world questions.
Keywords: answering questions; artificial intelligence; large language models; medical education; optometry.
© The Author(s) 2026.
Conflict of interest statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.