Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think like an Ophthalmologist?
- PMID: 39346575
- PMCID: PMC11437840
- DOI: 10.1016/j.xops.2024.100600
Abstract
Objective: Large language models such as ChatGPT have demonstrated significant potential for question-answering within ophthalmology, but there is a paucity of literature evaluating their ability to generate clinical assessments and discussions. The objectives of this study were to (1) assess the accuracy of assessments and plans generated by ChatGPT and (2) evaluate ophthalmologists' ability to distinguish between responses generated by clinicians versus ChatGPT.
Design: Cross-sectional mixed-methods study.
Subjects: Sixteen ophthalmologists from a single academic center, of whom 10 were board-eligible and 6 were board-certified, were recruited to participate in this study.
Methods: Prompt engineering was used to ensure that ChatGPT output discussions in the style of the ophthalmologist author of the Medical College of Wisconsin Ophthalmic Case Studies. Cases in which ChatGPT accurately identified the primary diagnosis were included and then paired. Masked human-generated and ChatGPT-generated discussions were sent to participating ophthalmologists, who were asked to identify the author of each discussion. Response confidence was assessed on a 5-point Likert scale, and subjective feedback was manually reviewed.
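As a concrete illustration of this kind of prompt-engineering setup, the following is a minimal sketch using the OpenAI Python SDK. The system prompt, model name, and generate_discussion helper are illustrative assumptions; the study's actual prompts and configuration are not reproduced in the abstract.

```python
# Minimal sketch of style-conditioned prompt engineering with the OpenAI
# Python SDK (openai>=1.0). The prompt text and model name below are
# illustrative assumptions, not the study's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_PROMPT = (
    "You are an academic ophthalmologist writing a teaching case discussion "
    "in the style of the Medical College of Wisconsin Ophthalmic Case Studies. "
    "Given a case presentation, state the most likely primary diagnosis and "
    "provide an assessment and plan. Use only the information provided."
)

def generate_discussion(case_text: str, model: str = "gpt-4") -> str:
    """Return a model-generated assessment and plan for one case."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": STYLE_PROMPT},
            {"role": "user", "content": case_text},
        ],
    )
    return response.choices[0].message.content
```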
Main outcome measures: Accuracy of ophthalmologist identification of discussion author, as well as subjective perceptions of human-generated versus ChatGPT-generated discussions.
Results: Overall, ChatGPT correctly identified the primary diagnosis in 15 of 17 (88.2%) cases. Two cases were excluded from the paired comparison because of hallucinations or fabrications of non-user-provided data. Ophthalmologists correctly identified the author in 77.9% ± 26.6% of the 13 included cases, with a mean Likert scale confidence rating of 3.6 ± 1.0. No significant differences in performance or confidence were found between board-certified and board-eligible ophthalmologists. Subjectively, ophthalmologists found that discussions written by ChatGPT tended to contain more generic responses and more irrelevant information, to hallucinate more frequently, and to exhibit distinct syntactic patterns (all P < 0.01).
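The between-group comparison reported above can be illustrated with a standard nonparametric test. The abstract does not state which test the authors used, so the sketch below applies a Mann-Whitney U test to hypothetical per-rater accuracies (fractions of the 13 paired cases identified correctly); all values are invented for illustration only.

```python
# Illustrative two-group comparison of author-identification accuracy.
# The per-rater accuracies below are hypothetical, and the Mann-Whitney U
# test is one reasonable choice; the study's actual test is not specified
# in the abstract.
from scipy.stats import mannwhitneyu

# Hypothetical fraction of the 13 paired cases each rater identified correctly.
board_eligible = [12/13, 10/13, 11/13, 7/13, 13/13, 9/13, 11/13, 10/13, 8/13, 12/13]
board_certified = [11/13, 10/13, 12/13, 8/13, 9/13, 11/13]

stat, p_value = mannwhitneyu(board_eligible, board_certified)
print(f"U = {stat:.1f}, P = {p_value:.3f}")
```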
Conclusions: Large language models have the potential to synthesize clinical data and generate ophthalmic discussions. While these findings have exciting implications for artificial intelligence-assisted health care delivery, more rigorous real-world evaluation of these models is necessary before clinical deployment.
Financial disclosures: The author(s) have no proprietary or commercial interest in any materials discussed in this article.
Keywords: Artificial intelligence; ChatGPT; Large language models; Medical education; Ophthalmology.
© 2024 by the American Academy of Ophthalmology.
Similar articles
- Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320. PMID: 37606922.
- ChatGPT Assisting Diagnosis of Neuro-ophthalmology Diseases Based on Case Reports. medRxiv [Preprint]. 2023 Sep 14:2023.09.13.23295508. doi: 10.1101/2023.09.13.23295508. PMID: 37781591. Update in: J Neuroophthalmol. 2024 Oct 10. doi: 10.1097/WNO.0000000000002274.
- Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks. Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. PMID: 39421095.
- Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt. 2024 May;44(3):641-671. doi: 10.1111/opo.13284. Epub 2024 Feb 25. PMID: 38404172. Review.
- ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond). 2024 May;38(7):1252-1261. doi: 10.1038/s41433-023-02915-z. Epub 2024 Jan 3. PMID: 38172581. Review.
Cited by
- Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis. J Pain Res. 2025 Mar 19;18:1387-1405. doi: 10.2147/JPR.S509845. PMID: 40124539.
- Can off-the-shelf visual large language models detect and diagnose ocular diseases from retinal photographs? BMJ Open Ophthalmol. 2025 Apr 7;10(1):e002076. doi: 10.1136/bmjophth-2024-002076. PMID: 40194867.
- A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis. Sci Rep. 2024 Dec 5;14(1):30385. doi: 10.1038/s41598-024-80917-x. PMID: 39639068.
- Triage of Patient Messages Sent to the Eye Clinic via the Electronic Medical Record: A Comparative Study on AI and Human Triage Performance. J Clin Med. 2025 Mar 31;14(7):2395. doi: 10.3390/jcm14072395. PMID: 40217845.
- Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English. Narra J. 2025 Apr;5(1):e2371. doi: 10.52225/narra.v5i1.2371. Epub 2025 Apr 8. PMID: 40352182.