Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images
- PMID: 40486905
- PMCID: PMC12141289
- DOI: 10.3389/fcell.2025.1600202
Abstract
Background: The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Developing an affordable, accessible, interpretable, and automated AI system for non-clinical settings is crucial to expanding access to quality healthcare.
Methods: This cross-sectional study developed the Multimodal Ocular Surface Assessment and Interpretation Copilot (MOSAIC) from three multimodal large language models (gpt-4-turbo, claude-3-opus, and gemini-1.5-pro-latest) to detect three ocular surface diseases (OSDs) and to grade keratitis and pterygium. A total of 375 smartphone-captured ocular surface images from 290 eyes were used to validate MOSAIC. Performance was evaluated in both zero-shot and few-shot settings across four tasks: image quality control, OSD detection, keratitis severity analysis, and pterygium grading. The interpretability of the system was also evaluated.
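The abstract does not provide implementation details. As an illustration of the few-shot multimodal prompting approach described above, the following is a minimal sketch using the OpenAI Chat Completions API for the gpt-4-turbo backend; the function names (encode_image, classify_osd), the prompt wording, and the label set are hypothetical and not taken from the paper.

```python
import base64
from openai import OpenAI  # official openai Python SDK; reads OPENAI_API_KEY from the environment

client = OpenAI()

def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 data URL for the API."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

def classify_osd(query_image: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot OSD detection: each example is an (image_path, label) pair
    placed in the prompt before the query image (hypothetical sketch)."""
    content = [{"type": "text",
                "text": ("You are an ophthalmology assistant. Classify the ocular surface "
                         "image as keratitis, pterygium, conjunctivitis, or normal.")}]
    for path, label in examples:  # in-context (few-shot) labeled examples
        content.append({"type": "image_url", "image_url": {"url": encode_image(path)}})
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append({"type": "image_url", "image_url": {"url": encode_image(query_image)}})
    content.append({"type": "text", "text": "Label:"})
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": content}],
        max_tokens=20,
    )
    return response.choices[0].message.content.strip()
```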
Results: Under the five-shot setting, MOSAIC achieved 95.00% accuracy in image quality control, 86.96% in OSD detection, 88.33% in distinguishing mild from severe keratitis, and 66.67% in pterygium grading. Performance improved significantly as the number of learning shots increased (p < 0.01). The system attained ROUGE-L F1 scores of 0.70-0.78, demonstrating interpretable image comprehension.
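ROUGE-L F1 scores such as those reported above quantify overlap between a generated description and a reference text via their longest common subsequence (LCS): precision is the LCS length divided by the candidate length, recall is the LCS length divided by the reference length, and F1 is their harmonic mean. The sketch below is a minimal, self-contained illustration of the metric, not the authors' evaluation code.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```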
Conclusion: MOSAIC exhibited exceptional few-shot learning capabilities, achieving high accuracy in OSD management with minimal training examples. This system has significant potential for smartphone integration to enhance the accessibility and effectiveness of OSD detection and grading in resource-limited settings.
Keywords: conjunctivitis; keratitis; large language model; multimodal model; ocular surface disease; pterygium.
Copyright © 2025 Li, Wang, Xiu, Zhang, Wang, Wang, Chen, Yang and Chen.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.