Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 May 23:13:1600202.
doi: 10.3389/fcell.2025.1600202. eCollection 2025.

Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images

Affiliations

Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images

Zhongwen Li et al. Front Cell Dev Biol. .

Abstract

Background: The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Proposing an affordable, accessible, interpretable, and automated AI system for non-clinical settings is crucial to expanding access to quality healthcare.

Methods: This cross-sectional study developed the Multimodal Ocular Surface Assessment and Interpretation Copilot (MOSAIC) using three multimodal large language models: gpt-4-turbo, claude-3-opus, and gemini-1.5-pro-latest, for detecting three ocular surface diseases (OSDs) and grading keratitis and pterygium. A total of 375 smartphone-captured ocular surface images collected from 290 eyes were utilized to validate MOSAIC. The performance of MOSAIC was evaluated in both zero-shot and few-shot settings, with tasks including image quality control, OSD detection, analysis of the severity of keratitis, and pterygium grading. The interpretability of the system was also evaluated.

Results: MOSAIC achieved 95.00% accuracy in image quality control, 86.96% in OSD detection, 88.33% in distinguishing mild from severe keratitis, and 66.67% in determining pterygium grades with five-shot settings. The performance significantly improved with the increasing learning shots (p < 0.01). The system attained high ROUGE-L F1 scores of 0.70-0.78, depicting its interpretable image comprehension capability.

Conclusion: MOSAIC exhibited exceptional few-shot learning capabilities, achieving high accuracy in OSD management with minimal training examples. This system has significant potential for smartphone integration to enhance the accessibility and effectiveness of OSD detection and grading in resource-limited settings.

Keywords: conjunctivitis; keratitis; large language model; multimodal model; ocular surface disease; pterygium.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
FIGURE 1
Design and architecture of MOSAIC. MOSAIC was constructed with several components to keep its extendibility. The agent allocator acts as a router to create specific agents for sub-tasks in IAP. IAP is the automatic sequential workflow for analyzing keratitis, conjunctivitis, and pterygium with smartphone images. IAP Image Analysis Pipeline, IQC Image Quality Controller, DSD Disease Detector, SVA Severity Analyzer.
FIGURE 2
FIGURE 2
Task examples of the IQC. Based on our previous study on ocular surface image quality, we employed the following definitions for image quality categories. An image is classified as ineligible if it meets any of the following criteria: 1) Defocused images refer to blurry images in which the focus is not on the cornea. 2) Poor-field images refer to images in which one-fifth of the cornea was covered by eyelids. 3) Poor-location images refer to images in which one-fifth of the cornea was blurred because the cornea was not straight ahead. 4) An image quality is deemed eligible if it does not meet any of the aforementioned criteria. IQC Image Quality Controller.
FIGURE 3
FIGURE 3
Task examples of the DSD. The diagnostic definitions are as follows. 1) Keratitis: Keratitis is the inflammation of the cornea. 2) Pterygium: Pterygium is a roughly triangular tissue growth extending from the conjunctiva onto the cornea. 3) Conjunctivitis: Conjunctivitis refers to inflammation of the outermost layer of the white part of the eye or the inner surface of the eyelid. 4) Normal: No signs of the aforementioned conditions. DSD Disease Detector.
FIGURE 4
FIGURE 4
Task examples of the SVA. The definition of keratitis in the mild stage refers to the lesion located outside the central cornea with a diameter of less than 2 mm. The criteria of pterygium grading mainly focus on the surgical timing indicated by the location of the pterygium head, corneal limbus, and pupillary, which categorizes cases as grade one if the length of the limbal invasion is between 0 and 2 mm; as grade two if the invasion is between 2 and 4 mm and as grade three if the invasion was exceeding 4 mm. SVA Severity Analyzer.
FIGURE 5
FIGURE 5
Comparing the performance of MLLMs and few-shot levels for agents in the MOSAIC. (a–d). Confusion matrices describing the prediction results of three MLLMs and three few-shot levels for agents IQC, DSD, SVA (keratitis stage), and SVA (pterygium grade) in order. (e–h). The accuracies of three MLLMs and three few-shot levels for agents in the same order. IQC Image Quality Controller, DSD Diseases Detector, SVA Severity Analyzer. MLLMs multimodal large language model. EL eligible, DF defocused, PF poor-field, PL poor-location. KT keratitis, CJ conjunctivitis, PT pterygium, NM normal, MK keratitis (non-mild stage), NK keratitis (mild stage), G1 (pterygium grade one), G2 (pterygium grade two), G3 (pterygium grade three).
FIGURE 6
FIGURE 6
The distribution of ROUGE-L F1 scores for MOSAIC’s interpretation of images. (a–d). The ROUGE-L F1 scores of each test image for IQC, DSD, SVA (keratitis stage), and SVA (pterygium grade). Higher scores are aligned with correct classification results, and lower scores are aligned with wrong classification results, suggesting that the decisions made by MOSAIC agree with the reasonings. IQC Image Quality Controller, DSD Diseases Detector, SVA Severity Analyzer. (K) Keratitis stages, (P) pterygium grades. ROUGE-L Recall-Oriented Understudy for Gisting Evaluation.

Similar articles

Cited by

References

    1. Azari A. A., Barney N. P. (2013). Conjunctivitis: a systematic review of diagnosis and treatment. JAMA 310 (16), 1721–1729. 10.1001/jama.2013.280318 - DOI - PMC - PubMed
    1. Brown T. B., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., et al. (2020). “Language models are few-shot learners,”, arXiv: arXiv:2005.14165. 10.48550/arXiv.2005.14165 - DOI
    1. Burton M. J. (2009). Prevention, treatment and rehabilitation. Community Eye Health 22 (71), 33–35. Available online at: https://pmc.ncbi.nlm.nih.gov/articles/PMC2823104/ . - PMC - PubMed
    1. Burton M. J., Ramke J., Marques A. P., Bourne R. R. A., Congdon N., Jones I., et al. (2021). The lancet global health commission on global eye health: vision beyond 2020. Lancet Glob. Health 9 (4), e489–e551. 10.1016/S2214-109X(20)30488-5 - DOI - PMC - PubMed
    1. Caffery L. J., Taylor M., Gole G., Smith A. C. (2019). Models of care in tele-ophthalmology: a scoping review. J. Telemed. Telecare 25 (2), 106–122. 10.1177/1357633X17742182 - DOI - PubMed

LinkOut - more resources