Front Cell Dev Biol. 2025 Jul 23;13:1642539.
doi: 10.3389/fcell.2025.1642539. eCollection 2025.

Multimodal reasoning agent for enhanced ophthalmic decision-making: a preliminary real-world clinical validation


Yijing Zhuang et al. Front Cell Dev Biol. 2025.

Abstract

Although large language models (LLMs) show significant potential in clinical practice, accurate diagnosis and treatment planning in ophthalmology require multimodal integration of imaging, clinical history, and guideline-based knowledge. Current LLMs predominantly focus on unimodal language tasks and face limitations in specialized ophthalmic diagnosis due to domain knowledge gaps, hallucination risks, and inadequate alignment with clinical workflows. This study introduces a structured reasoning agent (ReasonAgent) that integrates a multimodal visual analysis module, a knowledge retrieval module, and a diagnostic reasoning module to address the limitations of current AI systems in ophthalmic decision-making. Validated on 30 real-world ophthalmic cases (27 common and 3 rare diseases), ReasonAgent demonstrated diagnostic accuracy comparable to that of ophthalmology residents (β = -0.07, p = 0.65). In treatment planning, however, it significantly outperformed both GPT-4o (β = 0.49, p = 0.01) and residents (β = 1.71, p < 0.001), excelling particularly in rare-disease scenarios (all p < 0.05). While GPT-4o proved vulnerable in rare cases (low diagnostic scores in 90.48% of rare-case evaluations), ReasonAgent's hybrid design mitigated such errors through structured reasoning. Statistical analysis identified significant case-level heterogeneity (diagnosis ICC = 0.28), highlighting the need for domain-specific AI solutions in complex clinical contexts. This framework establishes a novel paradigm for domain-specific AI in real-world clinical practice, demonstrating the potential of modularized architectures to advance decision fidelity through human-aligned reasoning pathways.
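
The β coefficients and the case-level ICC reported above are consistent with a mixed-effects analysis of the Likert ratings. Below is a minimal sketch of such an analysis in Python with statsmodels, assuming a hypothetical long-format table with columns score, method, and case_id; the paper's exact model specification is not shown here, and an ordinal mixed model would respect the Likert scale more faithfully than this linear approximation.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format ratings: one row per (case, method, rater) Likert score.
df = pd.read_csv("likert_ratings.csv")  # assumed columns: score, method, case_id

# Linear mixed model: fixed effects for method (GPT-4o as reference level),
# random intercept per case to capture case-level heterogeneity.
model = smf.mixedlm("score ~ C(method, Treatment('GPT-4o'))", df, groups=df["case_id"])
fit = model.fit()
print(fit.summary())  # per-method fixed-effect betas and p-values

# Intraclass correlation: share of total variance attributable to cases.
case_var = float(fit.cov_re.iloc[0, 0])  # random-intercept variance
resid_var = fit.scale                    # residual variance
icc = case_var / (case_var + resid_var)
print(f"case-level ICC = {icc:.2f}")
```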

Keywords: GPT-4o; artificial intelligence; large language models; ocular diseases; reasoning agent.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Flowchart of the Reasoning Agent Design and the Evaluation of Different Methods’ Responses in Clinical Ophthalmology Scenarios. Ophthalmic imaging (e.g., OCT, B-scan, SLO, FFA) and clinical history serve as input sources. The Vision Understanding Module (GPT-4o) analyzes ophthalmic images and describes abnormalities. The Evidence Retrieval Module (RAG) extracts diagnostic knowledge from guidelines based on the clinical history and ocular examination. These outputs, combined with the clinical history text, feed into the Diagnostic Reasoning Module (DeepSeek-R1) within the reasoning agent for diagnostic analysis and treatment planning. Comparison groups included standalone GPT-4o and three residents. Responses were rated on Likert scales by seven attending physicians.
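
As a concrete illustration of this three-module flow, the sketch below wires a vision call, a retrieval step, and a reasoning call together. It is a hypothetical reconstruction, not the authors' code: the retrieve_guidelines helper is a placeholder, and the DeepSeek endpoint, model name, and prompts are assumptions.

```python
import os
from openai import OpenAI

vision_client = OpenAI()  # GPT-4o via the standard OpenAI API (assumed)
reason_client = OpenAI(   # DeepSeek-R1 via its OpenAI-compatible API (assumed)
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

def describe_image(image_url: str) -> str:
    """Vision Understanding Module: GPT-4o describes abnormalities in one image."""
    resp = vision_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe any ophthalmic abnormalities in this image."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
    )
    return resp.choices[0].message.content

def retrieve_guidelines(history: str) -> str:
    """Evidence Retrieval Module (RAG): placeholder for embedding-based guideline search."""
    raise NotImplementedError  # e.g., embed the history and query a guideline vector store

def reason(history: str, image_urls: list[str]) -> str:
    """Diagnostic Reasoning Module: DeepSeek-R1 combines findings, evidence, and history."""
    findings = "\n".join(describe_image(u) for u in image_urls)
    evidence = retrieve_guidelines(history)
    resp = reason_client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": (
            f"Clinical history:\n{history}\n\nImaging findings:\n{findings}\n\n"
            f"Guideline evidence:\n{evidence}\n\n"
            "Provide a diagnosis and a treatment plan."
        )}],
    )
    return resp.choices[0].message.content
```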
FIGURE 2. Distribution of Likert Scores for Different Methods in Diagnostic Tasks and Treatment Planning Tasks. (A) Violin plot of Likert scores for diagnostic tasks; (B) violin plot of Likert scores for treatment planning tasks. Embedded boxplots show the interquartile range (25th to 75th percentile) and the median (black horizontal line); whiskers represent the range of scores excluding outliers. Statistical analysis revealed no significant differences in diagnostic task scores among ReasonAgent, GPT-4o, and residents. In contrast, treatment planning scores were significantly higher for ReasonAgent than for GPT-4o and residents. *p < 0.05, **p < 0.01, ***p < 0.001.
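
A figure of this kind (violin plots with embedded boxplots over per-method Likert scores) can be sketched with seaborn. The data below are synthetic and only illustrate the plot construction; they are not the study's scores.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic 1-5 Likert scores per method (30 cases x 7 raters = 210 ratings each).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "method": np.repeat(["ReasonAgent", "GPT-4o", "Resident"], 210),
    "score": rng.integers(1, 6, 630),  # placeholder scores
})

# Violin with an embedded box showing the IQR and median, as in the figure.
ax = sns.violinplot(data=df, x="method", y="score", inner="box", cut=0)
ax.set_ylabel("Likert score")
plt.tight_layout()
plt.show()
```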
