This is a preprint.
Towards Accountable AI in Eye Disease Diagnosis: Workflow, External Validation, and Development
- PMID: 40969489
- PMCID: PMC12443229
Update in
- AI Workflow, External Validation, and Development in Eye Disease Diagnosis. JAMA Netw Open. 2025 Jul 1;8(7):e2517204. doi: 10.1001/jamanetworkopen.2025.17204. PMID: 40668583.
Abstract
Importance: Timely disease diagnosis is challenging due to limited availability of clinical expertise and growing care burdens. Although artificial intelligence (AI) shows expert-level diagnostic accuracy, a lack of downstream accountability, including workflow integration, external validation, and further development, continues to hinder its real-world adoption.
Objective: To address gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification.
Design, setting, and participants: We developed and evaluated an AI-assisted diagnostic and classification workflow for AMD. Four rounds of diagnostic assessments (accuracy and time) were conducted with 24 clinicians from 12 institutions. Each round was randomized and alternated between Manual and Manual + AI, with a washout period. In total, 2,880 AMD risk features were evaluated across 960 images from 240 Age-Related Eye Disease Study patient samples, both with and without AI assistance. For further development, we enhanced the original DeepSeeNet model into DeepSeeNet+ using ~40,000 additional images from the US population and tested it on three datasets, including an external set from Singapore.
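For a concrete picture of the further-development step, the following is a minimal sketch of continuing training of a pretrained image classifier on additional images. The backbone, class count, optimizer, and data here are illustrative assumptions, not the study's actual DeepSeeNet/DeepSeeNet+ configuration (which is available in the released code).

```python
# Minimal fine-tuning sketch; all names and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone with ImageNet weights; the real DeepSeeNet uses its own
# published architecture and pretrained weights.
model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = nn.Linear(model.fc.in_features, 5)  # assumed number of severity classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One gradient step on a batch of additional training images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for fundus photographs.
print(finetune_step(torch.randn(4, 3, 224, 224), torch.randint(0, 5, (4,))))
```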
Main outcomes and measures: We compared Manual vs. Manual + AI on diagnostic accuracy, measured by the F1-score (Wilcoxon rank-sum test), and on diagnostic time (linear mixed-effects model). For the further-development analysis, the F1-score (Wilcoxon rank-sum test) was used again.
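As an illustration of the accuracy comparison, the sketch below computes a per-clinician F1-score and applies a Wilcoxon rank-sum test across the two conditions. The data, clinician count, and helper function are toy assumptions, not study data.

```python
# Sketch of the Manual vs. Manual + AI accuracy comparison on toy data.
import numpy as np
from scipy.stats import ranksums
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def clinician_f1(y_true, y_pred):
    # Macro-averaged F1 over binary risk-feature labels for one clinician.
    return f1_score(y_true, y_pred, average="macro")

# Toy labels: 24 clinicians, 120 risk-feature gradings each, per condition.
f1_manual = [clinician_f1(rng.integers(0, 2, 120), rng.integers(0, 2, 120))
             for _ in range(24)]
f1_ai     = [clinician_f1(rng.integers(0, 2, 120), rng.integers(0, 2, 120))
             for _ in range(24)]

# Wilcoxon rank-sum test on the two sets of per-clinician F1-scores.
stat, p = ranksums(f1_ai, f1_manual)
print(f"rank-sum statistic={stat:.3f}, p={p:.3f}")
```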
Results: Among the 240 patients (mean age, 68.5 years; 53% female), AI assistance improved accuracy for 23 of 24 clinicians, raising the average F1-score by 20% (from 37.71 to 45.52), with some individual improvements exceeding 50%. Manual diagnosis initially took an estimated 39.8 seconds per patient; Manual + AI saved 10.3 seconds in the first round and remained 1.7 to 3.3 seconds faster in later rounds. However, combining manual review with AI assistance may not always yield the highest accuracy or efficiency, underscoring ongoing challenges in explainability and trust. DeepSeeNet+ outperformed the original model on all three test sets, achieving a 13% higher F1-score in the external Singapore cohort.
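The per-patient time estimates above come from a linear mixed-effects model. A minimal sketch of such a model, with a random intercept per clinician and a fixed effect for AI assistance, might look like the following on simulated data; the column names and simulated values are assumptions.

```python
# Sketch of the diagnostic-time analysis with a linear mixed-effects model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_clinicians, n_patients = 24, 40
df = pd.DataFrame({
    "clinician": np.repeat(np.arange(n_clinicians), n_patients),
    "ai_assist": np.tile(np.repeat([0, 1], n_patients // 2), n_clinicians),
})
# Simulated seconds per patient: baseline ~40 s, ~10 s faster with AI,
# plus per-clinician random variation.
df["seconds"] = (40 - 10 * df["ai_assist"]
                 + rng.normal(0, 5, len(df))
                 + rng.normal(0, 3, n_clinicians)[df["clinician"]])

# Random intercept per clinician; fixed effect of AI assistance on time.
model = smf.mixedlm("seconds ~ ai_assist", df, groups=df["clinician"]).fit()
print(model.summary())
```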
Conclusions and relevance: In this diagnostic study, AI assistance improved both accuracy and time efficiency for AMD diagnosis. Further development was essential for enhancing AI generalizability across diverse populations. These findings highlight the need for downstream accountability during early-stage clinical evaluations of medical AI. All code and models are publicly available.