JAMA Netw Open. 2025 Jul 1;8(7):e2517204. doi: 10.1001/jamanetworkopen.2025.17204.

AI Workflow, External Validation, and Development in Eye Disease Diagnosis


Qingyu Chen et al. JAMA Netw Open.

Abstract

Importance: Timely disease diagnosis is challenging due to limited clinician availability and growing clinical burdens. Although artificial intelligence (AI) has shown expert-level diagnostic accuracy, a lack of downstream accountability, including workflow integration, external validation, and further development, continues to hinder its clinical adoption.

Objective: To address gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification.

Design, setting, and participants: This diagnostic study developed and evaluated an AI-assisted diagnostic and classification workflow for AMD. Four rounds of diagnostic assessments (accuracy and time) were conducted with 24 clinicians from 12 institutions. Each round was randomized and alternated between manual (clinician diagnosis) and manual plus AI (clinician assisted by AI diagnosis), with a 1-month washout period. In total, 2880 AMD risk features were evaluated across 960 images from 240 Age-Related Eye Disease Study patient samples, both with and without AI assistance. For further development, the original DeepSeeNet model was enhanced into the DeepSeeNet+ model using 39 196 additional images from the US population and tested on 3 datasets, including an external set from Singapore.

Exposure: Age-related macular degeneration risk features.

Main outcomes and measures: Diagnostic accuracy (F1 score, compared with the Wilcoxon rank sum test) and diagnostic time (linear mixed-effects model) were measured for manual vs manual plus AI. For the further model development analysis, the F1 score (Wilcoxon rank sum test) was again used.
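A minimal sketch of how such comparisons could be set up is shown below. This is not the study's analysis code: the DataFrame `df` and its columns (`clinician`, `condition`, `y_true`, `y_pred`, `time_s`) are hypothetical assumptions, and standard Python libraries stand in for whatever tooling the authors used.

```python
# Hedged sketch (not the study's code): compare per-clinician F1 scores
# between the manual and manual plus AI conditions with a Wilcoxon rank sum
# test, and model diagnostic time with a linear mixed-effects model that has
# a random intercept per clinician. All column names are assumed.
import pandas as pd
from scipy.stats import ranksums
from sklearn.metrics import f1_score
import statsmodels.formula.api as smf


def per_clinician_f1(df: pd.DataFrame, condition: str) -> pd.Series:
    """Macro-averaged F1 score per clinician for one assessment condition."""
    sub = df[df["condition"] == condition]
    return sub.groupby("clinician").apply(
        lambda g: f1_score(g["y_true"], g["y_pred"], average="macro")
    )


def compare_conditions(df: pd.DataFrame) -> dict:
    f1_manual = per_clinician_f1(df, "manual")
    f1_ai = per_clinician_f1(df, "manual_plus_ai")

    # Wilcoxon rank sum test on the two sets of per-clinician F1 scores.
    _, f1_p_value = ranksums(f1_ai, f1_manual)

    # Linear mixed-effects model: diagnostic time as a function of condition,
    # with clinicians as the grouping (random-intercept) factor.
    time_model = smf.mixedlm("time_s ~ condition", df, groups=df["clinician"]).fit()

    return {"f1_rank_sum_p": f1_p_value, "time_model": time_model.summary()}
```

In this sketch, the rank sum test treats the per-clinician F1 scores as two independent samples, as stated above, and the mixed-effects model accounts for repeated measurements by the same clinician across rounds.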

Results: Among 240 patients (mean [SD] age, 68.5 [5.0] years; 127 female [53%]), AI assistance significantly improved accuracy for 23 of 24 clinicians, increasing the mean F1 score from 37.71 (95% CI, 27.83-44.17) to 45.52 (95% CI, 39.01-51.61), with some improvements exceeding 50%. Manual diagnosis initially took an estimated 39.8 seconds (95% CI, 34.1-45.6 seconds) per patient, whereas manual plus AI saved 10.3 seconds (95% CI, -15.1 to -5.5 seconds) and remained faster by 6.9 seconds (95% CI, 0.2-13.7 seconds) to 8.6 seconds (95% CI, 1.8-15.3 seconds) in subsequent rounds. However, combining manual and AI assessment did not always yield the highest accuracy or efficiency, underscoring challenges in explainability and trust. The DeepSeeNet+ model performed better across all 3 test sets, including a significantly higher F1 score than the original DeepSeeNet model on the external Singapore cohort (52.43 [95% CI, 44.38-61.00] vs 38.95 [95% CI, 30.50-47.45]).

Conclusions and relevance: In this diagnostic study, AI assistance was associated with improved accuracy and time efficiency for AMD diagnosis. Further development is essential for enhancing AI generalizability across diverse populations. These findings highlight the need for downstream accountability during early-stage clinical evaluations of medical AI.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Keenan reported a patent pending for Methods and Systems for Predicting Rates of Progression of Age-Related Macular Degeneration. Dr Mehta reported receiving grants from jCyte and Zeiss; personal fees from ANI, AbbVie, Apellis, Astellas Pharma, and Genentech; and nonfinancial support from Eyedaptic outside the submitted work. Prof Cheung reported receiving grants from the National Medical Research Council Singapore during the conduct of the study and grants from the National Medical Research Council outside the submitted work. Prof Cheng reported receiving grants from Medi-Whale outside the submitted work. Dr Hribar reported receiving grants from the National Institutes of Health during the conduct of the study and grants from the National Institutes of Health and personal fees from SEDLS and Real World Ophthalmology outside the submitted work. No other disclosures were reported.

Figures

Figure 1. Overview of the Artificial Intelligence (AI)–Assisted Diagnostic/Classification Workflow
AMD indicates age-related macular degeneration.
Figure 2. Comparison of Diagnostic Performance (F1 Score) for Manual vs Manual Plus Artificial Intelligence (AI) Assessment
Each dot represents an F1 score. Solid black cutoff lines represent the performance of the AI model alone. A-D, The bar inside the boxes indicates the median, and the lower and upper ends of the boxes are the first and third quartiles. The whiskers indicate values within 1.5 times the IQR from the upper or lower quartile (or the minimum and maximum if within 1.5 times the IQR of the quartiles). AMD indicates age-related macular degeneration.
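As a generic illustration only (not the authors' plotting code), the whisker convention described in this legend corresponds to the common 1.5 times the IQR rule used by standard plotting libraries; the `scores_by_condition` mapping below is a hypothetical input of per-clinician F1 scores.

```python
# Generic sketch of the box-and-whisker convention in the Figure 2 legend:
# median bar, first/third quartile box, whiskers at 1.5 times the IQR.
# `scores_by_condition` (condition name -> per-clinician F1 scores) is hypothetical.
import matplotlib.pyplot as plt


def plot_f1_boxes(scores_by_condition):
    labels = list(scores_by_condition)
    data = [scores_by_condition[label] for label in labels]

    fig, ax = plt.subplots()
    # whis=1.5 places the whiskers at 1.5 times the IQR beyond the quartiles
    # (or at the data minimum/maximum if those fall inside that range).
    ax.boxplot(data, whis=1.5, showfliers=True)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel("F1 score")
    plt.show()
```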
Figure 3. Detailed Breakdown of F1 Score per Scale
Manual and manual plus artificial intelligence (AI) results represent paired comparisons from the same clinicians. The AI-only performance is from a single model and is shown for reference only; it is not directly comparable with the clinician results. AMD indicates age-related macular degeneration.
Figure 4. Diagnostic Time Efficiency With Artificial Intelligence (AI) Assistance


