Multicenter Study

Diabetes Care. 2021 May;44(5):1168-1175. doi: 10.2337/dc20-1877. Epub 2021 Jan 5.

Multicenter, Head-to-Head, Real-World Validation Study of Seven Automated Artificial Intelligence Diabetic Retinopathy Screening Systems


Aaron Y Lee et al. Diabetes Care. 2021 May.

Abstract

Objective: With rising global prevalence of diabetic retinopathy (DR), automated DR screening is needed for primary care settings. Two automated artificial intelligence (AI)-based DR screening algorithms have U.S. Food and Drug Administration (FDA) approval. Several others are under consideration while in clinical use in other countries, but their real-world performance has not been evaluated systematically. We compared the performance of seven automated AI-based DR screening algorithms (including one FDA-approved algorithm) against human graders when analyzing real-world retinal imaging data.

Research design and methods: This was a multicenter, noninterventional device validation study evaluating a total of 311,604 retinal images from 23,724 veterans who presented for teleretinal DR screening at the Veterans Affairs (VA) Puget Sound Health Care System (HCS) or Atlanta VA HCS from 2006 to 2018. Five companies provided seven algorithms, including one with FDA approval, that independently analyzed all scans, regardless of image quality. The sensitivity/specificity of each algorithm when classifying images as referable DR or not were compared with original VA teleretinal grades and a regraded arbitrated data set. Value per encounter was estimated.
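
As context for the operating characteristics reported below, here is a minimal sketch of how sensitivity, specificity, PPV, and NPV are derived from paired binary labels (algorithm call vs. reference grade for referable DR). This is an illustration only, not the study's code; the example label arrays are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScreeningMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)

def confusion_metrics(algorithm_calls, reference_grades):
    """Compute screening metrics from paired binary labels.

    Both inputs are sequences of bools: True = referable DR.
    """
    pairs = list(zip(algorithm_calls, reference_grades))
    tp = sum(a and r for a, r in pairs)
    tn = sum((not a) and (not r) for a, r in pairs)
    fp = sum(a and (not r) for a, r in pairs)
    fn = sum((not a) and r for a, r in pairs)
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )

# Hypothetical example: 8 screened eyes.
algo = [True, True, False, False, True, False, False, True]
ref  = [True, False, False, False, True, True, False, True]
m = confusion_metrics(algo, ref)
print(f"Sens {m.sensitivity:.2f}  Spec {m.specificity:.2f}  "
      f"PPV {m.ppv:.2f}  NPV {m.npv:.2f}")
```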

Results: Although high negative predictive values (82.72-93.69%) were observed, sensitivities varied widely (50.98-85.90%). Most algorithms performed no better than humans against the arbitrated data set, but two achieved higher sensitivities, and one yielded comparable sensitivity (80.47%, P = 0.441) and specificity (81.28%, P = 0.195). Notably, one had lower sensitivity (74.42%) for proliferative DR (P = 9.77 × 10⁻⁴) than the VA teleretinal graders. Value per encounter varied at $15.14-$18.06 for ophthalmologists and $7.74-$9.24 for optometrists.
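
The abstract does not name the test behind these P values; for paired comparisons of sensitivity against the human grader on the same images, an exact McNemar test on the discordant pairs is one standard choice. A minimal sketch under that assumption, with hypothetical discordant counts:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant counts.

    b = cases the human grader flagged but the algorithm missed,
    c = cases the algorithm flagged but the human grader missed.
    Under H0 (equal sensitivity), discordant pairs split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Two-sided exact binomial tail probability at p = 0.5.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical discordant counts among true referable-DR cases:
print(f"P = {mcnemar_exact(b=31, c=12):.4f}")
```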

Conclusions: The DR screening algorithms showed significant performance differences. These results argue for rigorous testing of all such algorithms on real-world data before clinical implementation.


Figures

Figure 1. The relative screening performance of AI algorithms. Using the full-image data set (A), the sensitivity, specificity, NPV, and PPV of each algorithm are shown using the original teleretinal grader as the reference standard. These analyses were repeated separately using color fundus photographs obtained from Atlanta (B) and Seattle (C).
Figure 2. Relative performance of human grader compared with AI algorithms. The relative performance of the VA teleretinal grader (Human) and algorithms A–G in screening for referable DR using the arbitrated data set at different thresholds of DR. A: Sensitivity and specificity of each algorithm compared with a human grader with 95% CI bars against a subset of double-masked arbitrated grades in screening for referable DR in images with mild NPDR or worse and ungradable image quality. B–D: Only gradable images were used. The VA teleretinal grader is compared with the AI sensitivities, with 95% CIs, at different thresholds of disease, including moderate NPDR or worse (B), severe NPDR or worse (C), and PDR (D). *P ≤ 0.05, **P ≤ 0.001, ***P ≤ 0.0001.
Figure 3. Value per encounter of AI algorithms meeting the sensitivity threshold. The value per encounter with 95% CI bars of algorithms E, F, and G. Only algorithms that achieved equivalent sensitivity to the VA teleretinal graders in screening for referable DR in images regraded as moderate NPDR or worse in the arbitrated data set were carried forward. The value per encounter of each algorithm if optometrists (Optom) or ophthalmologists (Ophth) were to implement this system into their clinical practice to make a normal profit on the basis of geographical location or the combined data set is shown. ATL, Atlanta; SEA, Seattle; TOT, total (Atlanta and Seattle).
