Towards population-independent, multi-disease detection in fundus photographs

Sarah Matta et al. Sci Rep. 2023 Jul 17;13(1):11493. doi: 10.1038/s41598-023-38610-y.

Abstract

Independent validation studies of automatic diabetic retinopathy screening systems have recently shown a drop in screening performance on external data. Beyond diabetic retinopathy, this study investigates the generalizability of deep learning (DL) algorithms for screening various ocular anomalies in fundus photographs across heterogeneous populations and imaging protocols. The following datasets are considered: OPHDIAT (France, diabetic population), OphtaMaine (France, general population), RIADD (India, general population) and ODIR (China, general population). Two multi-disease DL algorithms were developed: a Single-Dataset (SD) network, trained on the largest dataset (OPHDIAT), and a Multiple-Dataset (MD) network, trained on multiple datasets simultaneously. To assess their generalizability, both algorithms were evaluated both when training and test data originated from overlapping datasets and when they originated from disjoint datasets. The SD network achieved a mean per-disease area under the receiver operating characteristic curve (mAUC) of 0.9571 on OPHDIAT. However, it generalized poorly to the other three datasets (mAUC < 0.9). When all four datasets were involved in training, the MD network significantly outperformed the SD network (p = 0.0058), indicating improved generalizability. However, in leave-one-dataset-out experiments, the performance of the MD network was significantly lower on populations unseen during training than on populations involved in training (p < 0.0001), indicating imperfect generalizability.
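As a concrete illustration of the reported metric, below is a minimal sketch of how a mean per-disease AUC (mAUC) can be computed for a multi-label screening model. The array shapes, label layout and toy data are assumptions made for illustration, not the authors' actual evaluation code.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def mean_per_disease_auc(y_true, y_score):
        # Mean per-disease AUC (mAUC): a one-vs-rest ROC AUC is computed
        # for each disease label, then averaged. Labels with a single
        # class in the test set are skipped, as AUC is undefined for them.
        aucs = []
        for d in range(y_true.shape[1]):
            if len(np.unique(y_true[:, d])) == 2:
                aucs.append(roc_auc_score(y_true[:, d], y_score[:, d]))
        return float(np.mean(aucs))

    # Toy example: 6 images, 3 hypothetical disease labels (multi-label).
    rng = np.random.default_rng(0)
    y_true = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 1],
                       [0, 0, 1],
                       [1, 1, 0],
                       [0, 0, 0]])
    y_score = rng.random((6, 3))  # placeholder model outputs in [0, 1]
    print(f"mAUC = {mean_per_disease_auc(y_true, y_score):.4f}")

Averaging per-disease AUCs weights each anomaly equally regardless of prevalence, which suits a screening setting where rare conditions matter as much as common ones.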


Conflict of interest statement

The authors Sarah Matta, Mathieu Lamard, Pierre-Henri Conze and Jean-Bernard Rottier declare no Competing Financial or Non-Financial Interests. The authors Clément Lecat, Fabien Basset and Romuald Carette declare no Competing Non-Financial Interests but the following Competing Financial Interests: Employee – Evolucare Technologies. The author Alexandre Le Guilcher declares no Competing Non-Financial Interests but the following Competing Financial Interests: Research & Innovation director – Evolucare Technologies; CEO – OphtAI. The author Pascale Massin declares no Competing Non-Financial Interests but the following Competing Financial Interests: Consultant – Allergan, Bayer, Novartis, Thea, Horus. The author Béatrice Cochener declares no Competing Non-Financial Interests but the following Competing Financial Interests: Consultant and clinical investigator – Thea, Alcon, Zeiss, B&L, Hoya, Horus, Santen, SIFI, Cutting Edge, J&J. The author Gwenolé Quellec declares no Competing Non-Financial Interests but the following Competing Financial Interests: Consultant – Evolucare Technologies, Adcis.

Figures

Figure 1
An overview of our proposed study. (a) A single-dataset (SD) network trained on a single homogeneous dataset. (b) A multi-dataset (MD) network trained on multiple heterogeneous datasets. (c) Assessing the deep learning algorithms' generalizability on data coming from an in-domain or out-of-domain distribution.
Figure 2
ROC curves for the SD network (left column) and for the MD network trained on all the datasets (K = 4) (right column), shown on the OPHDIAT test subset (a, b), the OphtaMaine test subset (c, d), the RIADD test subset (e, f) and the ODIR test subset (g, h).
Figure 3
ROC curves for the MD network when a dataset is left out of training (left column) and when it is included in training (right column). On each test subset, the left column shows the ROC curves when the associated training/validation subsets are left out and the right column shows the ROC curves when they are included. The ROC curves are shown on the OPHDIAT test subset (a, b), the OphtaMaine test subset (c, d), the RIADD test subset (e, f) and the ODIR test subset (g, h).
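For readers who want the evaluation protocol behind Figure 3 in concrete terms, below is a minimal sketch of the leave-one-dataset-out loop. The dataset names come from the paper; train_network and evaluate_mauc are hypothetical placeholders standing in for the actual training and evaluation code, which the paper does not publish here.

    import random

    DATASETS = ["OPHDIAT", "OphtaMaine", "RIADD", "ODIR"]

    def train_network(train_sets):
        # Hypothetical stand-in for training the MD network on K-1 datasets.
        return {"trained_on": train_sets}

    def evaluate_mauc(model, test_set):
        # Hypothetical stand-in for computing mAUC on a held-out test subset;
        # returns a random placeholder score instead of a real evaluation.
        return random.uniform(0.8, 0.95)

    for held_out in DATASETS:
        train_sets = [d for d in DATASETS if d != held_out]
        model = train_network(train_sets)       # MD network, K-1 datasets
        score = evaluate_mauc(model, held_out)  # test on the unseen population
        print(f"held out {held_out}: mAUC = {score:.4f}")

Comparing each held-out score against the score obtained when the same dataset is included in training is what exposes the generalization gap reported in the abstract.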
