PLoS One. 2023 Aug 3;18(8):e0289211. doi: 10.1371/journal.pone.0289211. eCollection 2023.

Learning from small data: Classifying sex from retinal images via deep learning

Aaron Berk et al. PLoS One. 2023.

Abstract

Deep learning (DL) techniques have attracted tremendous interest in medical imaging, particularly the use of convolutional neural networks (CNNs) to develop automated diagnostic tools. The ease of its non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work analysing fundus images with CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions limit the applicability of this approach in medical settings where patient confidentiality is mandated. Here, we showcase the performance of DL on small datasets for classifying patient sex from fundus images, a trait not thought to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a ResNet-152 model whose last layer has been replaced by a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small-dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in dataset size compared to prior results in the literature. Our results show that binary classification, even for a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained on one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor-quality images may hamper training of the CNN by further shrinking an already small dataset. Nevertheless, using high-quality images may be an important factor, as evidenced by the superior generalizability observed in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool for maximizing the performance of deep CNNs in the context of small development datasets.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Training-time metrics for runs D1 through D5.
(a) accuracy score (proportion correct); (b) binary cross-entropy loss; (c) AUC score. Vertical dashed lines correspond with the best epoch as selected by early stopping.
Fig 2. Training-time metrics for runs N1 through N6.
(a) accuracy score (proportion correct); (b) binary cross-entropy loss; (c) AUC score. Vertical dashed lines correspond with the best epoch as selected by early stopping.
Fig 3. Training-time metrics for runs C1 through C6.
(a) accuracy score (proportion correct); (b) binary cross-entropy loss; (c) AUC score. Vertical dashed lines correspond with the best epoch as selected by early stopping.
Fig 4. A graphical representation of all models developed in this work.
On the x-axis is a model’s AUC score on the validation partition of the relevant database; on the y-axis, its AUC score on the test partition. Points labelled D1–5 correspond with models trained and evaluated on DOVS-i; E1–10, DOVS-ii; N1–6, ODIR-N; C1–6, ODIR-C. The ensemble model is denoted E*. Marker size corresponds with the number of images in the database used for model development (cf. Table 1).
Fig 5. A graphical representation of the domain adaptation results for the models developed in this work.
On the x-axis is a model’s AUC score on the test partition from the database on which it was trained; on the y-axis, its domain-adapted AUC score. Points labelled D1–5 correspond with models trained and evaluated on DOVS-i; E1–10, DOVS-ii; N1–6, ODIR-N; C1–6, ODIR-C. The ensemble model is denoted E*. The marker radius of the plotted points corresponds with the number of images in the database used for model development (cf. Table 1). Each point is coloured according to the database on which the associated model was trained, and the shape of each point corresponds with the database on which that model was evaluated for its domain-adapted AUC score.
Fig 6. Mean Guided Grad-CAM activations for Female (top row) and Male (bottom row) fundus images.
(a) F227_R, (b) F22_L, (c) M218_L, (d) M273_R.
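The ensemble model E* in Figs 4 and 5 and the reported AUC scores can be illustrated with a minimal sketch. This is not the authors' code; it assumes the common approach of averaging member models' predicted probabilities per image, scored with the rank-based (Mann-Whitney) definition of AUC, and the toy labels and scores below are invented for illustration.

```python
def auc(labels, scores):
    """Mann-Whitney AUC: probability that a random positive case
    receives a higher score than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble(prob_lists):
    """E*-style ensembling: average each image's predicted probability
    across the member models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

# Toy example: two member models' probabilities for six images.
labels = [1, 1, 0, 0, 1, 0]
m1 = [0.9, 0.6, 0.4, 0.3, 0.5, 0.7]  # confused on two cases
m2 = [0.8, 0.7, 0.2, 0.4, 0.6, 0.3]
avg = ensemble([m1, m2])
```

In this toy case the averaged probabilities rank every positive above every negative even though one member model does not, illustrating how ensembling can help maximize performance when each model is trained on little data.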

