Bioengineering (Basel). 2023 Oct 30;10(11):1266.
doi: 10.3390/bioengineering10111266.

Multi-Dataset Comparison of Vision Transformers and Convolutional Neural Networks for Detecting Glaucomatous Optic Neuropathy from Fundus Photographs

Elizabeth E Hwang et al. Bioengineering (Basel).

Abstract

Glaucomatous optic neuropathy (GON) can be diagnosed and monitored using fundus photography, a widely available and low-cost approach already adopted for automated screening of ophthalmic diseases such as diabetic retinopathy. Despite this, the lack of validated early screening approaches remains a major obstacle in the prevention of glaucoma-related blindness. Deep learning models have gained significant interest as potential solutions because they offer objective, high-throughput methods for processing image-based medical data. While convolutional neural networks (CNNs) have been widely used for this purpose, more recent applications of Transformer architectures have led to new models, including the Vision Transformer (ViT), that have shown promise in many domains of image analysis. However, previous studies have not sufficiently compared these two architectures side by side on more than a single dataset, making it unclear which model is more generalizable or performs better in different clinical contexts. Our purpose is to investigate comparable ViT and CNN models tasked with GON detection from fundus photographs and to highlight their respective strengths and weaknesses. We train CNN and ViT models on six unrelated, publicly available databases and compare their performance using well-established statistics, including area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Our results indicate that ViT models often show superior performance compared with a similarly trained CNN model, particularly when non-glaucomatous images are over-represented in a given dataset. We discuss the clinical implications of these findings and suggest that ViT can further the development of accurate and scalable GON detection for this leading cause of irreversible blindness worldwide.
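The three statistics named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. Labels follow the convention used in the figure captions (0 = control/non-glaucomatous, 1 = glaucomatous).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = glaucomatous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half). O(n_pos * n_neg), fine for a demo."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only; they are NOT data from the study.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.90, 0.70, 0.30]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"AUC={auc(y_true, scores):.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
# prints: AUC=0.6875 sensitivity=0.50 specificity=0.75
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is one reason the two kinds of metric are usually reported together.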

Keywords: deep learning; fundus photography; glaucoma; vision transformer.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Representative fundus photographs from the datasets used in this study. GON: glaucomatous optic neuropathy.
Figure 2
Workflow of ViT vs. CNN model training (with hyperparameters) and validation.
Figure 3
ROC curves and confusion matrices for ViT and CNN models trained on individual datasets (A–F). For the confusion matrices, a classification of 0 refers to control/non-glaucomatous, whereas a classification of 1 refers to glaucomatous. Ground truth labels were used as provided by the original datasets (ref. Table 1).
Figure 4
ViT outperforms CNN models in datasets with greater class imbalance but not class size (∆ = ViT − CNN, where ViT outperforms CNN when ∆ > 0, and CNN outperforms ViT when ∆ < 0). Log-linear regression models (dotted lines) are included with coefficients of determination as indicated. (a) ∆AUC as a function of class ratio. (b) ∆AUC as a function of class size. (c) ∆Specificity as a function of class ratio. See Table 1 for class sizes and ratios.
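The kind of fit described in the Figure 4 caption can be sketched as follows. This assumes that "log-linear regression" means ordinary least squares of ∆ on the natural logarithm of the predictor (∆ ≈ a + b·ln x); the caption does not spell out the exact model, so treat this as one plausible reading. The function name and the numbers below are hypothetical placeholders, not values from the study.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(x: Sequence[float],
                   y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = intercept + slope * ln(x).

    Returns (slope, intercept, r_squared), where r_squared is the
    coefficient of determination of the fit.
    """
    lx = [math.log(v) for v in x]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(lx, y))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Placeholder values for illustration only; NOT results from the paper.
class_ratio = [1.0, 2.0, 4.0, 8.0]      # hypothetical class ratio per dataset
delta_auc = [-0.01, 0.00, 0.02, 0.03]   # hypothetical ViT AUC minus CNN AUC

slope, intercept, r2 = fit_log_linear(class_ratio, delta_auc)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.3f}")
```

Under this reading, a positive slope means the ViT advantage (∆ > 0) grows as the class ratio increases, and the coefficient of determination indicates how much of the variation in ∆ the logarithmic trend accounts for.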
