Bioengineering (Basel). 2023 Oct 30;10(11):1266.

doi: 10.3390/bioengineering10111266.

Multi-Dataset Comparison of Vision Transformers and Convolutional Neural Networks for Detecting Glaucomatous Optic Neuropathy from Fundus Photographs

Elizabeth E Hwang¹,², Dake Chen¹, Ying Han¹, Lin Jia³, Jing Shan¹

Affiliations

¹ Department of Ophthalmology, University of California, San Francisco, San Francisco, CA 94143, USA.
² Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA 94143, USA.
³ Digillect LLC, San Francisco, CA 94158, USA.

PMID: 38002390
PMCID: PMC10669064
DOI: 10.3390/bioengineering10111266

Elizabeth E Hwang et al. Bioengineering (Basel). 2023.

Abstract

Glaucomatous optic neuropathy (GON) can be diagnosed and monitored using fundus photography, a widely available and low-cost approach already adopted for automated screening of ophthalmic diseases such as diabetic retinopathy. Despite this, the lack of validated early screening approaches remains a major obstacle in the prevention of glaucoma-related blindness. Deep learning models have gained significant interest as potential solutions, as they offer objective and high-throughput methods for processing image-based medical data. While convolutional neural networks (CNNs) have been widely used for these purposes, more recent advances in the application of Transformer architectures have led to new models, including the Vision Transformer (ViT), that have shown promise in many domains of image analysis. However, previous studies have not sufficiently compared these two architectures side by side on more than a single dataset, making it unclear which model is more generalizable or performs better in different clinical contexts. Our purpose is to investigate comparable ViT and CNN models tasked with GON detection from fundus photos and to highlight their respective strengths and weaknesses. We train CNN and ViT models on six unrelated, publicly available databases and compare their performance using well-established statistics including AUC, sensitivity, and specificity. Our results indicate that ViT models often show superior performance when compared with a similarly trained CNN model, particularly when non-glaucomatous images are over-represented in a given dataset. We discuss the clinical implications of these findings and suggest that ViT can further the development of accurate and scalable GON detection for this leading cause of irreversible blindness worldwide.

Keywords: deep learning; fundus photography; glaucoma; vision transformer.
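
The abstract names AUC, sensitivity, and specificity as the comparison statistics. The following is a minimal sketch of how these can be computed for a binary GON classifier, assuming the label convention stated in the Figure 3 caption (0 = control, 1 = glaucomatous). The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp), with 0 = control and 1 = glaucomatous."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented purely for illustration (not study data).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of statistic are usually reported together.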

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Representative fundus photographs from the datasets used in this study. GON: glaucomatous optic neuropathy.

**Figure 2**
Workflow of ViT vs. CNN model training (with hyperparameters) and validation.

**Figure 3**
ROC curves and confusion matrices for ViT and CNN models trained on individual datasets (A–F). For the confusion matrices, a classification of 0 refers to control/non-glaucomatous, whereas a classification of 1 refers to glaucomatous. Ground truth labels were used as provided by the original datasets (ref. Table 1).

**Figure 4**
ViT outperforms CNN models in datasets with greater class imbalance, but not in datasets with greater class size (∆ = ViT − CNN; ViT outperforms CNN when ∆ > 0, and CNN outperforms ViT when ∆ < 0). Log-linear regression models (dotted lines) are included with coefficients of determination as indicated. (a) ∆AUC as a function of class ratio. (b) ∆AUC as a function of class size. (c) ∆Specificity as a function of class ratio. See Table 1 for class sizes and ratios.
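
The quantity plotted in Figure 4 can be sketched as follows. The caption defines ∆ = ViT − CNN and mentions log-linear regression with a coefficient of determination; this example assumes that "log-linear" means fitting ∆ against the base-10 logarithm of the predictor (class ratio or class size), which the caption does not spell out. The function names and all numeric values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
from typing import Sequence, Tuple


def delta(vit_metric: Sequence[float], cnn_metric: Sequence[float]) -> list:
    """Per-dataset difference ViT - CNN; positive values favor ViT."""
    return [v - c for v, c in zip(vit_metric, cnn_metric)]


def fit_log_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = intercept + slope * log10(x).

    Returns (slope, intercept, r_squared)."""
    lx = [math.log10(v) for v in x]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical per-dataset values, invented for illustration (not study results).
class_ratio = [1.0, 10.0, 100.0]
vit_auc = [0.90, 0.92, 0.94]
cnn_auc = [0.90, 0.90, 0.90]

delta_auc = delta(vit_auc, cnn_auc)
slope, intercept, r2 = fit_log_linear(class_ratio, delta_auc)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.4f}")
```

A positive slope in such a fit would correspond to the pattern the caption describes, with the ViT advantage growing as the classes become more imbalanced.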

Cited by

Meeting Challenges in the Diagnosis and Treatment of Glaucoma.
Kooner KS, Choo DM, Mekala P. Bioengineering (Basel). 2024 Dec 25;12(1):6. doi: 10.3390/bioengineering12010006. PMID: 39851280. Free PMC article.
Application of artificial intelligence in glaucoma care: An updated review.
Wu JH, Lin S, Moghimi S. Taiwan J Ophthalmol. 2024 Sep 13;14(3):340-351. doi: 10.4103/tjo.TJO-D-24-00044. eCollection 2024 Jul-Sep. PMID: 39430354. Free PMC article. Review.
Generative Artificial Intelligence Enhancements for Reducing Image-based Training Data Requirements.
Chen D, Han Y, Duncan J, Jia L, Shan J. Ophthalmol Sci. 2024 Apr 14;4(5):100531. doi: 10.1016/j.xops.2024.100531. eCollection 2024 Sep-Oct. PMID: 39071920. Free PMC article.
Explainable Deep Learning for Glaucomatous Visual Field Prediction: Artifact Correction Enhances Transformer Models.
Sriwatana K, Puttanawarut C, Suwan Y, Achakulvisut T. Transl Vis Sci Technol. 2025 Jan 2;14(1):22. doi: 10.1167/tvst.14.1.22. PMID: 39847375. Free PMC article.
CA-ViT: Contour-Guided and Augmented Vision Transformers to Enhance Glaucoma Classification Using Fundus Images.
Tohye TG, Qin Z, Al-Antari MA, Ukwuoma CC, Lonseko ZM, Gu YH. Bioengineering (Basel). 2024 Aug 31;11(9):887. doi: 10.3390/bioengineering11090887. PMID: 39329629. Free PMC article.
