2024 Jun:2024:2285-2294. doi: 10.1109/CVPRW63382.2024.00234. Epub 2024 Sep 27.

nnMobileNet: Rethinking CNN for Retinopathy Research


Wenhui Zhu et al. Conf Comput Vis Pattern Recognit Workshops. 2024 Jun.

Abstract

Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViTs) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models on various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
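As a rough illustration of the kind of CNN baseline the paper starts from (not the authors' nnMobileNet itself; see the linked GitHub repository for the official code), a MobileNetV2 backbone can be adapted to diabetic retinopathy grading in a few lines of PyTorch. The five-class head and 224x224 input resolution below are assumptions for the sketch, matching the common DR grading protocol rather than the paper's exact setup.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from an ImageNet-pretrained MobileNetV2 backbone (the baseline the
    # paper revisits); nnMobileNet applies further architectural changes on top.
    backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)

    # Replace the 1000-way ImageNet head with a 5-class DR grading head
    # (5 grades is an assumption following the standard DR grading scale).
    num_classes = 5
    backbone.classifier[1] = nn.Linear(backbone.last_channel, num_classes)

    # Dummy forward pass on a batch of 224x224 fundus images (resolution assumed).
    x = torch.randn(2, 3, 224, 224)
    logits = backbone(x)
    print(logits.shape)  # torch.Size([2, 5])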


Figures

Figure 1.
Model size vs. average performance (F1, accuracy, and AUC) on retinal multi-disease abnormality detection using the RFMiD dataset. Our method outperforms other CNN- and ViT-based methods in both performance and efficiency.
Figure 2.
The roadmap for modifying MobileNetV2 into the proposed no-new MobileNet (nnMobileNet), evaluated on the Messidor-2 dataset.
Figure 3.
The detailed architecture of the no-new MobileNet (including the channel configuration) and the inverted linear residual bottleneck used in the no-new MobileNet.
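For readers unfamiliar with the block named in Figure 3, here is a minimal PyTorch sketch of a MobileNetV2-style inverted linear residual bottleneck. The channel sizes and expansion factor are illustrative assumptions, not the paper's exact channel configuration.

    import torch
    import torch.nn as nn

    class InvertedResidual(nn.Module):
        """Inverted linear residual bottleneck: 1x1 expansion -> 3x3 depthwise
        conv -> 1x1 linear projection, with a skip connection when stride == 1
        and the input and output channel counts match."""
        def __init__(self, in_ch, out_ch, stride=1, expand=6):
            super().__init__()
            hidden = in_ch * expand
            self.use_skip = stride == 1 and in_ch == out_ch
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, hidden, 1, bias=False),        # 1x1 expansion
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, hidden, 3, stride, 1,
                          groups=hidden, bias=False),           # 3x3 depthwise
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, out_ch, 1, bias=False),       # 1x1 projection
                nn.BatchNorm2d(out_ch),                         # no activation: "linear" bottleneck
            )

        def forward(self, x):
            out = self.block(x)
            return x + out if self.use_skip else out

    # Example: 32 -> 32 channels at stride 1 keeps the residual connection.
    y = InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56))
    print(y.shape)  # torch.Size([1, 32, 56, 56])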
Figure 4.
Examples of data augmentation (Method III) and details of the three sets of data augmentation we used.
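To give a concrete sense of the kind of fundus-image augmentation pipeline Figure 4 refers to, the following torchvision sketch builds one plausible set of transforms. The specific transforms and parameters are assumptions for illustration, not the paper's three augmentation sets.

    from torchvision import transforms

    # A hypothetical training-time augmentation set for fundus images; the paper
    # compares three concrete sets (Figure 4), which may differ from this sketch.
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    # Apply as train_transform(pil_image) inside a Dataset's __getitem__.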
Figure 5.
Empirical studies on the Messidor-2 dataset, where subpanels (a), (b), (c), and (d) represent different experimental groups, each independent of the others. D and SD-[x] in subpanel (b) denote Dropout and SpatialDropout at position [x], as shown in Fig. 3(c), respectively.
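The Dropout vs. SpatialDropout comparison in Figure 5(b) maps onto standard PyTorch modules: SpatialDropout on 2D feature maps corresponds to nn.Dropout2d, which drops whole channels rather than individual activations. The drop probability and tensor shape below are assumptions for the sketch.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 4, 4)  # a batch of 8-channel feature maps

    dropout = nn.Dropout(p=0.2)            # zeroes individual activations
    spatial_dropout = nn.Dropout2d(p=0.2)  # zeroes entire feature-map channels

    dropout.train()
    spatial_dropout.train()
    print(dropout(x)[0, 0])                       # typically scattered zeros within a channel
    print(spatial_dropout(x)[0].sum(dim=(1, 2)))  # dropped channels sum to exactly 0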
Figure 6.
Comparative visualization on the Messidor-1 dataset using CAM [29]. We chose representative CNN/ViT-based methods with publicly available code, including MIL-VT [38], Swin-L [16], CrossFormer-L [35], CANet [15], and ReXNet [9].

References

    1. Aptos database.
    2. https://codalab.lisn.upsaclay.fr/competitions/12441.
    3. Brown Gary C, Brown Melissa M, Hiller Tyrie, Fischer David, Benson William E, and Magargal Larry E. Cotton-wool spots. Retina, 5(4):206–214, 1985. - PubMed
    4. Che Haoxuan, Jin Haibo, and Chen Hao. Learning robust representation for joint grading of ophthalmic diseases via adaptive curriculum and feature disentanglement. In MICCAI, pages 523–533, 2022.
    5. Chen Q and et al. A multi-task deep learning model for the classification of age-related macular degeneration. AMIA Jt Summits Transl Sci Proc, 2019. - PMC - PubMed
