Transformer attention fusion for fine grained medical image classification
- PMID: 40596233
- PMCID: PMC12216456
- DOI: 10.1038/s41598-025-07561-x
Abstract
Fine-grained visual classification is fundamental to medical imaging applications because it can detect minor lesions. Diabetic retinopathy (DR) is a preventable cause of blindness that requires accurate and timely diagnosis to prevent vision loss. Automated DR classification systems face several challenges: irregular lesions, imbalanced class distributions, and inconsistent image quality, all of which reduce diagnostic accuracy at early detection stages. To address these problems, we propose MSCAS-Net (Multi-Scale Cross and Self-Attention Network), which uses the Swin Transformer as its backbone. It extracts features at three resolutions (12 × 12, 24 × 24, 48 × 48), allowing it to capture both subtle local details and global context. The model applies self-attention to strengthen spatial relationships within each scale and cross-attention to align feature patterns across scales, building a comprehensive feature representation. This dual mechanism, which attends both within and across scales, improves the model's ability to detect significant lesions. MSCAS-Net achieves the best performance on the APTOS, DDR, and IDRID benchmarks, with accuracies of 93.8%, 89.80%, and 86.70%, respectively. Because it learns stable features, the model handles imbalanced datasets and inconsistent image quality without data augmentation. MSCAS-Net advances automated DR diagnosis by combining high diagnostic precision with interpretability, making it a candidate for an efficient AI-powered clinical decision support system. This work demonstrates how fine-grained visual classification methods can benefit the early detection and treatment of DR.
Keywords: Attention mechanism; Deep learning; Diabetic retinopathy classification; Fine-grained visual classification; Medical images; Multi-scale feature extraction.
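The dual attention idea summarized in the abstract (self-attention within each scale, cross-attention across scales) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the fusion details, so the sketch assumes plain scaled dot-product attention without learned projections or multiple heads, random arrays standing in for Swin Transformer features, a hypothetical shared channel width `dim`, and a simple residual-plus-mean-pooling fusion. Only the three grid sizes (12 × 12, 24 × 24, 48 × 48) come from the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
dim = 32               # hypothetical shared channel width (assumption)
grids = (12, 24, 48)   # the three feature-map resolutions from the abstract

# Random stand-ins for backbone features: one token per grid cell.
features = {g: rng.standard_normal((g * g, dim)) for g in grids}

# Self-attention: tokens attend to other tokens of the same scale.
refined = {g: attention(f, f, f)[0] for g, f in features.items()}

# Cross-attention: each scale queries the tokens of the other two scales.
fused = {}
for g in grids:
    others = np.concatenate([refined[o] for o in grids if o != g], axis=0)
    cross_out, cross_weights = attention(refined[g], others, others)
    fused[g] = refined[g] + cross_out  # residual fusion (assumption)

# Mean-pool each scale and concatenate into one image-level descriptor.
descriptor = np.concatenate([fused[g].mean(axis=0) for g in grids])
print(descriptor.shape)
```

In this sketch the descriptor would feed a classification head for the DR grades; a real model would add learned query, key, and value projections and train end to end.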
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests.