Sensors (Basel). 2025 Apr 15;25(8):2479.
doi: 10.3390/s25082479.

Boosting Skin Cancer Classification: A Multi-Scale Attention and Ensemble Approach with Vision Transformers


Guang Yang et al. Sensors (Basel).

Abstract

Skin cancer is a significant global health concern, with melanoma being the most dangerous form, responsible for the majority of skin cancer-related deaths. Early detection of skin cancer is critical, as it can drastically improve survival rates. While deep learning models have achieved impressive results in skin cancer classification, there remain challenges in accurately distinguishing between benign and malignant lesions. In this study, we introduce a novel multi-scale attention-based performance booster inspired by the Vision Transformer (ViT) architecture, which enhances the accuracy of both ViT and convolutional neural network (CNN) models. By leveraging attention maps to identify discriminative regions within skin lesion images, our method improves the models' focus on diagnostically relevant areas. Additionally, we employ ensemble learning techniques to combine the outputs of several deep learning models using majority voting. Our skin cancer classifier, consisting of ViT and EfficientNet models, achieved a classification accuracy of 95.05% on the ISIC2018 dataset, outperforming individual models. The results demonstrate the effectiveness of integrating attention-based multi-scale learning and ensemble methods in skin cancer classification.
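The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implemented the vote or how ties are resolved, so the function name `majority_vote`, the input layout, and the tie-breaking rule (the earliest-listed model among the tied labels wins) are all assumptions made for illustration, and the example labels are invented.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class labels by majority vote.

    predictions[m][i] is the label that model m assigns to image i.
    Tie-breaking is an assumption of this sketch: among labels tied for
    the most votes, the one predicted by the earliest-listed model wins.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_images = len(predictions[0])
    if any(len(p) != n_images for p in predictions):
        raise ValueError("every model must label the same number of images")

    voted = []
    for i in range(n_images):
        # Collect the label each model gives to image i, in model order.
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # First vote (in model order) whose label has the highest count.
        voted.append(next(v for v in votes if counts[v] == top))
    return voted


# Hypothetical labels from three models on four images.
vit_preds = [0, 1, 2, 1]
effnet_a_preds = [0, 2, 2, 1]
effnet_b_preds = [1, 2, 0, 1]
print(majority_vote([vit_preds, effnet_a_preds, effnet_b_preds]))  # [0, 2, 2, 1]
```

With an odd number of models and two classes, ties cannot occur. With more classes, such as the multiple lesion types in ISIC2018, some tie rule is needed, and listing the strongest individual model first is one simple option.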

Keywords: deep learning; image processing; neural networks; skin cancer classification; transformer.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. A multi-scale attention and ensemble approach with Vision Transformers.
Figure 2. ISIC2018 image set samples.
Figure 3. Sensitivity/precision comparison for each type, trained with and without oversampling [24].
Figure 4. Attention maps generated in ViT for skin tumors.
Figure 5. The attention map assists in identifying the discriminative region of the tumor.
Figure 6. ViT-based multi-scale performance booster attached to a deep learning model.
Figure 7. Attention map focusing on background objects for a low-contrast lesion.
Figure 8. Some misclassified melanoma images.

References

    1. Siegel R.L., Naishadham D., Jemal A. Cancer statistics, 2012. CA A Cancer J. Clin. 2012;62:10–29. doi: 10.3322/caac.20138.
    2. Australian Bureau of Statistics. Causes of Death. ABS; Canberra, Australia: 2019. [(accessed on 1 November 2022)]. Available online: https://www.abs.gov.au/statistics/health/causes-death/causes-death-austr....
    3. Street W. Cancer Facts & Figures. American Cancer Society; Atlanta, GA, USA: 2019. [(accessed on 1 November 2022)]. Available online: http://cancerstatisticscenter.cancer.org.
    4. Bray F., Ferlay J., Soerjomataram I., Siegel R.L., Torre L.A., Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 2018;68:394–424. doi: 10.3322/caac.21492.
    5. Siegel R.L., Miller K.D., Jemal A. Cancer statistics, 2019. CA A Cancer J. Clin. 2019;69:7–34. doi: 10.3322/caac.21551.
