Front Oncol. 2024 Jun 19:14:1320220.

doi: 10.3389/fonc.2024.1320220. eCollection 2024.

Improving skin cancer detection by Raman spectroscopy using convolutional neural networks and data augmentation

Jianhua Zhao¹,², Harvey Lui¹,², Sunil Kalia¹,³,⁴, Tim K Lee¹,², Haishan Zeng¹,²

Affiliations

¹ Photomedicine Institute, Department of Dermatology and Skin Science, University of British Columbia and Vancouver Coastal Health Research Institute, Vancouver, BC, Canada.
² BC Cancer Research Institute, University of British Columbia, Vancouver, BC, Canada.
³ BC Children's Hospital Research Institute, Vancouver, BC, Canada.
⁴ Centre for Clinical Epidemiology and Evaluation, Vancouver Coastal Health Research Institute, Vancouver, BC, Canada.

PMID: 38962264
PMCID: PMC11219827
DOI: 10.3389/fonc.2024.1320220

Jianhua Zhao et al. Front Oncol. 2024.

Abstract

Background: Our previous studies have demonstrated that Raman spectroscopy could be used for skin cancer detection with good sensitivity and specificity. The objective of this study is to determine if skin cancer detection can be further improved by combining deep neural networks and Raman spectroscopy.

Patients and methods: Raman spectra of 731 skin lesions were included in this study, comprising 340 cancerous and precancerous lesions (melanoma, basal cell carcinoma, squamous cell carcinoma and actinic keratosis) and 391 benign lesions (melanocytic nevus and seborrheic keratosis). One-dimensional convolutional neural networks (1D-CNN) were developed for Raman spectral classification. The samples were randomly divided, with stratification, into training (70%), validation (10%) and test (20%) sets, and this random split was repeated 56 times using parallel computing. Different data augmentation strategies were implemented for the training dataset, including added random noise, spectral shift, spectral combination and artificially synthesized Raman spectra using one-dimensional generative adversarial networks (1D-GAN). The area under the receiver operating characteristic curve (ROC AUC) was used as the measure of diagnostic performance. Conventional machine learning approaches, including partial least squares for discriminant analysis (PLS-DA), principal component and linear discriminant analysis (PC-LDA), support vector machine (SVM), and logistic regression (LR), were evaluated for comparison using the same data splitting scheme as the 1D-CNN.
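
The 70/10/20 stratified split described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `stratified_split`, the per-class rounding rule and the seed are assumptions. With the class sizes given in the abstract (340 and 391), per-class rounding happens to yield 512, 73 and 146 samples, which matches the set sizes quoted in the Figure 8 caption.

```python
import random

def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return (train, val, test) index lists, splitting within each class.

    Hypothetical helper: the per-class rounding rule is an assumption.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Class sizes from the abstract: 340 cancerous/precancerous, 391 benign.
labels = [1] * 340 + [0] * 391
train, val, test = stratified_split(labels, seed=0)
print(len(train), len(val), len(test))  # prints: 512 73 146
```

Repeating the split with 56 different seeds would give 56 independent train/validation/test partitions of the kind the study averages over.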

Results: The ROC AUCs of the test dataset for models trained on the original training spectra were 0.886±0.022 (1D-CNN), 0.870±0.028 (PLS-DA), 0.875±0.033 (PC-LDA), 0.864±0.027 (SVM), and 0.525±0.045 (LR). After augmentation of the training dataset, these improved to 0.909±0.021 (1D-CNN), 0.899±0.022 (PLS-DA), 0.895±0.022 (PC-LDA), 0.901±0.020 (SVM), and 0.897±0.021 (LR), respectively (p<0.0001, Wilcoxon test). Paired analyses of 1D-CNN against the conventional machine learning approaches showed that 1D-CNN had a 1-3% improvement (p<0.001, Wilcoxon test).
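
For readers unfamiliar with the metric, ROC AUC can be computed directly as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition; the function name `roc_auc` and the toy scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties count as one half. Cost is O(n_pos * n_neg), which is fine
    for a few hundred spectra.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores, made up for illustration (not data from the study).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, y_score)
print(round(auc, 3))  # 8 of 9 positive/negative pairs are ordered correctly
```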

Conclusions: Data augmentation not only improved the performance of both deep neural networks and conventional machine learning techniques by 2-4%, but also improved the performance of the models on spectra with higher noise or spectral shifting. Convolutional neural networks slightly outperformed conventional machine learning approaches for skin cancer detection by Raman spectroscopy.

Keywords: Raman spectroscopy; Skin cancer detection; artificial intelligence (AI); convolutional neural networks (CNN); data augmentation; machine learning; optical diagnosis.

Conflict of interest statement

The authors and the BC Cancer Agency hold several patents for Raman spectroscopy that have been licensed to Vita Imaging Inc., San Jose, California.

Figures

**Figure 1**
Averaged Raman spectra (and standard deviation) of malignant (n=340, including melanoma, basal cell carcinoma, squamous cell carcinoma and actinic keratosis) and benign skin lesions (n=391, including benign nevi and seborrheic keratosis). All spectra were normalized to their respective areas under the curve between 500 and 1800 cm⁻¹ before being averaged. For clarity, the standard deviation is shown as the top half for cancer and the bottom half for benign lesions.
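
The area normalization mentioned in this caption can be illustrated as follows. This is a minimal sketch under stated assumptions: the area is estimated with the trapezoidal rule, the wavenumber axis is sorted in ascending order, and `normalize_to_area` is a hypothetical helper name. The caption only states that each spectrum was normalized to its area between 500 and 1800 cm⁻¹.

```python
def normalize_to_area(wavenumbers, intensities, lo=500.0, hi=1800.0):
    """Divide a spectrum by its trapezoidal area between lo and hi (cm^-1).

    Assumes wavenumbers are sorted in ascending order.
    """
    pts = [(w, i) for w, i in zip(wavenumbers, intensities) if lo <= w <= hi]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the window")
    area = sum(0.5 * (i0 + i1) * (w1 - w0)
               for (w0, i0), (w1, i1) in zip(pts, pts[1:]))
    if area == 0:
        raise ValueError("spectrum has zero area inside the window")
    return [i / area for i in intensities]

# Flat toy spectrum sampled every 1 cm^-1 (illustrative, not real data).
wn = [float(w) for w in range(500, 1801)]
flat = [2.0] * len(wn)
normalized = normalize_to_area(wn, flat)

# After normalization the area under the curve in the window is 1.
area_after = sum(0.5 * (a + b) * (w1 - w0)
                 for a, b, w0, w1 in zip(normalized, normalized[1:], wn, wn[1:]))
print(round(area_after, 6))  # prints: 1.0
```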

**Figure 2**
Examples of data augmentation for the training dataset of Raman spectra. **(A)** adding random noise of different noise levels, **(B)** spectral shifting, **(C)** spectral linear combination, and **(D)** data augmentation by one dimensional generative adversarial networks (1D-GAN) (averaged spectra are shown).
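
Three of the four augmentation strategies in panels (A) to (C) are simple enough to sketch in plain Python. The details below are assumptions made for illustration only: the noise is taken to be Gaussian and scaled to a fraction of the peak intensity, the shift is an integer number of points with edge padding, and the linear combination mixes two spectra that are assumed to belong to the same class. The function names are hypothetical, and the 1D-GAN of panel (D) is not covered.

```python
import random

def add_noise(spectrum, noise_level, rng):
    """Add zero-mean Gaussian noise scaled to a fraction of the peak intensity."""
    sigma = noise_level * max(spectrum)
    return [x + rng.gauss(0.0, sigma) for x in spectrum]

def shift_spectrum(spectrum, shift):
    """Shift by an integer number of points (|shift| < length), padding with the edge value."""
    if shift == 0:
        return list(spectrum)
    if shift > 0:
        return [spectrum[0]] * shift + list(spectrum[:-shift])
    return list(spectrum[-shift:]) + [spectrum[-1]] * (-shift)

def combine_spectra(a, b, weight):
    """Linear combination weight*a + (1 - weight)*b of two spectra."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

# Toy five-point "spectra" (illustrative values only).
a = [0.0, 1.0, 4.0, 1.0, 0.0]
b = [4.0, 0.0, 0.0, 0.0, 4.0]
rng = random.Random(0)

noisy = add_noise(a, noise_level=0.05, rng=rng)
right = shift_spectrum(a, 1)        # [0.0, 0.0, 1.0, 4.0, 1.0]
left = shift_spectrum(a, -1)        # [1.0, 4.0, 1.0, 0.0, 0.0]
mixed = combine_spectra(a, b, 0.25) # [3.0, 0.25, 1.0, 0.25, 3.0]
print(right, left, mixed)
```

Consistent with the Figure 6 caption, such transforms would be applied only to the training set, leaving the validation and test sets untouched.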

**Figure 3**
One-dimensional generative adversarial networks (1D-GAN) for data augmentation. 8*64, 4*64, 2*64, 1*64 and 1 are the numbers of kernels in each convolutional layer (and transposed convolutional layer for the generator).

**Figure 4**
Architecture and parameters of one-dimensional convolutional neural networks (1D-CNN) for Raman spectral classification. The number of kernels for each convolutional layer was 16, 32, 64 and 128 with kernel size = [3,1] for all the convolutional layers. The mini-batch size was 256 for the training dataset without augmentation and 1024 for the training dataset with augmentation. The pooling size was [2,1] with stride [2,1] for each average pooling layer. The size for the first fully connected layer was 256 and for the second fully connected layer was 2.
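
To get a feel for how the feature maps shrink through such a network, the sketch below does the shape arithmetic for four convolutional layers with 16, 32, 64 and 128 kernels of size 3, each followed by average pooling of size 2 with stride 2. Several points are assumptions rather than facts from the caption: the convolutions are zero-padded so that they preserve length, one pooling layer follows each convolutional layer, and the input length of 1024 points is a hypothetical value chosen for round numbers.

```python
def conv_length(n, kernel=3, padding=1, stride=1):
    """Output length of a 1D convolution (padding=1 keeps length for kernel=3)."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_length(n, pool=2, stride=2):
    """Output length of a 1D pooling layer without padding."""
    return (n - pool) // stride + 1

def feature_map_shapes(input_length, kernels=(16, 32, 64, 128)):
    """Return (length, channels) after each conv + average-pool stage."""
    shapes = []
    length = input_length
    for n_kernels in kernels:
        length = conv_length(length)   # assumed length-preserving convolution
        length = pool_length(length)   # halves the length
        shapes.append((length, n_kernels))
    return shapes

INPUT_LENGTH = 1024  # hypothetical number of spectral points
shapes = feature_map_shapes(INPUT_LENGTH)
flattened = shapes[-1][0] * shapes[-1][1]
print(shapes)     # [(512, 16), (256, 32), (128, 64), (64, 128)]
print(flattened)  # 8192 features feeding the first fully connected layer
```

Under these assumptions the flattened features would feed the fully connected layers of size 256 and 2 named in the caption.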

**Figure 5**
Example of the training process of the 1D-CNN for Raman spectral classification. Arrows indicate where the validation performance had stopped improving for at least 50 iterations, a possible stopping point to prevent over-training. **(A)** Accuracy of the training and validation process. **(B)** Cross entropy loss of the training and validation process.
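
The stopping criterion suggested by the arrows resembles patience-based early stopping. The following sketch shows one way to express it, assuming the monitored quantity is the validation loss checked once per iteration; the function name `early_stopping_iteration` and the toy loss curve are illustrative and not taken from the study.

```python
def early_stopping_iteration(val_losses, patience=50):
    """Return the best iteration once the validation loss has not improved
    for `patience` consecutive checks, or None if that never happens."""
    best_loss = float("inf")
    best_iter = None
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif best_iter is not None and i - best_iter >= patience:
            return best_iter
    return None

# Toy curve: the loss falls for 100 iterations, then plateaus at a higher value.
losses = [1.0 / (i + 1) for i in range(100)] + [0.02] * 60
stop_at = early_stopping_iteration(losses, patience=50)
print(stop_at)  # prints: 99
```

In practice the model weights saved at the returned iteration would be the ones evaluated on the test set.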

**Figure 6**
Random split of the original dataset into training (70%), validation (10%) and testing (20%) datasets with and without data augmentation (dashed frame). Note that for analyses with augmentation, augmentation was applied only to the training dataset, not to the validation and testing datasets. 1D-CNN, one-dimensional convolutional neural networks; PLS-DA, partial least squares for discriminant analysis; PC-LDA, principal component and linear discriminant analysis; SVM, support vector machine; LR, logistic regression.

**Figure 7**
ROC AUC of the test dataset of 56 random repetitions based on the original training dataset and different augmentation parameters, **(A)** adding random noise, **(B)** spectral shifting, **(C)** spectral linear combination, and **(D)** synthesized spectra by 1D-GAN. 1D-CNN, one-dimensional convolutional neural networks; PLS-DA, partial least squares for discriminant analysis; PC-LDA, principal component and linear discriminant analysis; SVM, support vector machine; LR, logistic regression.

**Figure 8**
Example of the ROC curves of the training, validation (n=73) and test (n=146) datasets. Top row is based on the original training spectra (n=512), and bottom row is based on the augmented spectra (n=14,608). 1D-CNN, one-dimensional convolutional neural networks; PLS-DA, partial least squares for discriminant analysis; PC-LDA, principal component and linear discriminant analysis; SVM, support vector machine; LR, logistic regression.

**Figure 9**
ROC AUC of the test dataset of 56 random repetitions based on the original training dataset without augmentation and the original training dataset with augmentation. Bars show the mean and standard deviation.

**Figure 10**
ROC AUC of the extended test dataset by adding random noise or spectral shift to the original test dataset for models based on the original training dataset without augmentation (top row) and models based on the original training dataset with augmentation (bottom row). Data shown are the mean of 56 random repetitions using parallel computing. 1D-CNN, one-dimensional convolutional neural networks; PLS-DA, partial least squares for discriminant analysis; PC-LDA, principal component and linear discriminant analysis; SVM, support vector machine.

Cited by

AI-assisted identification of nonmelanoma skin cancer structures based on combined line-field confocal optical coherence tomography and confocal Raman microspectroscopy.
Ayadh M, Waszczuk L, Ogien J, Dauce G, Augis L, Tfaili S, Tfayli A, Perrot JL, Dubois A. J Biomed Opt. 2025 Jul;30(7):076008. doi: 10.1117/1.JBO.30.7.076008. Epub 2025 Jul 28. PMID: 40726594. Free PMC article.

Emerging Technologies for Timely Point-of-Care Diagnostics of Skin Cancer.
Thomas JL, Heagerty AHM, Goldberg Oppenheimer P. Glob Chall. 2025 Mar 18;9(5):2400274. doi: 10.1002/gch2.202400274. eCollection 2025 May. PMID: 40352638. Free PMC article. Review.

A Static Sign Language Recognition Method Enhanced with Self-Attention Mechanisms.
Wang Y, Jiang H, Sun Y, Xu L. Sensors (Basel). 2024 Oct 29;24(21):6921. doi: 10.3390/s24216921. PMID: 39517818. Free PMC article.

Machine learning models to predict osteoporosis in patients with chronic kidney disease stage 3-5 and end-stage kidney disease.
Hsu CT, Huang CY, Chen CH, Deng YL, Lin SY, Wu MJ. Sci Rep. 2025 Apr 3;15(1):11391. doi: 10.1038/s41598-025-95928-5. PMID: 40181057. Free PMC article.
