J Chem Inf Model. 2025 Jul 14;65(13):6632-6643.
doi: 10.1021/acs.jcim.5c00513. Epub 2025 Jun 18.

Transfer-Learning Deep Raman Models Using Semiempirical Quantum Chemistry

Jawad Kamran et al. J Chem Inf Model.

Abstract

Biophotonic technologies such as Raman spectroscopy are powerful tools for obtaining highly specific molecular information. Due to its minimal sample preparation requirements, Raman spectroscopy is widely used across diverse scientific disciplines, often in combination with chemometrics, machine learning (ML), and deep learning (DL). However, Raman spectroscopy lacks large databases of independent Raman spectra for model training, leading to overfitting, overestimation, and limited model generalizability. We address this problem by generating simulated vibrational spectra using semiempirical quantum chemistry methods, enabling the efficient pretraining of deep learning models on large synthetic data sets. These pretrained models are then fine-tuned on a smaller experimental Raman data set of bacterial spectra. Transfer learning significantly reduces the computational cost while maintaining performance comparable to models trained from scratch in this real biophotonic application. The results validate the utility of synthetic data for pretraining deep Raman models and offer a scalable framework for spectral analysis in resource-limited settings.

Figures

Figure 1. Overview of the synthetic data set generation workflow for transfer learning. Small protein structures are used to calculate line spectra with GFN2-xTB, which are then broadened using Voigt profiles. Artifacts such as Gaussian noise and background signals are introduced to create simulated spectra. These spectra are used to pretrain deep learning models, which are subsequently fine-tuned on bacterial data sets (transfer learning). The final model performance is then evaluated and compared with other approaches.
Figure 2. Stages in the generation of artificial spectra. (A) A calculated line spectrum, showing the spectral peaks obtained from GFN2-xTB calculations. (B) A broadened line spectrum, where peaks are convolved with a Voigt profile to simulate realistic spectral shapes. (C) A spectrum with added artifacts, including Gaussian noise and background signals, to mimic experimental conditions.
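The three stages in this caption can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: it swaps the exact Voigt profile (a convolution of a Gaussian and a Lorentzian, available for example as `scipy.special.voigt_profile`) for the simpler pseudo-Voigt approximation so that it runs with the Python standard library alone. The peak positions, widths, noise level, and the sloping background are all hypothetical values chosen for the example.

```python
import math
import random


def pseudo_voigt(x, center, fwhm, eta):
    """Pseudo-Voigt line shape with unit peak height.

    Weighted sum of a Lorentzian (weight eta) and a Gaussian (weight 1 - eta)
    that share the same full width at half maximum. This approximates, but is
    not identical to, a true Voigt profile.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    gauss = math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
    half = fwhm / 2.0
    lorentz = half ** 2 / ((x - center) ** 2 + half ** 2)
    return eta * lorentz + (1.0 - eta) * gauss


def simulate_spectrum(lines, axis, fwhm=12.0, eta=0.5, noise_sd=0.0,
                      background=None, bg_strength=0.0, seed=0):
    """Broaden a line spectrum, then add a scaled background and Gaussian noise."""
    rng = random.Random(seed)
    spectrum = []
    for i, x in enumerate(axis):
        # Stage B: every calculated line contributes one broadened peak.
        value = sum(height * pseudo_voigt(x, center, fwhm, eta)
                    for center, height in lines)
        # Stage C: background with adjustable strength, plus Gaussian noise.
        if background is not None:
            value += bg_strength * background[i]
        value += rng.gauss(0.0, noise_sd)
        spectrum.append(value)
    return spectrum


# Stage A: a hypothetical line spectrum as (wavenumber, intensity) pairs.
lines = [(1004.0, 1.0), (1450.0, 0.6), (1660.0, 0.8)]
axis = [400.0 + 2.0 * i for i in range(701)]  # 400 to 1800 in steps of 2

clean = simulate_spectrum(lines, axis)
sloping_background = [1.0 - i / len(axis) for i in range(len(axis))]
noisy = simulate_spectrum(lines, axis, noise_sd=0.02,
                          background=sloping_background, bg_strength=0.3)
```

Varying `bg_strength`, `noise_sd`, and `seed` per spectrum is one simple way to turn a single calculated line spectrum into many distinct training examples.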
Figure 3. Average background spectrum with standard deviation. The solid line represents the mean normalized intensity across 8000 background spectra, while the shaded region indicates the standard deviation. Backgrounds were taken from unrelated experimental data and added to the simulated spectra with varying strengths.
Figure 4. Distribution of functional group labels in the synthetic data set, showing the number of spectra per group. The two target classes used for pretraining, alcohol and aromatic secondary amine, are highlighted in orange, while the remaining functional groups provide additional context for the classification task. The “+30 other functional groups” category consolidates spectra from less frequent (<1000 spectra) classes.
Figure 5. Mean spectra with standard deviation for the target binary classes, alcohol (blue) and aromatic secondary amine (red). The spectra represent the mean normalized intensity but contain contributions from additional functional groups present in the data (see Figure ), which influence the observed variability.
Figure 6. Normalized mean spectra of bacterial classes. The spectra for six bacterial classes are displayed. While some distinct features are visible, significant spectral overlap is observed, indicating similarities among the classes across the wavenumber range.
Figure 7. Overview of the transfer-learning methodology. (SimNet) A four-layer deep convolutional neural network (CNN) is pretrained on synthetic spectra. (A) During fine-tuning, the pretrained model’s convolutional layers are frozen, and only the new classification layer is trained on experimental bacterial spectra. (B) For further optimization, a deeper and much smaller dense layer is unfrozen and retrained along with the classification layer. Batch normalization and flatten operations are retained throughout all phases.
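The freezing step in variant (A) can be illustrated without a deep learning framework. The sketch below is a toy stand-in, not SimNet: a fixed random linear map with a ReLU plays the role of the pretrained convolutional layers, and only a small logistic-regression head is updated by gradient descent. All sizes, the learning rate, and the two-class toy data are hypothetical. In a real framework the same idea is usually expressed by marking layers as non-trainable (for example `requires_grad = False` in PyTorch or `layer.trainable = False` in Keras).

```python
import math
import random

rng = random.Random(0)
N_IN, N_FEAT = 8, 4

# Stand-in for the pretrained layers: fixed weights that are never updated.
frozen_w = [[rng.uniform(-1.0, 1.0) for _ in range(N_IN)] for _ in range(N_FEAT)]
frozen_snapshot = [row[:] for row in frozen_w]


def extract_features(x):
    """Frozen feature extractor: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]


# Toy two-class "experimental" data (hypothetical, not real spectra).
pattern = [0.5 if i % 2 == 0 else -0.5 for i in range(N_IN)]
data = []
for k in range(40):
    label = k % 2
    sign = 1.0 if label == 1 else -1.0
    data.append(([sign * p + rng.uniform(-0.3, 0.3) for p in pattern], label))

# Features are computed once; the trailing 1.0 is the bias input of the head.
features = [(extract_features(x) + [1.0], y) for x, y in data]


def predict(head, f):
    """Sigmoid output of the trainable classification head."""
    return 1.0 / (1.0 + math.exp(-sum(h * fi for h, fi in zip(head, f))))


def mean_loss(head):
    """Mean binary cross-entropy of the head on the toy data."""
    total = 0.0
    for f, y in features:
        p = min(max(predict(head, f), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(features)


# Only the head (N_FEAT weights plus one bias) is trained.
head = [0.0] * (N_FEAT + 1)
initial_loss = mean_loss(head)
learning_rate = 0.01
for _ in range(200):
    grad = [0.0] * len(head)
    for f, y in features:
        error = predict(head, f) - y
        for j, fj in enumerate(f):
            grad[j] += error * fj / len(features)
    head = [h - learning_rate * g for h, g in zip(head, grad)]
final_loss = mean_loss(head)
```

Here 5 parameters are trained while the 32 frozen weights stay untouched, which mirrors in miniature why fine-tuning only the classification layer needs far fewer trainable parameters than training the whole network.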
Figure 8. Confusion matrices for the four evaluated models: model trained from scratch, transfer learning A, transfer learning B, and PCA-LDA baseline. Each matrix shows the classification performance across the six bacterial classes in the test set.
Figure 9. (A) Boxplot illustrating the mean balanced accuracy (± σ) of the different models: CNN from scratch: 83.66 ± 6.90%, PCA-LDA: 83.09 ± 8.14%, transfer learning (A): 80.47 ± 6.51%, and transfer learning (B): 85.30 ± 5.67%. Although transfer learning (B) shows the highest mean accuracy, statistical analysis did not reveal significant differences between models. (B) Bar plot comparing the log10-scaled numbers of training parameters, highlighting large efficiency differences: TL(A) with 2246 parameters and TL(B) with 98k parameters achieve competitive accuracy, whereas the CNN trained from scratch requires 3.74 M parameters.
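For readers unfamiliar with the two quantities in this caption, the following sketch shows how they are typically computed. Balanced accuracy is taken here in its usual sense, the mean of the per-class recalls, which is an assumption about the authors' exact definition. The confusion matrix is a made-up two-class example, and the parameter counts are the rounded values quoted in the caption.

```python
import math


def balanced_accuracy(confusion):
    """Mean per-class recall; rows are true classes, columns are predictions."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total > 0:  # skip classes that have no true samples
            recalls.append(row[i] / total)
    return sum(recalls) / len(recalls)


# Hypothetical confusion matrix: recalls are 8/10 and 9/10.
toy_confusion = [[8, 2],
                 [1, 9]]
score = balanced_accuracy(toy_confusion)

# Trainable parameter counts as quoted in the caption (98k and 3.74 M are rounded).
trainable = {"TL(A)": 2246, "TL(B)": 98_000, "CNN from scratch": 3_740_000}
log_counts = {name: math.log10(n) for name, n in trainable.items()}
reduction = {name: trainable["CNN from scratch"] / n for name, n in trainable.items()}

for name in trainable:
    print(f"{name}: log10 = {log_counts[name]:.2f}, "
          f"{reduction[name]:.0f}x fewer than from scratch")
```

On the log10 scale the three models sit roughly 1.6 units apart from one another, which is why a logarithmic axis is needed to show them in a single bar plot.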
