Sci Rep. 2025 Aug 27;15(1):31536.
doi: 10.1038/s41598-025-14699-1.

Integrating data augmentation and BERT-based deep learning for predicting alpha-glucosidase inhibitors derived from Black Cohosh

Mohammadreza Torabi et al. Sci Rep. 2025.

Abstract

Diabetes remains one of the most critical health issues worldwide, and its prevalence is rising due to factors such as obesity and a sedentary lifestyle. Traditional herbal medications and natural products, particularly inhibitors of enzymes such as alpha-glucosidase, are promising alternatives. This study set out to identify potent alpha-glucosidase inhibitors by incorporating data augmentation into deep-learning modeling. To this end, various data augmentation techniques were used to generate diverse SMILES strings, and the resulting data variability improved the performance of the deep learning models. Pre-trained models from the Hugging Face repository were fine-tuned, and among them PC10M-450k achieved the best recall. PC10M-450k was therefore used for the subsequent analyses. This model identified actaeaepoxide 3-O-xyloside from Black Cohosh as a potential inhibitor. Subsequent molecular docking and molecular dynamics (MD) simulations indicated that this compound interacts stably with the enzyme and has a high inhibition probability compared with acarbose. These in silico drug discovery results point to actaeaepoxide 3-O-xyloside as a potential candidate for diabetes therapy. Overall, the investigation also highlights the role of augmentation techniques and pre-trained models in accelerating drug discovery toward more effective therapeutic solutions.

Keywords: Alpha-glucosidase inhibitor; BERT-based model; Data augmentation; Deep learning; Drug discovery; Natural compounds.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical approval and consent to participate: Not applicable.

Figures

Fig. 1
Schematic of the study.
Fig. 2
Flowchart of the main steps of model validation. In this figure, * denotes the natural compounds.
Fig. 3
Comparison of the results of different pre-trained models.
Fig. 4
Comparison of precision, recall, and F1-score between the PC10M-396_250 and PC10M-450k models. The PC10M-450k model achieved the higher recall, leading to its selection as the best fine-tuned model for chemical property prediction tasks.
Fig. 5
Comparison of classification performance metrics (ACC, MCC) before and after data augmentation. The results highlight significant improvements across all metrics, demonstrating the effectiveness of data augmentation in enhancing model performance.
Fig. 6
Chemical structures of the compounds found in black cohosh that were docked into the alpha-glucosidase enzyme with AutoDock4. (a) acarbose, (b) actaeaepoxide 3-O-xyloside, (c) cimiracemoside F, (d) isoferulic acid.
Fig. 7
(a) RMSD of acarbose (blue), actaeaepoxide 3-O-xyloside (green), cimiracemoside F (light green), and isoferulic acid complexes with α-glucosidase (red). (b) RMSD of Cα atoms of α-glucosidase in three complexes.
Fig. 8
Types of interactions of the compounds (a) acarbose, (b) actaeaepoxide 3-O-xyloside, (c) cimiracemoside F, and (d) isoferulic acid within α-glucosidase.
Fig. 9
Backbone Root Mean Square Fluctuation (RMSF) profiles of α-glucosidase in complex with actaeaepoxide 3-O-xyloside, cimiracemoside F, and isoferulic acid over a 200 ns MD simulation.
Fig. 10
Radius of Gyration (RoG) plot of α-glucosidase complexes during 200 ns molecular dynamics simulations.
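
The classification metrics named in the captions of Fig. 4 and Fig. 5 (precision, recall, F1-score, ACC, MCC) all follow from the confusion-matrix counts of a binary classifier. The following is a minimal sketch in plain Python, assuming labels where 1 means inhibitor and 0 means non-inhibitor. The function name `binary_metrics` and the toy labels are illustrative assumptions and are not taken from the study.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1, accuracy (ACC) and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    acc = (tp + tn) / len(pairs) if pairs else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "acc": acc, "mcc": mcc}

# Toy labels (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall is the fraction of true inhibitors that the model recovers. The abstract names recall as the metric on which PC10M-450k performed best.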
