Predicting cell-penetrating peptides using machine learning algorithms and navigating in their chemical space

Sci Rep. 2021 Apr 7;11(1):7628. doi: 10.1038/s41598-021-87134-w.

Ewerton Cristhian Lima de Oliveira¹, Kauê Santana², Luiz Josino³, Anderson Henrique Lima E Lima⁴, Claudomiro de Souza de Sales Júnior⁵

Affiliations

¹ Institute of Technology, Federal University of Pará, Belém, Pará, 66075-110, Brazil.
² Institute of Biodiversity, Federal University of Western Pará, Vera Paz street, s/n Salé, Santarém, Pará, 68040-255, Brazil. kaue.costa@ufopa.edu.br.
³ Laboratório de Planejamento e Desenvolvimento de Fármacos, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Pará, 66075-110, Brazil.
⁴ Laboratório de Planejamento e Desenvolvimento de Fármacos, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Pará, 66075-110, Brazil. anderson@ufpa.br.
⁵ Institute of Technology, Federal University of Pará, Belém, Pará, 66075-110, Brazil. claudomiro.sales@gmail.com.

PMID: 33828175
PMCID: PMC8027643
DOI: 10.1038/s41598-021-87134-w

Ewerton Cristhian Lima de Oliveira et al. Sci Rep. 2021.

Abstract

Cell-penetrating peptides (CPPs) are naturally able to cross the lipid bilayer membrane that protects cells. These peptides share common structural and physicochemical properties and have diverse pharmaceutical applications, of which drug delivery is the most important. Because they can cross membranes while carrying high-molecular-weight polar molecules, they are termed Trojan horses. In this study, we proposed a machine learning (ML)-based framework named BChemRF-CPPred (beyond chemical rules-based framework for CPP prediction) that uses an artificial neural network (ANN), a support vector machine (SVM), and a Gaussian process classifier (GPC) to differentiate CPPs from non-CPPs, using structure- and sequence-based descriptors extracted from PDB and FASTA formats. The performance of our algorithm was evaluated by tenfold cross-validation and compared with that of previously reported prediction tools using an independent dataset. BChemRF-CPPred satisfactorily identified CPP-like structures in natural and synthetic modified peptide libraries and outperformed previously reported ML-based algorithms, reaching an independent-test accuracy of 90.66% (AUC = 0.9365) for PDB input and 86.5% (AUC = 0.9216) for FASTA input. Moreover, our analyses of the CPP chemical space demonstrated that these peptides break some molecular rules related to the prediction of permeability of therapeutic molecules in cell membranes. This is the first comprehensive analysis to predict synthetic and natural CPP structures and to evaluate their chemical space using an ML-based framework. Our algorithm is freely available for academic use at http://comptools.linc.ufpa.br/BChemRF-CPPred .
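The abstract reports accuracy and AUC for a framework built from three classifiers. As a minimal sketch of how such figures are computed, the code below combines three models' binary labels by majority vote and scores the result. The vote is an assumption for illustration only: this record does not say how BChemRF-CPPred merges its ANN, SVM, and GPC outputs. The function names and toy data are hypothetical.

```python
from typing import Sequence


def majority_vote(*predictions: Sequence[int]) -> list[int]:
    """Combine binary labels (1 = CPP, 0 = non-CPP) from several classifiers.

    Assumption: a plain majority vote. The abstract does not state how
    BChemRF-CPPred merges its ANN, SVM, and GPC outputs.
    """
    n_models = len(predictions)
    # A sample is labelled CPP when more than half of the models say so.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of peptides whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random CPP scores above a random non-CPP.

    Pairwise Mann-Whitney formulation: ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions (hypothetical, not data from the paper).
    y_true = [1, 1, 0, 0, 1, 0]
    ann = [1, 1, 0, 1, 0, 0]
    svm = [1, 0, 0, 0, 1, 0]
    gpc = [1, 1, 1, 0, 1, 0]
    ensemble = majority_vote(ann, svm, gpc)
    print(ensemble, accuracy(y_true, ensemble))
    print(roc_auc(y_true, [0.9, 0.4, 0.3, 0.6, 0.7, 0.2]))
```

On these toy inputs each model alone makes one or two mistakes (ANN scores 4/6), yet the vote reproduces `y_true` exactly, which is the usual motivation for combining complementary classifiers.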

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Boxplot of accuracy from tenfold cross-validation of ANN (red), GPC (blue), SVM (green), and BChemRF-CPPred (orange).

**Figure 2**
(A) Accuracy of ANN (red), GPC (blue), SVM (green), and BChemRF-CPPred (orange) for each feature composition (FC) in the independent test. (B) ROC curves and AUC of the ML-based frameworks using FC-1, FC-2, FC-3, and FC-4 in the independent test.

**Figure 3**
Boxplot of accuracy from tenfold cross-validation of ANN (red), GPC (blue), SVM (green), and BChemRF-CPPred (orange) using FASTA input.

**Figure 4**
Accuracy of ANN (red), GPC (blue), SVM (green), and BChemRF-CPPred (orange) for each FC in the independent test, using FASTA input.

**Figure 5**
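Normalized cumulative information entropy (CIE) provided by structure-based, amino acid composition (AAC), dipeptide composition (DPC), and pseudo amino acid composition (PseAAC) descriptors, calculated by the extremely randomized trees (ERT) algorithm. (A) Training dataset; (B) independent test dataset.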
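Figure 5 refers to amino acid composition (AAC) and dipeptide composition (DPC) descriptors derived from sequences. A minimal sketch follows, assuming the common definitions (residue counts divided by sequence length, adjacent-pair counts divided by the number of pairs); the exact normalization the authors used is not stated in this record. The function names are hypothetical, and PseAAC is left out because it depends on extra parameters not given here.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> dict[str, float]:
    """Amino acid composition: share of each standard residue (20 values)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dpc(sequence: str) -> dict[str, float]:
    """Dipeptide composition: share of each adjacent residue pair (400 values)."""
    seq = sequence.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("need at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i : i + 2]
        # Pairs with non-standard residues are not counted,
        # but they still contribute to the denominator.
        if pair in counts:
            counts[pair] += 1
    return {dp: c / n_pairs for dp, c in counts.items()}
```

For the arginine-rich toy sequence `RKKRRQRRR` (9 residues, 8 adjacent pairs), `aac` gives R = 6/9 and `dpc` gives RR = 3/8; concatenating the two dictionaries' values yields a 420-dimensional feature vector.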

**Figure 6**
Dimensionality reduction to three principal components (3D PCA) of the sequence- and structure-based descriptors in FC-1 to FC-4. (A) 3D PCA of FC-1, with explained variance ratios of 10.93% (PC1), 7.26% (PC2), and 6% (PC3) and a cumulative explained variance ratio (CEVR) of 24.19%. (B) 3D PCA of FC-2, with explained variance ratios of 48.9% (PC1), 21.94% (PC2), and 14.34% (PC3) and CEVR = 85.19%. (C) 3D PCA of FC-3, with explained variance ratios of 16.31% (PC1), 12.03% (PC2), and 7.22% (PC3) and CEVR = 35.58%. (D) 3D PCA of FC-4, with explained variance ratios of 17.81% (PC1), 12.48% (PC2), and 8.93% (PC3) and CEVR = 39.29%.
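Figure 6 reports per-component explained variance ratios and their cumulative sum (CEVR). In standard PCA, each ratio is a component's eigenvalue divided by the sum of all eigenvalues of the descriptor covariance matrix, and the CEVR is the running total. The sketch below assumes the eigenvalues have already been obtained from a PCA routine; the function names are hypothetical.

```python
from itertools import accumulate


def explained_variance_ratios(eigenvalues: list[float]) -> list[float]:
    """Share of total variance captured by each principal component.

    `eigenvalues` must cover all components of the covariance matrix,
    so the ratios of the first three components alone usually sum to
    less than 1.
    """
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def cumulative_explained_variance(ratios: list[float]) -> list[float]:
    """Running sum of the ratios: entry k is the CEVR of the first k+1 PCs."""
    return list(accumulate(ratios))


if __name__ == "__main__":
    # Per-PC percentages reported for FC-1 in Figure 6A.
    fc1 = [10.93, 7.26, 6.0]
    print(round(cumulative_explained_variance(fc1)[-1], 2))  # 24.19
```

A low CEVR, as for FC-1, means the three plotted components show only a small part of the descriptor variance, so the 3D view should be read as a rough projection of the chemical space.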

**Figure 7**
General structure of the BChemRF-CPPred framework, with the ANN, GPC, and SVM machine learning algorithms.

**Figure 8**
Hyperparameter tuning of ANN, GPC, and SVM for each FC using the grid search method. The best models obtained for the x-th feature composition (ANN_bFC-X, GPC_bFC-X, SVM_bFC-X) were used to compose the respective framework.
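Figure 8 describes grid-search tuning of each model per feature composition, and Figures 1 and 3 report tenfold cross-validation. The sketch below shows how the two fit together: each candidate hyperparameter value is scored by its mean tenfold cross-validation accuracy, and the best value is kept. It is not the authors' code; a toy k-nearest-neighbour classifier stands in for ANN, GPC, and SVM, and the function names and parameter grid are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence

Vector = Sequence[float]


def k_fold_indices(n_samples: int, n_folds: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and deal them into n_folds nearly equal folds."""
    if n_samples < n_folds:
        raise ValueError("need at least one sample per fold")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def knn_predict(train_x: list[Vector], train_y: list[int], x: Vector, k: int) -> int:
    """Hypothetical stand-in model: majority label of the k nearest neighbours."""

    def sq_dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(train_x[i], x))

    nearest = sorted(range(len(train_x)), key=sq_dist)[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def cross_val_accuracy(xs: list[Vector], ys: list[int], k: int, n_folds: int = 10) -> float:
    """Mean accuracy over n_folds train/test splits for one hyperparameter value."""
    fold_scores = []
    for test_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return mean(fold_scores)


def grid_search(xs: list[Vector], ys: list[int], grid: Sequence[int]) -> tuple[int, dict[int, float]]:
    """Score every candidate k by tenfold CV; return the best one and all scores."""
    scores = {k: cross_val_accuracy(xs, ys, k) for k in grid}
    best = max(scores, key=lambda k: scores[k])  # first candidate wins ties
    return best, scores
```

In a real pipeline the same loop would run once per feature composition, keeping the best ANN, GPC, and SVM configuration for each FC; using a fixed `seed` keeps the folds identical across candidates, so their scores are directly comparable.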

Cited by

Biological Membrane-Penetrating Peptides: Computational Prediction and Applications.
de Oliveira ECL, da Costa KS, Taube PS, Lima AH, Junior CSS. Front Cell Infect Microbiol. 2022 Mar 25;12:838259. doi: 10.3389/fcimb.2022.838259. eCollection 2022. PMID: 35402305. Review.
Research on Plant RNA-Binding Protein Prediction Method Based on Improved Ensemble Learning.
Zhang H, Shi Y, Wang Y, Yang X, Li K, Im SK, Han Y. Biology (Basel). 2025 Jun 10;14(6):672. doi: 10.3390/biology14060672. PMID: 40563923.
EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information.
Zhu L, Chen Z, Yang S. Interdiscip Sci. 2025 Sep;17(3):744-769. doi: 10.1007/s12539-024-00673-4. Epub 2024 Dec 23. PMID: 39714579.
TriplEP-CPP: Algorithm for Predicting the Properties of Peptide Sequences.
Serebrennikova M, Grafskaia E, Maltsev D, Ivanova K, Bashkirov P, Kornilov F, Volynsky P, Efremov R, Bocharov E, Lazarev V. Int J Mol Sci. 2024 Jun 22;25(13):6869. doi: 10.3390/ijms25136869. PMID: 38999985.
Screening for effective cell-penetrating peptides with minimal impact on epithelial cells and gut commensals in vitro.
Gelli HP, Vazquez-Uribe R, Sommer MOA. Front Pharmacol. 2022 Nov 2;13:1049324. doi: 10.3389/fphar.2022.1049324. eCollection 2022. PMID: 36408245.
