Bioinformatics. 2018 Dec 1;34(23):4007-4016. doi: 10.1093/bioinformatics/bty451.

ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides


Leyi Wei et al. Bioinformatics.

Abstract

Motivation: Anti-cancer peptides (ACPs) have recently emerged as promising therapeutic agents for cancer treatment. Due to the avalanche of protein sequence data in the post-genomic era, there is an urgent need to develop automated computational methods to enable fast and accurate identification of novel ACPs within the vast number of candidate proteins and peptides.

Results: To address this, we propose a novel predictor named Anti-Cancer peptide Predictor with Feature representation Learning (ACPred-FL) for accurate prediction of ACPs based on sequence information. More specifically, we develop an effective feature representation learning model with which we extract and learn a set of informative features from a pool of support vector machine-based models trained on sequence-based feature descriptors. In this way, the class label information of the data samples is fully utilized. To refine the feature representation, we further employ a two-step feature selection technique, yielding a highly informative five-dimensional feature vector for the final peptide representation. Experimental results show that these five features provide greater discriminative power for identifying ACPs than currently available feature descriptors, highlighting the effectiveness of the proposed feature representation learning approach. The developed ACPred-FL method significantly outperforms state-of-the-art methods.
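The descriptor pool that feeds the feature representation learning model is built from standard sequence-based encodings. As a hedged illustration (the paper uses seven descriptor types with varied parameters; this single descriptor is only a representative example), amino acid composition maps a peptide to a 20-dimensional frequency vector:

```python
# Sketch of one common sequence-based feature descriptor: amino acid
# composition (AAC). This is an illustrative stand-in, not the paper's
# full seven-descriptor pool.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(peptide: str) -> list[float]:
    """Return the 20-dimensional residue-frequency vector of a peptide."""
    n = len(peptide)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]

features = amino_acid_composition("FAKKFAKKF")  # hypothetical peptide
```

Each descriptor of this kind produces one feature group; varying descriptor parameters multiplies the groups that enter the initial feature pool.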

Availability and implementation: The web-server of ACPred-FL is available at http://server.malab.cn/ACPred-FL.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Flowchart of ACPred-FL. The workflow comprises three major steps. First, given protein primary sequences as input, each sequence is scanned residue by residue with a peptide window of m residues to generate numerous peptides; peptides identical to others are filtered out. Second, the remaining peptides are subjected to the feature representation learning scheme, and each is encoded as a five-dimensional feature vector. Third, the resulting feature vectors are fed into a predictive model trained with an SVM classifier on the ACP500 dataset. Ultimately, the SVM model generates a prediction score between 0 and 1 for each peptide; the predictor labels peptides with scores above 0.5 as potential ACPs and the rest as non-ACPs
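The windowing and thresholding steps in the flowchart can be sketched as follows. The scoring function here is a hypothetical placeholder for the trained SVM's prediction score; only the window scan, duplicate filtering, and 0.5 cutoff follow the caption:

```python
# Sketch of Fig. 1's first and last steps: sliding-window peptide
# generation with duplicate filtering, then thresholding a prediction
# score at 0.5 to call potential ACPs.
def generate_peptides(sequences, m):
    """Scan each sequence with a window of m residues; drop duplicates."""
    seen, peptides = set(), []
    for seq in sequences:
        for i in range(len(seq) - m + 1):
            pep = seq[i:i + m]
            if pep not in seen:  # peptides identical to others are filtered out
                seen.add(pep)
                peptides.append(pep)
    return peptides

def classify(peptides, score_fn, threshold=0.5):
    """Label each peptide by its prediction score (stand-in for the SVM)."""
    return {p: ("ACP" if score_fn(p) > threshold else "non-ACP")
            for p in peptides}

# Hypothetical toy sequences and a toy lysine-fraction score.
peps = generate_peptides(["MKKLLKK", "KKLLKKA"], m=5)
labels = classify(peps, lambda p: p.count("K") / len(p))
```

The overlapping windows of the second sequence are deduplicated against those of the first, so only genuinely new peptides enter the prediction stage.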
Fig. 2.
The proposed feature representation learning scheme. First, peptide sequences are subjected to feature representation using seven feature descriptors; to incorporate sufficient information, we vary the parameters of the feature descriptors and generate 40 feature groups to form the initial feature pool. Second, the resulting feature groups are fed into well-trained SVM models to predict class labels. Finally, the predicted labels (0/1) from the SVM models are concatenated to generate a new feature vector representing each peptide sequence
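The label-concatenation idea can be sketched minimally. Each "model" below is a hypothetical stand-in (a nearest-centroid rule per feature group) for the paper's trained SVMs; the point illustrated is only that each group model emits a 0/1 prediction and the predictions are concatenated into a new, compact feature vector:

```python
# Minimal sketch of feature representation learning by label concatenation.
# centroid_model is an assumed stand-in classifier, not the paper's SVM.
def centroid_model(pos, neg):
    """Return a 0/1 predictor built from class centroids of one feature group."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    cp, cn = mean(pos), mean(neg)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return lambda x: 1 if dist(x, cp) < dist(x, cn) else 0

def learned_features(x_groups, models):
    """Concatenate each group model's predicted label into one vector."""
    return [m(x) for m, x in zip(models, x_groups)]

# Two hypothetical feature groups with toy training points.
m1 = centroid_model([[1, 1], [2, 2]], [[-1, -1], [-2, -2]])
m2 = centroid_model([[1], [2]], [[-1], [-2]])
new_vec = learned_features([[1.5, 1.5], [-3]], [m1, m2])
```

With 40 trained group models, the same construction yields the 40-dimensional learned representation that the subsequent feature selection step then prunes.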
Fig. 3.
Predictive performance of different feature descriptors in 10-fold cross-validation and independent tests. (A) ROC curves illustrating the 10-fold cross-validation performance of three types of feature descriptors. (B) ROC curves illustrating the independent test performance of the same three types of feature descriptors
Fig. 4.
Predictive performance of models based on different classifiers. (A) ROC curves illustrating the 10-fold cross-validation performance of the proposed features with three different classifiers (NB, RF and SVM). (B) ROC curves illustrating the independent test performance of the proposed features with the same three classifiers
Fig. 5.
mRMR feature selection of the proposed features. (A) The classification importance scores for the 40 generated features. Note that ‘fea1’ denotes the 1st feature among all the generated features. (B) SFS curve for the predictive model with respect to the ACC and MCC. The x- and y-axis represent the feature number t (ranging from 1 to 40) and the predictive performance, respectively. The blue and orange plots represent the SFS curves of ACC and MCC, respectively (Color version of this figure is available at Bioinformatics online.)
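The SFS curve in panel (B) comes from growing the feature subset one ranked feature at a time and scoring each subset size. A hedged sketch of that loop, with a toy scoring function standing in for the paper's cross-validated ACC/MCC evaluation:

```python
# Sketch of sequential forward selection (SFS) over mRMR-ranked features.
# score_subset is a hypothetical placeholder for cross-validated ACC/MCC.
def sequential_forward_selection(ranked_features, score_subset):
    """Add ranked features one by one; keep the best-scoring subset."""
    best_subset, best_score = [], float("-inf")
    subset = []
    for f in ranked_features:        # features in mRMR importance order
        subset.append(f)
        s = score_subset(subset)
        if s > best_score:
            best_subset, best_score = list(subset), s
    return best_subset, best_score

# Toy scorer that peaks at a subset of five features, mimicking the
# five-dimensional optimum reported in the paper.
ranked = list(range(40))
subset, score = sequential_forward_selection(
    ranked, lambda s: -(len(s) - 5) ** 2)
```

Plotting `score_subset` against subset size t (1 to 40) reproduces the shape of the SFS curves in the figure.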
Fig. 6.
Distribution of the positive and negative samples with respect to different feature descriptors. (A)-(F) show the distributions of BPF (k=2), GDC (g=3), OPF (k=1), BPF (k=7), CTD, and the proposed features, respectively. At each sampling step, 90% of the positive samples (ACPs) and negative samples (non-ACPs) were randomly selected; this procedure was repeated 20 times to obtain sub-sample average feature vectors. On each feature dimension, we calculated the mean and SD of the feature vectors
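The subsampling analysis described in the caption can be sketched directly: draw 90% of the samples 20 times, average the feature vectors within each draw, then compute the per-dimension mean and SD across draws. Function and parameter names are illustrative:

```python
# Sketch of the repeated 90% subsampling used for Fig. 6: per draw,
# average the feature vectors; across draws, report mean and SD per
# feature dimension.
import random
import statistics

def subsample_stats(samples, frac=0.9, repeats=20, seed=0):
    rng = random.Random(seed)            # fixed seed for reproducibility
    k = max(1, int(len(samples) * frac))
    draws = []
    for _ in range(repeats):
        draw = rng.sample(samples, k)
        draws.append([sum(col) / k for col in zip(*draw)])  # average vector
    dims = list(zip(*draws))
    return ([statistics.mean(d) for d in dims],
            [statistics.stdev(d) for d in dims])

# Degenerate toy input: identical samples give zero spread across draws.
means, sds = subsample_stats([[1.0, 0.0]] * 10)
```

A small SD on a dimension indicates that the descriptor's value there is stable across subsamples, which is what makes the class distributions in the figure comparable.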
Fig. 7.
Performance comparison of the proposed ACPred-FL and four state-of-the-art predictors. (A) Ten-fold cross-validation results of ACPred-FL and the four existing predictive models on the ACP500 dataset. (B) ROC curves of ACPred-FL and the four existing predictive models on the ACP500 dataset. (C) Independent test results of ACPred-FL and the four existing predictive models on the ACP164 dataset. (D) ROC curves of ACPred-FL and the four existing predictive models on the ACP164 dataset
Fig. 8.
Performance comparison of the proposed ACPred-FL and four state-of-the-art predictors on Tyagi's dataset. (A) Ten-fold cross-validation results of ACPred-FL and the four existing predictive models on the training set of Tyagi's dataset. (B) ROC curves of ACPred-FL and the four existing predictive models on the training set of Tyagi's dataset. (C) Independent test results of ACPred-FL and the four existing predictive models on the testing set of Tyagi's dataset. (D) ROC curves of ACPred-FL and the four existing predictive models on the testing set of Tyagi's dataset
