J Comput Aided Mol Des. 2020 Aug;34(8):841-856.
doi: 10.1007/s10822-020-00307-z. Epub 2020 Mar 16.

TargetCPP: accurate prediction of cell-penetrating peptides from optimized multi-scale features using gradient boost decision tree

Muhammad Arif et al. J Comput Aided Mol Des. 2020 Aug.

Abstract

Cell-penetrating peptides (CPPs) are short permeable peptides that have emerged as tools for delivering therapeutic agents, including genetic material and macromolecules, into cells. CPPs have recently become a hotspot in life science research and have opened a new route to disease treatment without harmful impact on cell viability, owing to their nontoxic character. Correct identification of CPPs can therefore provide hints for medical applications. Given the shortcomings of traditional experimental CPP identification, an intelligent predictor is urgently needed to accurately identify CPPs among large numbers of uncharacterized sequences. We develop a novel computational method, called TargetCPP, to discriminate CPPs from non-CPPs with improved accuracy. In TargetCPP, the peptide sequences are first formulated with four distinct encoding methods: composite protein sequence representation, composition transition and distribution, split amino acid composition, and information theory features. These feature vectors are fused, and a minimum redundancy and maximum relevancy feature selection method is applied to choose an optimal subset of features. Finally, predictive models are trained with different classification algorithms on the optimized features. Among these classifiers, the gradient boost decision tree algorithm achieved excellent performance throughout the experiments. Notably, TargetCPP attained prediction accuracies of 93.54% and 88.28% on the jackknife and independent tests, respectively. These empirical results demonstrate the superiority of the proposed bioinformatics method over state-of-the-art methods. The outcomes of this study are expected to provide a strong basis for large-scale prediction of CPPs and instructive guidance for clinical therapy and medical applications.
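
To make one of the encoding methods named in the abstract concrete, the following is a minimal sketch of split amino acid composition in plain Python. It is an illustration, not the authors' implementation: the function name `split_aac`, the segment lengths `n_term=5` and `c_term=5`, and the handling of short peptides are all assumptions, since the abstract does not state them. The idea is to cut the peptide into an N-terminal, a middle, and a C-terminal segment and to compute the 20 amino acid frequencies for each one separately.

```python
# Hypothetical sketch of split amino acid composition (SAAC).
# Segment lengths are assumed; the abstract does not specify them.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the 20 residue frequencies of one segment (zeros if empty)."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]


def split_aac(sequence, n_term=5, c_term=5):
    """Return a 60-dimensional SAAC vector: N-terminal, middle, C-terminal.

    Assumption: for peptides shorter than n_term + c_term the terminal
    segments may overlap and the middle segment is empty.
    """
    seq = sequence.upper()
    n_segment = seq[:n_term]
    c_segment = seq[-c_term:] if c_term > 0 else ""
    middle_end = max(n_term, len(seq) - c_term)
    middle = seq[n_term:middle_end]
    return composition(n_segment) + composition(middle) + composition(c_segment)


if __name__ == "__main__":
    # Toy peptide, not taken from the paper's dataset.
    features = split_aac("RRRRRAAAAAKKKKK")
    print(len(features))
```

A vector like this would be concatenated with the other three encodings before feature selection and classifier training. Those later steps are not shown here.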

Keywords: Cell-penetrating peptides; Composite protein sequence representation; Composition transition and distribution; Gradient boost; Split amino acid composition.
