J Proteome Res. 2018 Aug 3;17(8):2715-2726.
doi: 10.1021/acs.jproteome.8b00148. Epub 2018 Jul 2.

Machine-Learning-Based Prediction of Cell-Penetrating Peptides and Their Uptake Efficiency with Improved Accuracy

Balachandran Manavalan et al. J Proteome Res.

Abstract

Cell-penetrating peptides (CPPs) can enter cells as a variety of biologically active conjugates and have various biomedical applications. To offset the cost and effort of designing novel CPPs in the laboratory, computational methods are needed to identify candidate CPPs before in vitro experimental studies. We developed a two-layer prediction framework called machine-learning-based prediction of cell-penetrating peptides (MLCPP). The first layer predicts whether a given peptide is a CPP or a non-CPP, whereas the second layer predicts the uptake efficiency of the predicted CPPs. To construct the framework, we employed four different machine-learning methods and five different feature encodings: amino acid composition (AAC), dipeptide composition, amino acid index, composition-transition-distribution, and physicochemical properties (PCPs). In the first layer, hybrid features (a combination of AAC and PCP) with an extremely randomized tree classifier outperformed state-of-the-art predictors in CPP prediction, with an accuracy of 0.896 when tested on independent data sets. In the second layer, hybrid features obtained through a feature selection protocol, combined with random forest, produced an accuracy of 0.725, which is better than that of state-of-the-art predictors. We anticipate that MLCPP will become a valuable tool for predicting CPPs and their uptake efficiency and might facilitate hypothesis-driven experimental design. The MLCPP server interface, along with the benchmarking and independent data sets, is freely accessible at www.thegleelab.org/MLCPP.

Keywords: cell-penetrating peptides; extremely randomized tree; feature selection; machine learning; random forest; uptake efficiency.
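
The two-layer design and the composition features described in the abstract can be illustrated with a minimal Python sketch. It is not the MLCPP implementation: the function names (`aac`, `dpc`, `two_layer_predict`) are hypothetical, the classifiers are passed in as plain callables rather than trained extremely randomized tree or random forest models, only AAC is used as input instead of the hybrid AAC and PCP features, and the high/low uptake labels are an assumption about how the second layer reports its result.

```python
from collections import Counter
from itertools import product
from typing import Callable, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptide: str) -> list[float]:
    """Amino acid composition: fraction of each of the 20 residues."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dpc(peptide: str) -> list[float]:
    """Dipeptide composition: fraction of each of the 400 ordered residue pairs."""
    seq = peptide.upper()
    if len(seq) < 2:
        raise ValueError("peptide must have at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def two_layer_predict(
    peptide: str,
    is_cpp: Callable[[list[float]], bool],          # hypothetical layer-1 model
    uptake_is_high: Callable[[list[float]], bool],  # hypothetical layer-2 model
) -> tuple[str, Optional[str]]:
    """Layer 1 decides CPP vs non-CPP; layer 2 runs only for predicted CPPs."""
    features = aac(peptide)  # assumption: AAC only, not the paper's hybrid features
    if not is_cpp(features):
        return ("non-CPP", None)
    return ("CPP", "high" if uptake_is_high(features) else "low")


if __name__ == "__main__":
    # Toy stand-in rule (not a trained model): call a peptide a CPP if it is arginine-rich.
    arg_rich = lambda f: f[AMINO_ACIDS.index("R")] > 0.3
    print(two_layer_predict("RRRRRRRR", arg_rich, lambda f: True))
```

In practice the two callables would wrap trained classifiers, and the second layer would only ever see peptides that the first layer accepted, which is the point of the two-layer arrangement.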
