Sci Rep. 2017 Jul 31;7(1):6862.
doi: 10.1038/s41598-017-07199-4.

PhosphoPredict: A bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection

Jiangning Song et al. Sci Rep. 2017.

Abstract

Protein phosphorylation is a major form of post-translational modification (PTM) that regulates diverse cellular processes. In silico methods for phosphorylation site prediction can provide a useful and complementary strategy for complete phosphoproteome annotation. Here, we present a novel bioinformatics tool, PhosphoPredict, that combines protein sequence and functional features to predict kinase-specific substrates and their associated phosphorylation sites for 12 human kinases and kinase families, including ATM, CDKs, GSK-3, MAPKs, PKA, PKB, PKC, and SRC. To elucidate critical determinants, we identified feature subsets that were most informative and relevant for predicting substrate specificity for each individual kinase family. Extensive benchmarking experiments based on both five-fold cross-validation and independent tests indicated that the performance of PhosphoPredict is competitive with that of several other popular prediction tools, including KinasePhos, PPSP, GPS, and Musite. We found that combining protein functional and sequence features significantly improves phosphorylation site prediction performance across all kinases. Application of PhosphoPredict to the entire human proteome identified 150 to 800 potential phosphorylation substrates for each of the 12 kinases or kinase families. PhosphoPredict significantly extends the bioinformatics portfolio for kinase function analysis and will facilitate high-throughput identification of kinase-specific phosphorylation sites, thereby contributing to both basic and translational research programs.
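
The five-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It shows only the general splitting scheme: the function name `k_fold_indices`, the seed, and the round-robin fold assignment are assumptions made for illustration, not details taken from PhosphoPredict.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for randomized k-fold cross-validation.

    Hypothetical helper: the shuffling and fold assignment are illustrative
    choices, not the procedure used by the PhosphoPredict authors.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(23, k=5, seed=1), 1):
        print(fold_number, len(train), len(test))
```

Each sample appears in exactly one test fold, so every sample is scored once by a model that never saw it during training.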

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Workflow of the PhosphoPredict approach. Benchmark training/testing datasets were extracted from the Phospho.ELM database after removing sequence redundancy (70% sequence identity) using the CD-HIT program. After feature selection using mRMR and statistical analysis of over-represented and under-represented feature terms using hypergeometric tests, significant sequence, structural, and functional features were extracted and used as inputs to train RF classifiers. Classifier performance was assessed using randomized 5-fold cross-validation and independent tests.
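
The hypergeometric test for over-represented feature terms mentioned in this workflow can be sketched as follows. This is a generic one-sided test written from the textbook definition; the function name `hypergeom_upper_tail` and the example counts are assumptions, and the caption does not state the exact test configuration the authors used.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric variable X.

    N: size of the background set (e.g. all proteins considered)
    K: background items that carry the feature term
    n: size of the sample (e.g. known substrates of one kinase)
    k: sample items that carry the feature term
    A small value suggests the term is over-represented in the sample.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # X can only take values between these two bounds.
    lowest_possible = max(0, n - (N - K))
    highest_possible = min(K, n)
    start = max(k, lowest_possible)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(start, highest_possible + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Made-up counts: 10 proteins, 5 annotated with a term, and all 5
    # sampled substrates carry that term.
    print(hypergeom_upper_tail(5, 5, 5, 10))
```

An under-representation test follows the same pattern with the lower tail, P(X <= k).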
Figure 2
Protein substrate distributions. The distributions of the known protein substrate set (red) and the background protein set (black) for four common kinase families. The x-axis represents the log-odds ratio score, while the y-axis represents the percentage of proteins with the corresponding scores. Data represent (A) CDKs, (B) MAPKs, (C) PKC, and (D) CK2.
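
The caption does not say how the log-odds ratio score is computed. As a rough illustration only, the sketch below uses one common construction (the log of an odds ratio with a pseudocount) and bins scores into percentages, mirroring the axes described. Both helper names, the pseudocount of 0.5, and the base-2 logarithm are assumptions, not the authors' scoring scheme.

```python
from collections import Counter
from math import floor, log2


def log_odds_ratio(count_fg, total_fg, count_bg, total_bg, pseudocount=0.5):
    """Log2 odds ratio of a feature in a foreground set versus a background set.

    The pseudocount keeps both proportions strictly between 0 and 1, so the
    odds are always finite. This is an illustrative formula only.
    """
    p_fg = (count_fg + pseudocount) / (total_fg + 2 * pseudocount)
    p_bg = (count_bg + pseudocount) / (total_bg + 2 * pseudocount)
    odds_fg = p_fg / (1 - p_fg)
    odds_bg = p_bg / (1 - p_bg)
    return log2(odds_fg / odds_bg)


def percent_histogram(scores, bin_width=1.0):
    """Map the left edge of each score bin to the percentage of scores in it."""
    bins = Counter(floor(s / bin_width) * bin_width for s in scores)
    return {edge: 100.0 * n / len(scores) for edge, n in sorted(bins.items())}


if __name__ == "__main__":
    # Made-up counts: a feature seen in 50 of 100 substrates and 10 of 100
    # background proteins gives a positive score.
    print(log_odds_ratio(50, 100, 10, 100))
    print(percent_histogram([0.1, 0.2, 1.5, 1.7]))
```

A positive score means the feature is more frequent among known substrates than in the background, and a negative score means the opposite.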
Figure 3
Phosphorylation site prediction. ROC curves for phosphorylation site prediction of three different sequence-encoding schemes: AA (amino acid sequence encoding), AA + SS + SA + DO (amino acid sequence + secondary structure + solvent accessibility + native disorder, without feature selection), and mRMR (mRMR feature selection based on all the extracted initial features), evaluated using 5-fold cross-validation tests on the benchmark datasets. Data represent (A) CDKs, (B) MAPKs, (C) PKC, and (D) CK2.
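
ROC curves such as these are usually summarized by the area under the curve (AUC). The sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores are assumptions; the source does not describe how the authors computed their curves.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1 or 0) and scores.

    Uses the pairwise definition, which is quadratic in the number of
    samples but easy to verify by hand.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up predictions for four candidate sites.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of phosphorylated from non-phosphorylated sites.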
Figure 4
Comparative phosphorylation site prediction. ROC curves for kinase-specific phosphorylation site prediction by PhosphoPredict and four currently available tools: KinasePhos, PPSP, GPS, and Musite. Data represent (A) CDKs, (B) MAPKs, (C) PKC, and (D) CK2.
Figure 5
Functional enrichment analysis of the predicted substrates of four different kinases at the proteome level, in terms of three major Gene Ontology (GO) categories: cellular component (GO_CC), biological process (GO_BP), and molecular function (GO_MF). For each GO category, the top five significantly enriched terms are displayed. Data represent (A) CDKs, (B) MAPKs, (C) PKC, and (D) CK2.
Figure 6
Example output of the PhosphoPredict Java application. Predicted sites of phosphorylation by the ATM kinase in the cell cycle regulatory protein p95 (Nibrin, UniProt ID: O60934) are displayed.
