PLoS Comput Biol. 2019 Feb 27;15(2):e1006678.

doi: 10.1371/journal.pcbi.1006678. eCollection 2019 Feb.

CoPhosK: A method for comprehensive kinase substrate annotation using co-phosphorylation analysis

Marzieh Ayati^{1,2}, Danica Wiredja³, Daniela Schlatzer³, Sean Maxwell³, Ming Li^{3,4,5}, Mehmet Koyutürk^{1,3,5}, Mark R Chance^{3,5,6}

Affiliations

¹ Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH.
² Department of Computer Science, University of Texas Rio Grande Valley, Edinburg, TX.
³ Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH.
⁴ Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH.
⁵ Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH.
⁶ Department of Nutrition, Case Western Reserve University, Cleveland, OH.

PMID: 30811403
PMCID: PMC6411229
DOI: 10.1371/journal.pcbi.1006678

Marzieh Ayati et al. PLoS Comput Biol. 2019.

Abstract

We present CoPhosK, a tool that predicts kinase-substrate associations (KSAs) for phosphopeptide substrates detected by mass spectrometry (MS). CoPhosK uses a Naïve Bayes framework, with known KSAs as priors, to generate its predictions. By mining MS data for the collective dynamic signatures of each kinase's substrates, revealed by correlation analysis of phosphopeptide intensity data, the tool infers KSAs for the considerable body of substrates that lack such annotations. We benchmarked the tool against existing KSA prediction approaches that rely on static information (e.g., sequences, structures and interactions), using publicly available MS data from breast, colon, and ovarian cancer models. The benchmarking shows that co-phosphorylation analysis significantly improves prediction performance when static information is available (about 35% of sites) while providing reliable predictions for the remainder. This triples the KSAs obtainable from experimental MS data, leading to a comprehensive and reliable characterization of the landscape of kinase-substrate interactions well beyond current limitations.


Conflict of interest statement

I have read the journal's policy and the authors of this manuscript have the following competing interests: A patent application from Case Western Reserve University is pending. The patent covers the methodology for identifying enzyme-substrate associations using co-substrate analysis.

Figures

**Fig 1. Overview of CoPhosK and CoPhosK+.**
After phosphorylated sites are identified, KinomeXplorer uses sequence match scoring and the network proximity of kinases and substrates to predict KSAs. CoPhosK constructs a co-phosphorylation network to infer KSAs. CoPhosK+ combines all the scores to provide more accurate KSA predictions. PhosphoSitePlus annotates ~6% of the identified phosphosites with their associated kinases. KinomeXplorer raises annotation coverage to 35%, and its predictions become more accurate when combined with co-phosphorylation information (CoPhosK+). CoPhosK alone, by contrast, annotates 100% of the identified phosphosites in the experiment.

**Fig 2. Distribution of co-phosphorylation among pairs of phosphosites in breast cancer PDX.**
The blue histogram shows the distribution of co-phosphorylation (the correlation between phosphorylation levels) for all pairs of phosphosites in breast cancer PDX (μ = 0.01, σ = 0.22). (a) Illustration of the three permutation tests used to assess the significance of this distribution. The pink histogram in each panel shows the distribution of co-phosphorylation for all pairs of phosphosites over 100 permutations, representing (b) randomization of all entries in the phosphorylation matrix (μ = 0.008, σ = 0.20), (c) permutation of entries across phosphosites within each state (μ = 0.01, σ = 0.20), and (d) permutation of entries across states within each phosphosite (μ = 0.01, σ = 0.20). The distribution of co-phosphorylation in the original dataset is significantly broader than the distribution in every permutation scheme (Kolmogorov-Smirnov (KS) test p-value << 1E-9).
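The co-phosphorylation measure and the three permutation nulls described in Fig 2 can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes the phosphorylation data are a phosphosite-by-state matrix stored as a list of rows, uses Pearson correlation as the co-phosphorylation measure (the caption says only "correlation"), and all function names are hypothetical. Only the two-sample KS statistic is computed here; `scipy.stats.ks_2samp` would also return a p-value.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def co_phosphorylation(matrix):
    """Correlation for every pair of rows (phosphosites) across columns (states)."""
    return [pearson(matrix[i], matrix[j])
            for i, j in itertools.combinations(range(len(matrix)), 2)]


def permute_all(matrix, rng):
    """Panel (b): shuffle every entry of the phosphosite-by-state matrix."""
    n_states = len(matrix[0])
    flat = [v for row in matrix for v in row]
    rng.shuffle(flat)
    return [flat[i * n_states:(i + 1) * n_states] for i in range(len(matrix))]


def permute_within_state(matrix, rng):
    """Panel (c): for each state (column), shuffle values across phosphosites."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def permute_within_site(matrix, rng):
    """Panel (d): for each phosphosite (row), shuffle values across states."""
    shuffled = []
    for row in matrix:
        row = list(row)
        rng.shuffle(row)
        shuffled.append(row)
    return shuffled


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for MS data: 20 phosphosites x 8 biological states.
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    observed = co_phosphorylation(data)
    schemes = {"(b) all entries": permute_all,
               "(c) within each state": permute_within_state,
               "(d) within each site": permute_within_site}
    for label, permute in schemes.items():
        null = []
        for _ in range(100):  # 100 permutations, as in the caption
            null.extend(co_phosphorylation(permute(data, rng)))
        print(label, "KS statistic:", round(ks_statistic(observed, null), 3))
```

Pooling the correlations from all 100 permutations into one null distribution is one reasonable reading of the caption. The demo's random matrix has no real structure, so its KS statistics say nothing about actual MS data.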

**Fig 3. Distribution of co-phosphorylation between phosphosites that are substrates of the same kinase.**
(a) For each of 347 reported kinases, we compute the co-phosphorylation of all pairs of phosphosites that are reported in PhosphoSitePlus to be common substrates of that kinase (*shared-kinase pairs*). In panels (b) and (c), the green histogram shows the distribution of co-phosphorylation for all *shared-kinase pairs*, and the blue histogram shows the distribution for all pairs of phosphosites in the dataset. (b) Breast cancer PDX dataset (37,234 shared-kinase pairs; μ = 0.05, σ = 0.23, kurtosis = 2.85, skewness = 0.10); (c) ovarian cancer tumors (8,235 shared-kinase pairs; μ = 0.11, σ = 0.32, kurtosis = 2.54, skewness = -0.10). For both datasets, the distribution for shared-kinase pairs is significantly wider and shifted to the right compared with the distribution for all phosphosite pairs (KS test p-value << 1E-9).
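Building the *shared-kinase pairs* of Fig 3 amounts to enumerating, for each kinase, the pairs of its annotated substrates that were actually detected. The sketch below assumes annotations arrive as a hypothetical mapping from kinase name to a set of phosphosite identifiers. It counts a pair shared by several kinases only once, which may differ from how the published counts were tallied. The summary helper reports kurtosis as the plain (non-excess) fourth standardized moment; that convention is also an assumption, since the caption does not state it.

```python
import itertools
import math


def shared_kinase_pairs(kinase_substrates, observed_sites):
    """Unordered pairs of observed phosphosites that are annotated substrates of a common kinase."""
    pairs = set()
    for substrates in kinase_substrates.values():
        present = sorted(s for s in substrates if s in observed_sites)
        # Sorting gives each pair a canonical (a, b) order, so a pair shared
        # by several kinases is stored only once.
        pairs.update(itertools.combinations(present, 2))
    return pairs


def moments(values):
    """Mean, standard deviation, skewness and non-excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    # Hypothetical annotations and detected sites, for illustration only.
    annotations = {"K1": {"s1", "s2", "s3"}, "K2": {"s2", "s3", "s9"}}
    detected = {"s1", "s2", "s3", "s4"}
    print(sorted(shared_kinase_pairs(annotations, detected)))
    # [('s1', 's2'), ('s1', 's3'), ('s2', 's3')]
```

The correlations of these pairs, passed to `moments`, give the kind of summary quoted in panels (b) and (c).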

**Fig 4. Workflow of CoPhosK for using co-phosphorylation to predict kinase-substrate associations (KSAs).**
The method takes as input the available information on KSAs and phosphorylation data representing the phosphorylation levels of thousands of phosphosites across multiple biological states. The co-phosphorylation of all pairs of phosphosites is assessed, and a co-phosphorylation network is constructed in which phosphosites are nodes and edge weights are the correlations between phosphosites. CoPhosK then uses a Naïve Bayes classifier that integrates the interactions in this network with partial information on kinase-substrate interactions to predict new kinase-substrate interactions.
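One way to read the Fig 4 workflow as code is a Naïve Bayes-style score: a prior from the known annotations plus, for each known substrate of a kinase, a log-likelihood ratio of the observed edge weight under a "shared-kinase" versus a "background" distribution. This is a hedged sketch, not CoPhosK's published model. The normal class-conditional densities are an assumption, and the default parameters are simply the breast cancer PDX summary statistics quoted in the Fig 2 and Fig 3 captions (μ = 0.01, σ = 0.22 for all pairs; μ = 0.05, σ = 0.23 for shared-kinase pairs). The function and argument names are hypothetical.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def rank_kinases(site, corr, kinase_substrates,
                 shared=(0.05, 0.23), background=(0.01, 0.22)):
    """Rank kinases for `site`, best first, by log prior + summed log-likelihood ratios.

    corr maps frozenset({site_a, site_b}) -> co-phosphorylation (edge weight).
    Each edge between `site` and a known substrate of a kinase is treated as
    independent evidence (the Naive Bayes assumption). The prior for a kinase
    is its share of the known annotations, excluding `site` itself.
    """
    counts = {k: sum(1 for s in subs if s != site)
              for k, subs in kinase_substrates.items()}
    total = sum(counts.values())
    scores = {}
    for kinase, subs in kinase_substrates.items():
        if counts[kinase] == 0:
            continue  # no other known substrates: no prior and no evidence
        score = math.log(counts[kinase] / total)
        for s in subs:
            edge = frozenset((site, s))
            if s != site and edge in corr:
                r = corr[edge]
                score += gaussian_logpdf(r, *shared) - gaussian_logpdf(r, *background)
        scores[kinase] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy network: "s1" co-varies with a K1 substrate and
    # anti-varies with a K2 substrate.
    corr = {frozenset(("s1", "s2")): 0.8, frozenset(("s1", "s3")): -0.3}
    annotations = {"K1": {"s1", "s2"}, "K2": {"s3"}}
    print(rank_kinases("s1", corr, annotations))  # ['K1', 'K2']
```

Because the shared-kinase density has a higher mean than the background, strongly positive edges to a kinase's substrates raise that kinase's score, which matches the right shift shown in Fig 3.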

**Fig 5. Correspondence between CoPhosK and KinomeXplorer predictions in ranking kinases for each phosphosite.**
For all kinase-substrate associations reported in PhosphoSitePlus whose substrate is detected in the LC/MS data, we perform leave-one-out cross-validation: we hide the association between the phosphosite and the kinase, and CoPhosK uses the remaining kinase-substrate associations together with co-phosphorylation to rank the likely kinases for the phosphosite. (a) Comparison of the rankings from CoPhosK (x-axis) and KinomeXplorer (y-axis) for breast cancer PDX data (740 predictions) and ovarian cancer data (313 predictions). (b) Box plots of the rank of the target kinase according to the predictions of the two methods.
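The leave-one-out protocol in Fig 5 can be sketched independently of the scoring model: hide one known association, re-rank kinases for that phosphosite, and record where the hidden kinase lands. The ranker used here (mean co-phosphorylation with a kinase's other known substrates) is a hypothetical stand-in chosen to keep the example short, not CoPhosK or KinomeXplorer; any function with the same signature can be passed as `ranker`. Correlations are assumed to be stored in a dictionary keyed by `frozenset` pairs of phosphosite identifiers.

```python
import statistics


def mean_corr_ranker(site, corr, kinase_substrates):
    """Hypothetical stand-in ranker (not CoPhosK): order kinases by the mean
    co-phosphorylation of `site` with each kinase's other known substrates."""
    scores = {}
    for kinase, substrates in kinase_substrates.items():
        values = [corr[frozenset((site, s))] for s in substrates
                  if s != site and frozenset((site, s)) in corr]
        if values:
            scores[kinase] = statistics.mean(values)
    return sorted(scores, key=scores.get, reverse=True)


def leave_one_out_ranks(corr, kinase_substrates, ranker=mean_corr_ranker):
    """Hide each known (site, kinase) association in turn, re-rank kinases for
    the site, and record the 1-based rank of the hidden kinase (None if unranked)."""
    ranks = {}
    for kinase, substrates in kinase_substrates.items():
        for site in substrates:
            # Copy the annotations with only this one association removed.
            hidden = {k: (set(subs) - {site} if k == kinase else set(subs))
                      for k, subs in kinase_substrates.items()}
            ranking = ranker(site, corr, hidden)
            ranks[(site, kinase)] = ranking.index(kinase) + 1 if kinase in ranking else None
    return ranks


if __name__ == "__main__":
    # Hypothetical correlations between five phosphosites.
    pairs = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7, ("d", "e"): 0.6,
             ("a", "d"): -0.2, ("a", "e"): -0.1, ("b", "d"): 0.0,
             ("b", "e"): 0.1, ("c", "d"): -0.3, ("c", "e"): 0.0}
    corr = {frozenset(p): r for p, r in pairs.items()}
    annotations = {"K1": {"a", "b", "c"}, "K2": {"d", "e"}}
    print(leave_one_out_ranks(corr, annotations))
```

A kinase whose only known substrate is the hidden site has no remaining evidence, so this sketch reports its rank as `None` rather than guessing.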

**Fig 6. Performance of CoPhosK, KinomeXplorer and CoPhosK+ in predicting kinases for phosphosites.**
For each dataset, we consider all phosphosites that are identified in the dataset and/or reported in PhosphoSitePlus. For each phosphosite, we perform leave-one-out cross-validation by hiding the association between the phosphosite and one of its associated kinases (*target kinase*) and then ranking the likely kinases for the phosphosite using PUEL, CoPhosK, KinomeXplorer, and CoPhosK⁺. We report the fraction of phosphosites for which the target kinase is ranked among the top 1 and top 5 predicted kinases by each method (indicated by different colors in the bar plot). The panels show the performance of the methods on the (a) breast cancer PDX (I), (b) ovarian cancer (II), (c) breast cancer (III), (d) breast cancer (IV), (e) ovarian cancer (V), and (f) colorectal cancer (VI) datasets.
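The top-1 and top-5 figures reported in Fig 6 reduce to a simple fraction over the leave-one-out ranks. In this minimal sketch, `ranks` holds the 1-based rank of the hidden target kinase for each phosphosite, and a phosphosite whose target kinase the method cannot rank at all is counted as a miss; that treatment of unranked sites is an assumption, since the caption does not say how such cases were scored. Function names are hypothetical.

```python
def top_k_fraction(ranks, k):
    """Fraction of held-out phosphosites whose target kinase has rank <= k.

    `ranks` holds one entry per phosphosite: the 1-based rank of the hidden
    target kinase, or None if it was not ranked (counted as a miss).
    """
    ranks = list(ranks)
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def summarize(method_ranks, ks=(1, 5)):
    """Top-k fractions per method, e.g. {"method_a": [1, 3, None], ...}."""
    return {method: {k: top_k_fraction(r, k) for k in ks}
            for method, r in method_ranks.items()}


if __name__ == "__main__":
    # Hypothetical target-kinase ranks for five phosphosites.
    ranks = [1, 3, None, 7, 2]
    print(top_k_fraction(ranks, 1), top_k_fraction(ranks, 5))  # 0.2 0.6
```

Keeping unranked sites in the denominator makes methods with lower coverage pay for it, which is one way to compare a method that annotates every site against one that does not.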

**Fig 7. Consistency and reproducibility of kinase-substrate predictions made using different phosphorylation data sets.**
For each phosphosite in each dataset, we rank the kinases using CoPhosK⁺ and then identify the top-ranked kinase based on each of two different datasets. **(a)** The Venn diagrams show (1) the number of phosphosites for which the top-ranked kinase agrees with the one reported in PhosphoSitePlus (True Positive), (2) the number of phosphosites that have kinases reported in PhosphoSitePlus but whose top-ranked kinase does not match them (False Positive), and (3) the number of phosphosites with no kinase annotation in PhosphoSitePlus. Blue circles represent predictions based on ovarian cancer tumor cell lines, and pink circles represent predictions based on breast cancer PDX. For the phosphosites with at least one kinase listed in PhosphoSitePlus, the right panel shows the “precision” (True Positive / (True Positive + False Positive)) of the top-ranked kinase for each individual dataset and for the intersection of the two datasets (i.e., phosphosites for which the same kinase is ranked top in both datasets). **(b)** The number of phosphosites with at least one annotation in PhosphoSitePlus (upper panel) and with no annotation in PhosphoSitePlus (lower panel), shown as a function of the number of datasets that contain the phosphosite. Among these phosphosites, the number for which the top-ranked kinase is identical across multiple datasets (identical predicted KSAs) is also shown as a function of the number of supporting datasets. For the phosphosites with annotations, the number of these predictions that are consistent with PhosphoSitePlus annotations is also shown (true identical KSAs).
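The precision in Fig 7a, TP / (TP + FP), is computed only over phosphosites with at least one PhosphoSitePlus annotation, and the intersection keeps the sites where both datasets rank the same kinase first. A minimal sketch, assuming top-ranked predictions and annotations are plain dictionaries keyed by hypothetical phosphosite identifiers; it counts a prediction as a true positive when the top kinase is among the kinases annotated for that site.

```python
def precision(top_kinase, annotations):
    """TP / (TP + FP) over phosphosites that have at least one annotated kinase."""
    tp = fp = 0
    for site, kinase in top_kinase.items():
        known = annotations.get(site)
        if not known:
            continue  # unannotated sites cannot be scored (category 3 in the Venn diagram)
        if kinase in known:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if tp + fp else float("nan")


def agreeing_predictions(top_a, top_b):
    """Phosphosites for which both datasets rank the same kinase first."""
    return {site: k for site, k in top_a.items() if top_b.get(site) == k}


def unannotated(top_kinase, annotations):
    """Predicted phosphosites with no kinase annotation."""
    return sorted(site for site in top_kinase if not annotations.get(site))


if __name__ == "__main__":
    # Hypothetical annotations and top-ranked kinases from two datasets.
    annotations = {"s1": {"K1"}, "s2": {"K2"}, "s3": {"K3"}}
    top_a = {"s1": "K1", "s2": "K9", "s3": "K3", "s4": "K5"}
    top_b = {"s1": "K1", "s2": "K2", "s3": "K7", "s4": "K5"}
    both = agreeing_predictions(top_a, top_b)
    print(precision(top_a, annotations), precision(top_b, annotations),
          precision(both, annotations), unannotated(both, annotations))
```

Restricting to predictions that agree across datasets trades coverage for precision, which is the comparison the right panel of Fig 7a draws.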


