Mol Cell Proteomics. 2019 Mar;18(3):576-593.
doi: 10.1074/mcp.TIR118.000943. Epub 2018 Dec 18.

A Curated Resource for Phosphosite-specific Signature Analysis

Karsten Krug et al. Mol Cell Proteomics. 2019 Mar.

Abstract

Signaling pathways are orchestrated by post-translational modifications (PTMs) such as phosphorylation. However, pathway analysis of PTM data sets generated by mass spectrometry (MS)-based proteomics is typically performed at a gene-centric level because of the lack of appropriately curated PTM signature databases and bioinformatic tools that leverage PTM site-specific information. Here we present the first version of PTMsigDB, a database of modification site-specific signatures of perturbations, kinase activities and signaling pathways curated from more than 2,500 publications. We adapted the widely used single-sample Gene Set Enrichment Analysis approach to utilize PTMsigDB, enabling PTM Signature Enrichment Analysis (PTM-SEA) of quantitative MS data. We used a well-characterized data set of epidermal growth factor (EGF)-perturbed cancer cells to evaluate our approach and demonstrated better representation of signaling events compared with gene-centric methods. We then applied PTM-SEA to analyze the phosphoproteomes of cancer cells treated with cell-cycle inhibitors and detected mechanism-of-action specific signatures of cell cycle kinases. We also applied our methods to analyze the phosphoproteomes of PI3K-inhibited human breast cancer cells and detected signatures of compounds inhibiting PI3K as well as targets downstream of PI3K (AKT, MAPK/ERK) covering a substantial fraction of the PI3K pathway. PTMsigDB and PTM-SEA can be freely accessed at https://github.com/broadinstitute/ssGSEA2.0.

Keywords: Computational Biology; Database design; Pathway Analysis; Phosphorylation; Post-translational modifications.

Conflict of interest statement

The authors do not declare any conflict of interest.

Figures

Graphical abstract
Fig. 1.
Pathway analysis of phosphoproteome data sets. A, The majority of proteins have multiple phospho-acceptor sites that can be differentially occupied and vary in abundance levels measured by MS. The four examples illustrate three proteoforms with varying numbers of phosphorylation sites (represented by amino acid residue and position in the protein sequence), which are quantified with different fold changes (shown as numbers in parentheses). Protein A exemplifies the presence of two protein isoforms carrying an isoform-specific phosphorylation site. B, Gene-centric pathway analysis typically involves combining fold changes of multiple sites mapping to the same gene symbol by calculating a central value of abundance (mean or median) or by choosing a single site characterized by a high degree of variance (and thus information) across a sample cohort. Additional information provided by multiple phosphorylation sites on a single protein, as well as by different sites on protein isoforms, is not considered. The resulting gene-centric expression matrix can then be queried against gene-centric pathway databases such as Reactome, KEGG or MSigDB, in which each pathway is represented as a collection of gene symbols. C, Site-centric pathway analysis takes all quantified phosphorylation sites into account and requires a database of pathways annotated using individual phosphorylation sites. For that purpose, we developed PTMsigDB, annotating signatures of pathways, kinases and perturbations at the level of sites. Each site is annotated with the direction of regulation in a signature, i.e. whether its abundance is decreased or elevated, exemplified by blue and red arrows, respectively. Quantitative, site-centric phosphoproteomic data can be directly queried against PTMsigDB to identify signatures of phosphosites that correlate with annotated signature sets in PTMsigDB.
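The two gene-centric collapsing strategies described in panel B can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(gene, site)` keying and the toy values are all assumptions made for the example, and population variance is used as one possible measure of "degree of variance".

```python
from statistics import median, pvariance


def collapse_by_median(site_values):
    """Collapse sites to genes by taking the per-sample median.

    site_values maps (gene, site) -> list of fold changes, one per sample
    (hypothetical input layout chosen for this sketch).
    """
    by_gene = {}
    for (gene, _site), values in site_values.items():
        by_gene.setdefault(gene, []).append(values)
    # zip(*rows) iterates over samples; each column holds all sites of a gene
    return {gene: [median(col) for col in zip(*rows)]
            for gene, rows in by_gene.items()}


def collapse_by_max_variance(site_values):
    """Represent each gene by its single most variable site across samples."""
    best = {}
    for (gene, site), values in site_values.items():
        var = pvariance(values)
        if gene not in best or var > best[gene][0]:
            best[gene] = (var, site, values)
    return {gene: (site, values) for gene, (_var, site, values) in best.items()}


# Toy data: three sites on GENE_A, one site on GENE_B, three samples each
sites = {
    ("GENE_A", "S10"): [1.0, -1.0, 0.0],
    ("GENE_A", "T25"): [0.2, 0.1, 0.3],
    ("GENE_A", "Y40"): [-2.0, 2.0, 0.0],
    ("GENE_B", "S5"): [0.5, 0.5, 0.4],
}
gene_median = collapse_by_median(sites)
gene_top_site = collapse_by_max_variance(sites)
```

In the toy data, S10 and Y40 on GENE_A move in opposite directions, and either collapsing strategy discards that site-level information, which is the limitation the caption points out.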
Fig. 2.
PTMsigDB signature sets and scoring scheme. Signature sets in PTMsigDB are divided into three categories (perturbations, kinases and signatures of molecular pathways) derived from four major sources (PhosphoSitePlus, NetPath, WikiPathways, LINCS). A, Perturbation signatures from PhosphoSitePlus were assembled from a total of 2483 publications. To reduce experimental noise we require that each site be consistently reported by at least two independent studies (consensus signatures). Experimental noise can be introduced by the use of different cell systems (e.g. in vivo versus in vitro), different protocols (e.g. dosage of perturbation) or different technologies (e.g. low-throughput versus high-throughput). B, Bar chart depicting the distribution of unidirectional (all sites of a signature are annotated with decreased OR increased abundance levels) and bidirectional (sites of a signature include both decreased AND increased abundance levels) signature sets. C, Modified scoring scheme of the single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm, which enables scoring of directional signature sets. For each direction a running-sum statistic (y axis) is calculated by walking down the ranked list of PTM sites in the data set (x axis). An enrichment score (ES), reflected by the area under the resulting curve, is calculated for each direction separately (ESu for “up” sites and ESd for “down” sites), and the two are combined by subtracting ESd from ESu. D, Heatmap depicting the overlap between phosphorylation sites commonly affected by kinase inhibitors. The similarity matrix is based on the number of shared phosphosites between signatures. The upper and lower triangular matrices are normalized by the total number of sites of signatures listed in rows and columns, respectively.
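The directional scoring scheme in panel C can be illustrated with a simplified sketch. It is not the ssGSEA2.0 implementation: the weighting by absolute value raised to a hypothetical `alpha` parameter, the use of one shared descending ranking for both directions, and the omission of normalization and permutation testing are all assumptions made here for brevity.

```python
def enrichment_score(ranked_sites, ranked_values, site_set, alpha=1.0):
    """Area under a weighted running-sum curve for one set of sites.

    The running sum rises at sites in site_set (weighted by |value|**alpha)
    and falls by a constant step at all other sites.
    """
    hits = [s in site_set for s in ranked_sites]
    n_hit = sum(hits)
    n_miss = len(ranked_sites) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    weights = [abs(v) ** alpha if h else 0.0
               for v, h in zip(ranked_values, hits)]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    running, area = 0.0, 0.0
    for w, h in zip(weights, hits):
        running += w / total_weight if h else -1.0 / n_miss
        area += running  # accumulate the area under the running-sum curve
    return area


def directional_score(site_values, up_sites, down_sites, alpha=1.0):
    """Combine both directions as ESu - ESd, as described in the caption."""
    ranked = sorted(site_values.items(), key=lambda kv: kv[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    es_up = enrichment_score(names, values, set(up_sites), alpha)
    es_down = enrichment_score(names, values, set(down_sites), alpha)
    return es_up - es_down


# Toy sample: site identifiers and values are made up for illustration
sample = {"p1": 3.0, "p2": 2.0, "p3": 0.5, "p4": -0.5, "p5": -2.0, "p6": -3.0}
concordant = directional_score(sample, up_sites={"p1", "p2"}, down_sites={"p5", "p6"})
discordant = directional_score(sample, up_sites={"p5", "p6"}, down_sites={"p1", "p2"})
```

When the annotated "up" sites sit at the top of the ranking and the "down" sites at the bottom, ESu is positive and ESd is negative, so the difference is strongly positive. Swapping the annotations flips the sign.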
Fig. 3.
Phosphoproteome signatures of EGF and nocodazole treatment in HeLa cells. We applied PTM-SEA to a phosphoproteome data set from the literature. Sharma et al. (18) studied phosphoproteome dynamics of EGF treatment in HeLa S3 cells. To mitotically arrest the cells, the antineoplastic agent nocodazole was used. Because PTMsigDB contained signatures for both perturbations, this data set was used as a benchmark to evaluate PTM-SEA. A, Heatmap depicting normalized enrichment scores (NES) of signatures (rows) in PTMsigDB that were consistently enriched or depleted at FDR < 0.01 in replicate measurements of EGF/nocodazole treatments (marked by asterisks). Hierarchical clustering of enrichment scores separated samples (columns) by experimental condition (EGF, nocodazole and control). Enrichment scores of the nocodazole and EGF signatures showed the highest magnitude in the respective experiments and are highlighted in red. B, Silhouette analysis comparing hierarchical clustering of site-centric (left panel) and gene-centric (right panel) signature enrichment scores into three clusters. The bar chart depicts silhouette scores (x axis) of each sample (y axis) colored according to the assigned cluster. Average silhouette scores for each cluster are depicted at the right side of the bar charts. C, Signature enrichment scores of the nocodazole perturbation signature calculated using the site-centric (left panel) and gene-centric (right panel) approaches. Numbers indicate the median NES and FDR in replicate measurements of the respective perturbations (x axis).
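The per-sample silhouette score used in panel B follows the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of a sample's own cluster and b is the smallest mean distance to any other cluster. The sketch below assumes Euclidean distance, which the caption does not specify, and uses made-up points rather than enrichment scores from the study.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return one silhouette score per sample (Euclidean distance assumed)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, grouped by cluster
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(label, []).append(dist(p, q))
        if own not in by_cluster or len(by_cluster) < 2:
            # Singleton cluster or only one cluster present: score set to 0
            scores.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d)
                for label, d in by_cluster.items() if label != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Two well-separated toy groups, scored under a good and a poor clustering
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, ["a", "a", "b", "b"])
poor = silhouette_scores(points, ["a", "b", "a", "b"])
```

Scores near 1 indicate samples that sit well inside their assigned cluster, while negative scores indicate samples closer to another cluster, so a higher average suggests a cleaner separation of experimental conditions.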
Fig. 4.
MoA-specific kinase signatures of cell cycle inhibitors. Heatmaps depicting perturbation and kinase signature enrichment scores of three human cancer cell lines (MCF7, HL60, PC3) treated with four different cell cycle inhibitors and measured in duplicate. Asterisks indicate signature scores at FDR < 0.05. Cartoons above the heatmap graphically illustrate the increasing levels (blue to red) of CDK1/2 substrate phosphorylation. The density plot illustrates cyclin A/B/E concentrations across cell cycle stages, which correlate with CDK1/2 activity levels. Black T-arrows indicate the cell cycle stage at which the corresponding inhibitor is active. Signatures of CDK1 and CDK2 activity strongly correlate with the mode of action of each compound.
Fig. 5.
Signatures of PI3K inhibition in breast cancer cells. T47D cells were treated with the PI3Kα inhibitor BYL719 or with DMSO as a control for 6 h and 24 h, and deep phosphoproteomes were acquired, resulting in ∼24K phosphosites localized to a specific Ser/Thr/Tyr residue. A, Volcano plot depicting enrichment of phosphoproteome signatures at the 6 h time point. The x axis represents the normalized enrichment score (NES) between DMSO (left side) and drug treatment (right side). The size of the dots scales with the relative number of scored phosphorylation sites in a signature. The gray area contains signatures that did not significantly change upon drug treatment (permutation-based FDR ≥ 5%). PTM signature sets of inhibitors are annotated with an “i” after the drug target. B, Volcano plot depicting phospho signatures after treatment for 24 h. C, Schematic representation of the PI3K-AKT-mTOR and Ras-Raf-MEK-ERK pathways. Circles indicate kinases, boxes indicate specific kinase inhibitors. Highlighted in color are the significant phospho signatures shown in the volcano plots in A and B. The pathway representation is based on Fig. 1 in (44).

