Brief Bioinform. 2022 Jan 17;23(1):bbab510.
doi: 10.1093/bib/bbab510.

PhosPiR: an automated phosphoproteomic pipeline in R

Ye Hong et al. Brief Bioinform.

Erratum in

Abstract

Large-scale phosphoproteome profiling using mass spectrometry (MS) provides functional insight that is crucial for disease biology and drug discovery. However, extracting biological understanding from these data is an arduous task requiring multiple analysis platforms that are not adapted for automated high-dimensional data analysis. Here, we introduce an integrated pipeline that combines several R packages to extract high-level biological understanding from large-scale phosphoproteomic data by seamless integration with existing databases and knowledge resources. In a single run, PhosPiR provides data clean-up, fast data overview, multiple statistical testing, differential expression analysis, phosphosite annotation and translation across species, multilevel enrichment analyses, proteome-wide kinase activity and substrate mapping and network hub analysis. Data output includes graphical formats such as heatmap, box-, volcano- and circos-plots. This resource is designed to assist proteome-wide data mining of pathophysiological mechanism without a need for programming knowledge.

Keywords: bioinformatics; data visualization; phosphoproteomics; pipeline; proteomics; statistics.

Figures

Figure 1
Flowchart overview of pipeline architecture. The main pipeline steps are outlined. Information on software packages utilized and/or information on the approach used is provided in adjoining boxes.
Figure 2
Sample output from the statistical and enrichment analysis feature of PhosPiR. Phosphoproteome changes in synaptoneurosomes of sleep-deprived versus normal sleep cycle mice. (A) Normalized phosphoproteomic data were loaded into PhosPiR. Automated statistical analysis was run on user-defined group comparisons using up to four statistical tests, with results output as volcano plots and csv files. A representative volcano plot is shown for the rank product FDR statistical analysis output. Significant proteins are labeled in the volcano plots only when there are ≤60 significant datapoints; with more, the labels overlap. In this example, the number of significant hits is >60. Every volcano plot is accompanied by a csv file providing detailed numerical output from all statistical tests, including UniProt and gene accession identifiers. (B) PhosPiR performs several enrichment analyses on the data, e.g. GO, cell marker and KEGG enrichment analyses. As an example, the KEGG analysis output is shown for the comparison between the wake-time and sleep-time synaptoneurosome phosphoproteomes. (C) Phosphosite enrichment using post-translational modification set enrichment analysis (PTM-SEA) compares synaptoneurosomes from sleep-deprived mice and control mice during wake hours. Enrichment P-values and FDR (adjusted P-value) are indicated. This analysis highlights synaptic upregulation of mTOR pathway phosphorylation in sleep-deprived mice. Information on specific proteins and regulated sites is found in the accompanying csv file in the Enrichment\PhosphoSite enrichment folder.
Figure 3
PhosPiR utilizes the KinSwingR tool to predict increases or decreases in kinase activities for defined group comparisons. KinSwingR assesses the local connectivity (swing) of kinase–substrate networks. Automated output from the PhosPiR kinase analysis predicts regulated kinase activity based on identified substrate motifs. The final swing score is a normalized and weighted score of predicted kinase activity. Positive and negative swing scores represent the direction of kinase activity change. Representative output TIFF files are shown, and the accompanying csv file (ComparisonX_swingscore) is found in the Kinase analysis output folder.
Figure 4
Kinase–substrate network prediction tool. PhosPiR performs a proteome-wide kinase analysis using the KinSwingR package, as shown in Figure 3. The PhosPiR Network Analysis tool finds the top kinase–substrate relations and presents them in a Circos plot. (A and B) Predicted kinase–substrate connections from the significantly changing data are shown for the group comparisons (A) wake hours versus sleep hours in control mice and (B) sleep-deprived versus control mice during wake hours. Colored ribbons link the kinase of interest with the substrate phosphorylation site that is significantly changed in the comparison. Predictions rely on known kinase–substrate phosphorylation sites. Only the 250 most significant kinase–substrate relationships are plotted, to facilitate labeling. All output data are available in the accompanying csv file ComparisonX_significant_kinaseNetwork.
Figure 5
PhosPiR identifies network hubs based on protein:protein interactions. Sample output from the hub analysis of the Network Analysis tool is shown in (A) and (B). Hubs are defined as proteins whose interaction number is >1 SD from the mean. Hub significance is calculated by comparing the number of interactions within the dataset to that in 1000 equal-sized background datasets randomly generated from the total data. The hub interaction count in the background datasets is shown as a boxplot, and the interaction count (hubness) in the target network is indicated by a red star. FDR values calculated from the permutation test are indicated above the boxplots. Hubs from the comparisons of sleep-deprived versus control mice during wake hours, and of wake hours versus sleep hours in sleep-deprived mice, are shown in (A) and (B), respectively.
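The hub test described in the Figure 5 caption can be illustrated with a minimal sketch. It is written in Python for illustration only and is not taken from PhosPiR, which is implemented in R. Several details are assumptions that the caption does not state: hubs are taken to be proteins whose interaction count exceeds the mean by more than one population SD, each background dataset is a random draw of the same size that always contains the hub being tested, and the reported value is an empirical P-value without the FDR correction mentioned in the caption. All function names and protein identifiers are hypothetical.

```python
import random
import statistics


def interaction_counts(proteins, edges):
    """Count, per protein, the interaction partners that are also in `proteins`."""
    members = set(proteins)
    counts = {p: 0 for p in members}
    for a, b in edges:  # edges are undirected pairs, assumed to be unique
        if a in members and b in members and a != b:
            counts[a] += 1
            counts[b] += 1
    return counts


def find_hubs(counts):
    """Assumed reading of the caption: hubs exceed mean + 1 SD of the counts."""
    values = list(counts.values())
    cutoff = statistics.mean(values) + statistics.pstdev(values)
    return {p: c for p, c in counts.items() if c > cutoff}


def hub_permutation_pvalue(hub, target, background, edges, n_perm=1000, seed=0):
    """Empirical P-value for the hub's interaction count in `target`.

    Each background dataset has the same size as `target` and is drawn at
    random from `background`; the hub is always included (an assumption).
    """
    rng = random.Random(seed)
    observed = interaction_counts(target, edges)[hub]
    others = [p for p in background if p != hub]
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(others, len(target) - 1) + [hub]
        if interaction_counts(sample, edges)[hub] >= observed:
            at_least += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least + 1) / (n_perm + 1)


# Toy data with hypothetical identifiers, not taken from the paper.
target = ["H", "T1", "T2", "T3", "T4", "T5"]
background = target + [f"B{i}" for i in range(24)]
edges = [("H", t) for t in target[1:]]

hubs = find_hubs(interaction_counts(target, edges))
for hub in hubs:
    print(hub, hubs[hub], hub_permutation_pvalue(hub, target, background, edges))
```

In a real analysis the edge list would come from a protein:protein interaction resource and the resulting P-values would still need multiple-testing correction across hubs; both steps are left out here to keep the sketch short.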
