J Proteome Res. 2022 Jun 3;21(6):1510-1524.
doi: 10.1021/acs.jproteome.2c00131. Epub 2022 May 9.

Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation

Anton Kalyuzhnyy et al. J Proteome Res.

Abstract

Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the majority of sites in PSP). We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
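
The rounded site counts quoted in the abstract can be combined into a quick back-of-the-envelope check. The sketch below is not the authors' FDR estimation method; it only divides the "likely false" count by the sum of "likely true" and "likely false" counts for each residue, assuming the two figures together approximate all reported sites of that residue. The function and variable names are illustrative.

```python
# Rounded estimates quoted in the abstract (approximate values).
likely_true = {"Ser": 62_000, "Thr": 8_000, "Tyr": 12_000}
likely_false = {"Ser": 86_000, "Thr": 50_000, "Tyr": 26_000}


def false_fraction(n_true: int, n_false: int) -> float:
    """Share of sites that are likely false among true + false sites."""
    return n_false / (n_true + n_false)


# Per-residue share implied by the abstract's rounded counts.
implied = {
    residue: false_fraction(likely_true[residue], likely_false[residue])
    for residue in likely_true
}

# Pooled share across Ser, Thr, and Tyr.
overall = false_fraction(sum(likely_true.values()), sum(likely_false.values()))

for residue, fraction in implied.items():
    print(f"{residue}: {fraction:.1%} of counted sites likely false")
print(f"All residues: {overall:.1%}")
```

Under these assumptions the implied shares are roughly 58% for Ser, 86% for Thr, and 68% for Tyr, or about 66% overall. These are lower than the 84, 98, and 82% FDRs quoted above because those apply only to sites supported by a single piece of evidence.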

Keywords: PeptideAtlas; PhosphoSitePlus; UniProt; database; evolutionary conservation; false discovery rate; mass spectrometry; phosphopeptides; phosphoproteomics; phosphorylation; phosphosites; proteome; proteomics.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Distribution of serine (Ser), threonine (Thr), and tyrosine (Tyr) phosphosites from UniProt’s reference human proteome that have any positive identification evidence in (A) PhosphoSitePlus (PSP) or (B) PeptideAtlas (PA) based on established phosphorylation likelihood sets (see the Methods section). Venn diagrams provide the counts of (C) Ser, (D) Thr, and (E) Tyr sites ranked “High” in PSP (left), PA (right), and both resources (overlap).
Figure 2
Mean % conservation across 100 eukaryotic species of likely (A) Ser, (B) Thr, and (C) Tyr phosphosites and corresponding likely nonphosphosites within each target protein (n = number of proteins analyzed). The regression coefficient (R²) is given by “R_sq”.
Figure 3
Box plots of conservation percentages (%) across 100 eukaryotic species of human (A) Ser, (B) Thr, and (C) Tyr sites categorized across established phosphorylation confidence sets based on PSP and PA evidence. Within each box, a horizontal line represents the median % conservation and the (x) symbol represents the mean % conservation per group. Each box extends from the 25th to the 75th percentile of each set’s distribution of conservation % values. Vertical lines extending from the boxes correspond to adjacent values. Dots (•) represent outlier values. The red line shows the median % conservation in the “High in PSP and PA” set for visual comparison.
Figure 4
Counts of proximal amino acids positioned at (A) +1 around Ser; (B) +1 around Thr; (C) +1 around Tyr; (D) −1 around Ser; (E) −1 around Thr; and (F) −1 around Tyr sites of various phosphorylation likelihood based on evidence in PSP and PA, normalized to the observed distribution of those amino acids in the human proteome (represented by a dotted baseline fixed at 1). Significant (Bonferroni-corrected p value <0.001) enrichment of proximal amino acids in the “High in PSP and PA” set is highlighted by the caret symbol (∧) compared with the “Not phosphorylated” set, and by an asterisk (*) compared with the expected amino acid distribution.
Figure 5
Top 10 functional categories for which protein sets containing various highest-ranked (A) Ser, (B) Thr, and (C) Tyr sites based on the amount of available phosphorylation evidence (“High in PSP and PA”, “Low in PSP and/or PA”, “No evidence in PSP or PA”) were significantly enriched in DAVID (Benjamini–Hochberg corrected p value <0.05). For each protein set, the % of proteins enriched for a particular functional category is given as well as the log₂(fold enrichment) for that set. The number of proteins in each set is presented by n.
Figure 6
Percentage of proteins within sets containing (A) Ser, (B) Thr, and (C) Tyr sites of various phosphorylation likelihood as their highest-ranked site, annotated with specific UniProt keywords. The number of proteins in each set is presented by n.
Figure 7
Normalized counts of (A) Ser, (B) Thr, and (C) Tyr amino acids of various phosphorylation likelihood based on evidence in PSP and PA, which are found within protein structures (β strand, α helix, turn, coiled coil). Significant (Fisher’s test p value <0.05) enrichment of amino acids from the “High in PSP and PA” set within protein structures is highlighted by the dot symbol (•) compared with the “Not phosphorylated” set.
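
The normalization described in the Figure 4 caption (neighbouring-residue counts scaled so that the proteome-wide amino acid distribution sits at a baseline of 1) can be illustrated with a minimal sketch. This is an assumption about the general shape of such a calculation, not the authors' code: the function names, the toy sequence, and the choice to express counts as frequencies before dividing are all hypothetical, and no significance testing is included.

```python
from collections import Counter


def background_frequencies(sequences):
    """Fraction of each amino acid across all supplied sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def proximal_enrichment(sequences, sites, offset):
    """Neighbour frequency at `offset` divided by background frequency.

    `sites` is a list of (sequence index, 0-based position) pairs.
    A value of 1.0 means the residue occurs next to the sites as often
    as expected from the background; values above 1.0 mean enrichment.
    """
    background = background_frequencies(sequences)
    neighbours = Counter()
    for seq_index, position in sites:
        seq = sequences[seq_index]
        j = position + offset
        if 0 <= j < len(seq):  # skip sites at a sequence end
            neighbours[seq[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return {}
    return {aa: (n / total) / background[aa] for aa, n in neighbours.items()}


# Toy data (hypothetical): one short sequence with Ser at positions 1 and 4.
toy_sequences = ["ASPASPAA"]
toy_sites = [(0, 1), (0, 4)]

plus_one = proximal_enrichment(toy_sequences, toy_sites, offset=+1)
minus_one = proximal_enrichment(toy_sequences, toy_sites, offset=-1)
print(plus_one, minus_one)
```

In the toy sequence both Ser sites are followed by Pro, which makes up a quarter of the residues, so Pro at +1 scores 4.0; both are preceded by Ala, which makes up half of the residues, so Ala at −1 scores 2.0.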
