Meta-Analysis
Gigascience. 2017 Feb 1;6(2):1-11.
doi: 10.1093/gigascience/giw015.

Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes

Panayotis Vlastaridis et al. Gigascience.

Abstract

Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast).

Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable.
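The Capture-Recapture idea can be illustrated with the classic two-sample estimators. The sketch below is a minimal illustration, not the analysis the authors ran: the figure legends refer to the Rcapture method, whereas this example uses the simpler Lincoln-Petersen and Chapman formulas. The function names and the toy identifiers are assumptions made for illustration only.

```python
def lincoln_petersen(sample_a, sample_b):
    """Two-sample estimate of population size: N = n1 * n2 / m.

    sample_a, sample_b: sets of identifiers (e.g. phosphoprotein IDs)
    seen in two independent experiments. m is the size of the overlap.
    """
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return len(sample_a) * len(sample_b) / overlap


def chapman(sample_a, sample_b):
    """Chapman's bias-corrected variant: (n1 + 1)(n2 + 1)/(m + 1) - 1."""
    overlap = len(sample_a & sample_b)
    return (len(sample_a) + 1) * (len(sample_b) + 1) / (overlap + 1) - 1


# Toy data (hypothetical): two experiments of 60 proteins each, sharing 20.
experiment_1 = {f"P{i}" for i in range(0, 60)}
experiment_2 = {f"P{i}" for i in range(40, 100)}

lp_total = lincoln_petersen(experiment_1, experiment_2)  # 60 * 60 / 20 = 180.0
chapman_total = chapman(experiment_1, experiment_2)
```

The smaller the overlap between two experiments relative to their sizes, the larger the estimated total, which is why low agreement between datasets signals a far-from-saturated phosphoproteome.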

Conclusions: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics can capture 70 % to 95 % of total phosphoproteins, but only 40 % to 60 % of total p-sites.

Keywords: Arabidopsis; Capture-Recapture; Curve-Fitting; Phosphoproteomics; human; mouse; total number of phosphoproteins; total number of phosphorylation sites; yeast.

Figures

Figure 1:
Estimation of the total number of phosphoproteins (1A, 1B) and p-sites (1C, 1D) for yeast, with the curve-fitting (assuming 1 % noise) and Capture-Recapture methods, also correcting for three levels of noise (1 %, 5 %, 10 %). In Fig. 1A and C, the x-axis is the cumulative number of redundant phosphoproteins/p-sites, whereas the y-axis is the cumulative number of non-redundant phosphoproteins/p-sites. The red curve is fitted for 1 % noise. In Fig. 1B and D: Current is the total number of phosphoproteins/p-sites detected so far (by applying our filtering criteria). Current_3X is the total number of phosphoproteins/p-sites detected so far in at least three experiments. Rcapture is the estimate of the maximum number of phosphoproteins/p-sites based on the Rcapture method (using the 15 largest datasets). Rcapture_HTP_vs_LTP is the estimate of the maximum number of phosphoproteins/p-sites based on the Rcapture method, but using only two datasets: the compendium of all HTP experiments and the compendium of all LTP experiments from PhosphoGrid2. CF is the estimate of the maximum number of phosphoproteins/p-sites based on curve-fitting of the saturation curve from all experiments. CF_3X is the estimate of the maximum number of phosphoproteins/p-sites identified in at least three experiments, based on the curve-fitting method (in this case, a reasonable estimate was not possible). CF_best_start is the estimate of the maximum number of phosphoproteins/p-sites based on curve-fitting of the saturation curve from all experiments, with the largest experiment placed first in the series. CF_best_end is the same estimate with the largest experiment placed last in the series. CF_half_exp is the estimate of the maximum number of phosphoproteins/p-sites based on curve-fitting of the saturation curve from the first half of the experiments.
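The saturation curve described in this legend (cumulative redundant counts on the x-axis, cumulative non-redundant counts on the y-axis) can be built and extrapolated as in the following sketch. The curve model is an assumption: the text shown here does not state which function the authors fitted, so a simple hyperbola y = Ymax * x / (K + x) is used, fitted by a double-reciprocal linear regression. The function names `saturation_curve` and `fit_hyperbola` are hypothetical.

```python
def saturation_curve(datasets):
    """Return (cumulative redundant, cumulative non-redundant) points.

    datasets: ordered list of sets of identifiers, one set per experiment.
    """
    seen = set()
    redundant = 0
    points = []
    for dataset in datasets:
        redundant += len(dataset)      # every detection counts
        seen |= set(dataset)           # only new identifiers add here
        points.append((redundant, len(seen)))
    return points


def fit_hyperbola(points):
    """Fit y = y_max * x / (k + x) via least squares on 1/y versus 1/x.

    Assumed model for illustration only. Returns (y_max, k); y_max is the
    plateau, i.e. the extrapolated total number of distinct identifiers.
    """
    inv_x = [1.0 / x for x, _ in points]
    inv_y = [1.0 / y for _, y in points]
    n = len(points)
    mean_x = sum(inv_x) / n
    mean_y = sum(inv_y) / n
    sxx = sum((x - mean_x) ** 2 for x in inv_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv_x, inv_y))
    slope = sxy / sxx                  # equals k / y_max
    intercept = mean_y - slope * mean_x  # equals 1 / y_max
    y_max = 1.0 / intercept
    return y_max, slope * y_max


# Toy example (hypothetical identifiers): three small experiments.
curve = saturation_curve([{"a", "b"}, {"b", "c"}, {"c"}])

# Synthetic points lying exactly on y = 1000 * x / (300 + x).
synthetic = [(x, 1000.0 * x / (300.0 + x)) for x in (100, 200, 400)]
y_max, k = fit_hyperbola(synthetic)
```

Because the cumulative points depend on the order in which experiments are added, reordering the input list (largest experiment first or last) yields different fitted plateaus, which is the sensitivity that the CF_best_start and CF_best_end variants probe.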
Figure 2:
Estimation of the number of phosphoproteins (2A, 2B) and p-sites (2C, 2D) for human, with the Curve-Fitting (assuming 1 % noise) and Capture-Recapture methods, also correcting for three levels of noise (1 %, 5 %, 10 %). See legend of Fig. 1 for explanations.
Figure 3:
Estimation of the number of phosphoproteins (3A, 3B) and p-sites (3C, 3D) for mouse, with the Curve-Fitting (assuming 1 % noise) and Capture-Recapture methods, also correcting for three levels of noise (1 %, 5 %, 10 %). See legend of Fig. 1 for explanations. Estimates on Fig. 3B and D are obtained for a Vega annotated proteome of 16 000 protein-coding genes, where all estimates have been readjusted 25 % upwards.
Figure 4:
Estimation of the number of phosphoproteins (4A, 4B) and p-sites (4C, 4D) for Arabidopsis, with the Curve-Fitting (assuming 1 % noise) and Capture-Recapture methods, also correcting for three levels of noise (1 %, 5 %, 10 %). See legend of Fig. 1 for explanations.
