EMBO J. 2025 Nov;44(22):6704-6731.
doi: 10.1038/s44318-025-00567-1. Epub 2025 Oct 3.

Mapping cryptic phosphorylation sites in the human proteome

Dino Gasparotto et al. EMBO J. 2025 Nov.

Abstract

Advances in computational and experimental methods have revealed the existence of transient, non-native protein folding intermediates that could play roles in disparate biological processes, from regulation of protein expression to disease-relevant misfolding mechanisms. Here, we tested the possibility that specific post-translational modifications may involve residues exposed during the folding process by assessing the solvent accessibility of 87,138 post-translationally modified amino acids in the human proteome. Unexpectedly, we found that one-third of phosphorylated proteins present at least one phosphosite completely buried within the protein's inner core. Computational and experimental analyses suggest that these cryptic phosphosites may become exposed during the folding process, where their modification could destabilize native structures and trigger protein degradation. Phylogenetic investigation also reveals that cryptic phosphosites are more conserved than surface-exposed phosphorylated residues. Finally, cross-referencing with cancer mutation databases suggests that phosphomimetic mutations in cryptic phosphosites can increase tumor fitness by inactivating specific onco-suppressors. These findings define a novel role for co-translational phosphorylation in shaping protein folding and expression, laying the groundwork for exploring the implications of cryptic phosphorylation in health and disease.

Keywords: Co-translational Phosphorylation; Cryptic Phosphosites; Post-translation Modification; Protein Folding; Protein Phosphorylation.

Conflict of interest statement

Disclosure and competing interests statement. The authors declare the following competing interests: GS, GL, PF, and EB are co-founders and shareholders of Sibylla Biotech SRL (www.sibyllabiotech.it). The company exploits the information arising from folding pathway reconstruction for drug discovery in a wide variety of human pathologies.

Figures

Figure 1. Schematic illustration of the workflow adopted to identify cryptic PTMs.
Data from PhosphoSitePlus, which holds an extensive collection of experimentally validated PTMs, were cross-referenced with protein structures from DeepMind’s AlphaFold database. This approach matched each PTM entry from PhosphoSitePlus to its corresponding protein structure in AlphaFold. The analysis focused on PTMs within structured regions, specifically those with an AlphaFold confidence score (pLDDT) exceeding 65. RSA values of PTM-modified residues were calculated to assess their solvent exposure in the native state. Residues with RSA values below 0.15 were categorized as cryptic, indicating minimal solvent exposure.
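The two thresholds quoted in this caption can be expressed as a small filter. The following is a minimal sketch, not the authors' pipeline: it assumes that pLDDT and RSA values have already been computed per site, and the `PtmSite` record, the `classify` function, and the example identifiers are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass

PLDDT_MIN = 65.0    # structured-region cutoff quoted in the caption
RSA_CRYPTIC = 0.15  # burial cutoff quoted in the caption


@dataclass
class PtmSite:
    """Hypothetical record: one PTM matched to a predicted structure."""
    protein_id: str
    position: int
    plddt: float  # AlphaFold confidence at the modified residue
    rsa: float    # relative solvent accessibility in the native state


def classify(site: PtmSite) -> str:
    """Label a site as 'excluded', 'cryptic', or 'exposed'."""
    # Only residues with pLDDT strictly above the cutoff are analyzed.
    if site.plddt <= PLDDT_MIN:
        return "excluded"
    # Residues with RSA strictly below the cutoff count as cryptic.
    return "cryptic" if site.rsa < RSA_CRYPTIC else "exposed"


# Made-up example values, not data from the study.
sites = [
    PtmSite("PROT_A", 10, 92.1, 0.03),
    PtmSite("PROT_A", 55, 88.0, 0.47),
    PtmSite("PROT_B", 7, 41.5, 0.02),
]
labels = [classify(s) for s in sites]
print(labels)
```

Treating both cutoffs as strict inequalities follows the caption's wording ("exceeding 65", "below 0.15"); how boundary values were handled in the actual analysis is not stated.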
Figure 2. Graphic distribution of the occurrence of different PTMs in relation to the RSA of the corresponding target amino acid.
The figure illustrates the relative distribution of distinct PTMs across a range of RSA values, representing solvent accessibility of the target residues in their native protein structure. Each PTM category is plotted against its corresponding RSA value, highlighting patterns of distribution and solvent exposure. The analysis reveals differences in solvent accessibility between PTM types, with cryptic modifications clustering at low (<0.15) RSA values, indicative of buried residues. The distribution provides insights into the structural environments favored by different PTMs, underlining the unique behavior of phosphosites compared to other modifications.
Figure 3. Schematic illustration of the workflow adopted for the dynamic and structural filtering of post-translationally exposed sites.
(A) Low-confidence segments, such as loops, linkers, or disordered regions with AlphaFold scores below 65, were removed from the starting protein dataset. The refined dataset included either single protein domains or groups of contiguous domains treated as single entries if at least five residues were within 5 Å of one another. Cryptic phosphosites located in domains with fewer than 40 residues were excluded. (B) The RSA of cryptic phosphosites was recalculated for the updated entries, excluding residues that presented an RSA above 0.15. (C) SPECTRUS identified quasi-rigid domains by creating β-Gaussian models for each entry. Cryptic phosphosites located in quasi-rigid domains of the highest SPECTRUS quality scoring group were further filtered, excluding those whose side chains make fewer than 80% of their contacts within the same domain. (D, E) Example of two proteins containing true and false cryptic phosphosites. The structures of RAB1B ((D), UniProt ID: Q9H0U4) and CDK1 ((E), UniProt ID: P06493) are depicted in new cartoon style. The colors do not reflect any structural property; they are used only to distinguish different quasi-rigid domains. In particular, black regions identify unstructured domains, whereas shades from dark gray to white identify quasi-rigid domains. Boxes highlight the individual phosphosites and the related amino acid position, cryptic or non-cryptic classification, post-processing RSA value, and proportion of intra-domain contacts (Pidc).
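The intra-domain contact filter in panel (C) can be illustrated with a short sketch. It assumes that Pidc is the fraction of a side chain's contacts that fall within its own quasi-rigid domain, and that the contacts have already been computed; the caption does not give the contact definition, so the function names and the input format below are hypothetical.

```python
def intra_domain_fraction(site_domain, partner_domains):
    """Fraction of contacts made with residues of the site's own domain.

    site_domain: domain label of the phosphosite (any hashable value).
    partner_domains: one domain label per contacting residue (assumed input).
    """
    if not partner_domains:
        return 0.0  # no contacts at all: treated here as failing the filter
    same = sum(1 for d in partner_domains if d == site_domain)
    return same / len(partner_domains)


def passes_contact_filter(site_domain, partner_domains, threshold=0.80):
    """Keep a site only if at least `threshold` of its contacts are intra-domain."""
    return intra_domain_fraction(site_domain, partner_domains) >= threshold


# Made-up contact lists for two hypothetical sites in domain 1.
buried_site = [1] * 9 + [2]        # 9 of 10 contacts inside domain 1
interface_site = [1] * 3 + [2] * 2  # 3 of 5 contacts inside domain 1
print(passes_contact_filter(1, buried_site))     # True
print(passes_contact_filter(1, interface_site))  # False
```

A site lying at a domain interface fails the filter even when its RSA is low, which is the kind of false cryptic phosphosite the filtering step is meant to catch.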
Figure 4. Experimental characterization of the effect of phosphoablative and phosphomimetic mutations in SMAD2, CHK1, and Pyst1.
(A–C) Three-dimensional structures of SMAD2 (A), the KA1 domain of CHK1 (B), and Pyst1 (C). The side chain of each individual cryptic phosphosite is shown in a different color. The graphs below each structure plot the RSA values calculated for different amino acid variants at the related position (from left to right): for SMAD2, unphosphorylated S417, phosphorylated S417 with charge −1, phosphorylated S417 with charge −2; for CHK1, unphosphorylated T382, phosphorylated T382 with charge −1, phosphorylated T382 with charge −2; for Pyst1, unphosphorylated S300, phosphorylated S300 with charge −1, phosphorylated S300 with charge −2. RSA values were calculated along a 1 µs MD simulation. (D–F) Dot plots reporting the relative expression levels of clones for each individual construct: for SMAD2, S417 (WT), S417A (phosphoablative mutant), and S417E (phosphomimetic mutant); for CHK1, T382 (WT), T382A (phosphoablative mutant), and T382D (phosphomimetic mutant); for Pyst1, S300 (WT), S300A (phosphoablative mutant), and S300E (phosphomimetic mutant). All the protein variants were tagged with the FLAG epitope to facilitate quantification. The expression of each clone was assessed by western blotting using an anti-FLAG antibody. For SMAD2 and Pyst1, protein levels in the different clones were expressed as the ratio of corresponding protein lanes in the induced samples (I) and a reference control (R) to allow cross-comparison among different gels. For CHK1, we observed leakage in the non-induced controls of expressing clones. Therefore, each signal was expressed as the ratio between induced (I) and non-induced (NI) signals. Samples showing a negative ratio were set to 1. Statistical differences were evaluated by multiple Mann–Whitney U tests: for SMAD2, WT vs S417A, P = 0.11626, WT vs S417E, P = 0.00001 (****), S417A vs S417E, P = 0.00001 (****); for CHK1, WT vs T382A, P = 0.14486, WT vs T382D, P = 0.08444, T382A vs T382D, P = 0.00555 (*); for Pyst1, WT vs S300A, P = 0.58915, WT vs S300E, P = 0.16544, S300A vs S300E, P = 0.48166.
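For readers unfamiliar with the test named above, the Mann–Whitney U statistic can be computed from pooled ranks as sketched below. This is a plain-Python illustration of the statistic only; it does not compute P values and does not reproduce the authors' analysis. The function name and the expression ratios are made up for the example.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at index i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average.
        midrank = (i + 1 + j) / 2.0
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical expression ratios for two constructs (not data from the study).
wt_ratios = [1.0, 2.0, 3.0]
mutant_ratios = [4.0, 5.0, 6.0]
print(mann_whitney_u(wt_ratios, mutant_ratios))  # (0.0, 9.0)
```

A U value of 0 for one group means every value in that group is smaller than every value in the other, which is the most extreme separation the test can detect for those sample sizes.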
Figure 5. Experimental and computational investigation of cryptic phosphosite T382 in CHK1.
(A–C) CHK1 stability was analyzed in three selected clones expressing CHK1 variants at codon T382: WT T382 (A), phosphoablative mutant T382A (B), and phosphomimetic mutant T382D (C). After doxycycline induction (72 h, 1 μg/mL), protein synthesis was stopped with cycloheximide (CHX), and CHK1 levels were monitored at 0, 2, 4, 8, and 16 h using western blotting with anti-FLAG antibody; β-actin signal served as loading control. For each time point, ≥3 biological replicates were produced (except for the T382D clone at 16 h, where only one replicate was obtained). Panels (i) display representative western blots for each CHK1 form, detected with an anti-FLAG antibody. Panels (ii) illustrate the quantified data (mean ± standard deviation), normalized to the baseline level (t0). The degradation kinetics were modeled using a one-phase decay curve from which half-time (t1/2) values were derived. Data analysis and curve fitting were performed using the GraphPad Prism software. (D) Atomistic reconstruction of the CHK1 KA1 domain folding pathway. Lower-bound approximation of the transition path energy is related to the folding of KA1. The energy is plotted as the negative logarithm of the probability distribution (-ln(p)), which is expressed as a function of the collective variables Q and RMSD (65 × 65 bin matrix). A Gaussian blur was applied. The highly populated native state appears as expected in the bottom-right corner (high Q and low RMSD, black rectangle). The indexed squares define the most-populated partially unfolded regions of interest (-ln(p)). Well thresholds from most to least stable: (A) ≤2.5 kBT, (B) ≤3.5 kBT, (C) ≤3.5 kBT, (D) ≤3 kBT. (E) Distribution of RSA of amino acid T382 along the transition path. The highest values are associated with the unfolded state (top left), while in the native state, the residue is consistently below 0.15. (F) Representative conformations for each cluster. Each cluster is explored by at least 2 of the 9 LB conformations. Residue T382 is displayed in red, α-helices are colored in purple, β-sheets in orange.
Figure 6. Graphic distribution of evolutionary conservation of cryptic and non-cryptic phosphosites.
The KDE plot reports the distribution of ES for cryptic (colored in light purple) and non-cryptic (colored in dark purple) phosphosites.
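A KDE plot of this kind smooths a set of scores into a continuous density curve. The sketch below shows a Gaussian kernel density estimate in plain Python; the bandwidth, the function name, and the ES values are assumptions made for illustration and are not taken from the study.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical ES values for two groups of sites.
cryptic_es = [0.80, 0.85, 0.90, 0.95]
non_cryptic_es = [0.40, 0.55, 0.60, 0.75]
kde_cryptic = gaussian_kde(cryptic_es, bandwidth=0.05)
kde_non_cryptic = gaussian_kde(non_cryptic_es, bandwidth=0.05)
print(kde_cryptic(0.9) > kde_non_cryptic(0.9))  # True for these made-up values
```

Comparing the two density curves at the same ES value is what the overlaid KDE plot does visually.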
Figure EV1. Graphic distribution of the occurrence of phosphorylation in relation to the RSA of the corresponding target amino acids (serine, tyrosine, and threonine).
Figure EV2. Graphical plot of the RSA of cryptic phosphosites vs the length of the proteins in which they occur.
Figure EV3. Pie chart illustration of the number of cryptic phosphosites per protein.
Figure EV4. Graphic distribution of the occurrence of phosphorylation in relation to the RSA of the corresponding amino acids after applying the dynamic and structural filtering.
Figure EV5. Representative western blots of the effect of phosphoablative and phosphomimetic mutations in SMAD2.
Individual clones (coded with the indicated numbering) stably transfected with WT (A), phosphoablative (B) and phosphomimetic (C) SMAD2 constructs were non-induced (−) or induced with doxycycline for 48 h. A reference clone (R) induced for 48 h was included for comparison among different western blots. Protein expression was detected with an anti-FLAG antibody. Signals were normalized to the corresponding total protein lanes.
Figure EV6. Representative western blots of the effect of phosphoablative and phosphomimetic mutations in CHK1.
Individual clones (coded with the indicated numbering) stably transfected with WT (A), phosphoablative (B) and phosphomimetic (C) CHK1 constructs were non-induced (−) or induced with doxycycline for 48 h. Protein expression was detected with an anti-FLAG antibody. Signals were normalized to the corresponding total protein lanes.
Figure EV7. Representative western blots of the effect of phosphoablative and phosphomimetic mutations in Pyst1.
Individual clones (coded with the indicated numbering) stably transfected with WT (A), phosphoablative (B) and phosphomimetic (C) Pyst1 constructs were non-induced (−) or induced with doxycycline for 24 h. A reference clone (R) induced for 24 h was included for comparison among different western blots. Protein expression was detected with an anti-FLAG antibody. Signals were normalized to the corresponding total protein lanes.
Figure EV8. Experimental and computational investigation of cryptic phosphosites in CHK1 and SMAD2.
(A) The CHK1 stability data presented in Fig. 5A–C are plotted together to better represent the statistical differences between the phosphomimetic mutant D382 and the WT T382 and phosphoablative mutant A382. The graph illustrates the quantified data (mean ± standard deviation), normalized to the t0 level. The degradation kinetics were modeled using a one-phase decay curve from which half-time (t1/2) values were derived. Data analysis and curve fitting were performed using GraphPad Prism software. Statistical differences vs the WT T382 were obtained by ordinary one-way ANOVA test; *P < 0.05; **P < 0.01; ***P < 0.001. (B) Atomistic reconstruction of the SMAD2 folding pathway. Lower-bound approximation of the transition path energy is related to the folding of SMAD2. The energy is plotted as the negative logarithm of the probability distribution (-ln(p)), which is expressed as a function of the collective variables Q and RMSD (65 × 65 bin matrix). A Gaussian blur was applied. The highly populated native state appears as expected in the bottom-right corner (high Q and low RMSD, black rectangle). The indexed squares define the most-populated partially unfolded regions of interest (-ln(p)). (C) Distribution of RSA of the S417 residue along the transition path. The highest values are associated with the unfolded state (top left), while in the native state the RSA is consistently below 0.15. (D) Representative conformations for each cluster. S417 is displayed in red, α-helices are colored in orange, β-sheets in light purple.
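The relationship between a one-phase decay curve and the half-time it yields can be sketched as follows. This is not the GraphPad Prism fit used by the authors: it assumes a decay to a plateau of zero, y(t) = y0 · exp(−k·t), fits k by least squares on ln(y), and then uses t1/2 = ln(2)/k. The function name and the data points are synthetic.

```python
import math


def fit_half_life(times_h, levels):
    """Fit y = y0 * exp(-k * t) by log-linear least squares.

    Returns (k, t_half). Assumes all levels are positive and the
    decay plateau is zero (an assumption of this sketch).
    """
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    # Ordinary least-squares slope of ln(y) against t.
    slope = sum(
        (t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs)
    ) / sum((t - t_mean) ** 2 for t in times_h)
    k = -slope
    return k, math.log(2.0) / k


# Synthetic levels at the chase time points named in the Figure 5 caption,
# generated with a made-up rate constant of 0.1 per hour.
times_h = [0, 2, 4, 8, 16]
levels = [math.exp(-0.1 * t) for t in times_h]
k, t_half = fit_half_life(times_h, levels)
print(round(k, 3), round(t_half, 2))  # 0.1 6.93
```

A faster-degrading variant has a larger k and therefore a shorter t1/2, which is the comparison the plotted decay curves are meant to support.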
Figure EV9. Graphic distribution of evolutionary conservation of cryptic and non-cryptic phosphosites.
(A, B) KDE plots report the distribution of ES for cryptic (light purple) and non-cryptic (dark purple) phosphosites. The evolutionary conservation of cryptic and non-cryptic phosphosites was superimposed on the distribution of cryptic and non-cryptic amino acids.
