Bioinformatics. 2024 Nov 1;40(11):btae602.
doi: 10.1093/bioinformatics/btae602.

Sitetack: a deep learning model that improves PTM prediction by using known PTMs

Clair S Gutierrez et al. Bioinformatics. 2024.

Abstract

Motivation: Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and to therapeutic strategies. Deep learning has been used to predict PTM locations, but limitations in the available datasets and in how they are analyzed have held back its success.

Results: We evaluated the use of known PTM sites for prediction with sequence-based deep learning algorithms. For each PTM, known locations of that PTM were encoded as a separate amino acid; the sequences were then encoded via word embedding and passed into a convolutional neural network that predicts the probability of that PTM at a given site. Without labeling known PTMs, our models perform on par with existing models. With labeling, however, they improve significantly upon them. Moreover, knowing the locations of one PTM can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.
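
The encoding pipeline described in the Results can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not Sitetack's code: the extra token `s` for a known phosphosite, the window half-width of 4, the embedding and filter sizes, and the helpers `label_known_sites`, `window`, `make_model` and `predict` are all hypothetical, and the weights are random rather than trained, so the printed probability carries no meaning. It shows only the data flow: known sites are relabeled as a distinct residue, a window around the candidate site is mapped to integer token IDs, each ID is looked up in an embedding table, and a 1D convolution with global max pooling feeds a sigmoid output.

```python
import math
import random

# Hypothetical vocabulary (an assumption, not Sitetack's): the 20 standard
# amino acids, "-" for padding beyond the sequence ends, and "s" as the extra
# token marking a serine/threonine already known to carry the PTM.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"
KNOWN_SITE = "s"
VOCAB = {tok: i for i, tok in enumerate(AMINO_ACIDS + PAD + KNOWN_SITE)}


def label_known_sites(seq, known_positions):
    """Relabel residues at known PTM positions (0-based) as the extra token."""
    chars = list(seq)
    for pos in known_positions:
        chars[pos] = KNOWN_SITE
    return "".join(chars)


def window(seq, center, half_width):
    """Return the 2*half_width+1 residues centered on `center`, padded with PAD."""
    padded = PAD * half_width + seq + PAD * half_width
    # `center` in seq sits at center + half_width in padded, so the window
    # starts half_width positions earlier, i.e. at index `center`.
    return padded[center:center + 2 * half_width + 1]


def make_model(vocab_size, emb_dim=8, n_filters=4, kernel=3, seed=0):
    """Random (untrained) embedding table, conv filters and output weights."""
    rng = random.Random(seed)
    emb = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(vocab_size)]
    conv = [[[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(kernel)]
            for _ in range(n_filters)]
    dense = [rng.gauss(0, 0.1) for _ in range(n_filters)]
    return emb, conv, dense, 0.0


def predict(token_ids, model):
    """Embedding lookup -> 1D convolution + ReLU -> global max pool -> sigmoid."""
    emb, conv, dense, bias = model
    x = [emb[t] for t in token_ids]  # one embedding vector per residue
    kernel, emb_dim = len(conv[0]), len(x[0])
    pooled = []
    for filt in conv:
        acts = [
            max(0.0, sum(filt[j][d] * x[i + j][d]
                         for j in range(kernel) for d in range(emb_dim)))
            for i in range(len(x) - kernel + 1)
        ]
        pooled.append(max(acts))  # strongest response anywhere in the window
    z = bias + sum(weight * act for weight, act in zip(dense, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # probability of the PTM at the center


# Usage: the serine at index 5 is (hypothetically) a known phosphosite;
# score the serine at index 7. The candidate itself is never relabeled,
# so its own label cannot leak into the input.
seq = "MSTPKSGSAPT"
known = {5}
candidate = 7
w = window(label_known_sites(seq, known - {candidate}), candidate, half_width=4)
ids = [VOCAB[c] for c in w]
p = predict(ids, make_model(len(VOCAB)))
print(w)  # PKsGSAPT-
print(round(p, 3))
```

Because `s` gets its own embedding row, a trained network can in principle learn a different response to a known site than to an unmodified serine, which is the intuition behind labeling known PTMs.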

Availability and implementation: Sitetack is available as a web tool at https://sitetack.net; the source code, representative datasets, instructions for local use, and select models are available at https://github.com/clair-gutierrez/sitetack.

Conflict of interest statement

None declared.

Figures

Figure 1. Overall framework of the Sitetack model for PTM prediction.
Figure 2. Prediction of three representative PTMs with and without the use of labels: serine or threonine phosphorylation, N-linked glycosylation, and proline hydroxylation. (A) AUC curves. (B) AUPRC curves.
Figure 3. Frequency of nearby PTM sites for three representative PTMs: serine or threonine phosphorylation, asparagine N-glycosylation, and proline hydroxylation. (A) All-organism datasets. (B) Human-only datasets. (C) Interpretability of the human serine or threonine phosphorylation dataset using integrated gradients. Integrated gradients were used to cluster k-mers; for select clusters (1, 2, and 6), the PSIG was superimposed over the amino acid frequency at each position.
Figure 4. Phosphorylation (S,T) prediction using kinase-specific models. (A) Difference in AUC with and without known site locations. (B) Frequencies of nearby sites in the dataset. (C) AUC and AUPRC curves.
Figure 5. O-GlcNAc prediction using known O-GlcNAc or phosphorylation sites. (A) Performance assessments with and without labels. (B) Frequencies of nearby sites in the dataset. (C) AUC and AUPRC curves.
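
Figures 2, 4 and 5 summarize performance as AUC and AUPRC. For readers who want to compute such numbers from their own site-level predictions, the sketch below implements the standard metrics with the Python standard library only. It is not the authors' evaluation code: ROC AUC is computed exactly via the pairwise (Mann-Whitney) formulation, while AUPRC is estimated by average precision, one common estimator; the paper may have used another (for example, trapezoidal integration of the precision-recall curve). In `average_precision`, tied scores are simply taken in input order, so results can differ slightly from libraries that group tied thresholds.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision: mean precision at each true site."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive site")
    # Rank sites from highest to lowest score (stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


# Toy example: 1 = modified site, 0 = unmodified site.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 0.5
print(round(average_precision(labels, scores), 4))  # 0.7556
```

AUPRC is often the more informative of the two when modified sites are rare among candidate residues, because it ignores the large pool of correctly ranked negatives that can inflate AUC.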
