Review

Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins

Pawel Pratyush et al. Methods Mol Biol. 2025;2941:313–355. doi: 10.1007/978-1-0716-4623-6_19.

Abstract

Post-translational modifications (PTMs) are vital regulators of protein function, influencing a myriad of cellular processes and disease mechanisms. Traditional experimental methods for PTM identification are both costly and labor-intensive, underlining the pressing need for efficient computational approaches. Early computational strategies relied predominantly on primary amino acid sequences and handcrafted features, which often lacked the contextual and structural understanding necessary for precise PTM site prediction. The emergence of transformer-based large language models (LLMs), particularly protein language models (pLMs), has revolutionized PTM prediction by producing context-aware embeddings that capture functional and structural intra-sequence dependencies. In this chapter, we provide a comprehensive review of recent advances in leveraging LLMs (specifically, pLMs) for PTM site prediction, an important residue-level task in protein research. We identify emerging trends in the field, including the application of fine-tuning techniques, the integration of embeddings from multiple pLMs, and the incorporation of multiple modalities such as codon-aware embeddings, 3D structural data, and conventional representations. Additionally, we discuss tools that employ graph-based representations, the Mamba architecture, and contrastive learning paradigms to further refine pLM-powered PTM site prediction models. Finally, we explore the interpretability and explainability of the embeddings used in various tools. Despite significant progress, persistent limitations remain; we outline these challenges and propose directions for future research.

Keywords: AlphaFold; Contrastive learning; Explainability; Fine-tuning; GPT; Graph; Large language model; Mamba; Post-translational modification; Protein language model.
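
The embedding-based pipeline the chapter surveys can be summarized in a few lines of code. The sketch below is a minimal illustration, not any reviewed tool's implementation: it extracts per-residue, context-aware embeddings from a frozen pLM (ESM-2 via Hugging Face transformers is an assumed choice here; the checkpoint name, classifier head, and residue filtering are all illustrative) and scores each candidate residue with a small classification head. Fine-tuning, in the sense discussed in the chapter, would correspond to unfreezing the pLM weights rather than treating the model as a fixed feature extractor.

```python
# Minimal sketch: per-residue pLM embeddings -> binary PTM-site classifier.
# Assumptions: ESM-2 (small 8M-parameter checkpoint) via Hugging Face
# transformers; the head and usage below are illustrative, not a published tool.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # assumed pLM choice for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
plm = AutoModel.from_pretrained(MODEL_NAME)
plm.eval()  # frozen feature extractor; fine-tuning would unfreeze these weights

@torch.no_grad()
def residue_embeddings(sequence: str) -> torch.Tensor:
    """Return one context-aware embedding per residue, shape (L, hidden_dim)."""
    inputs = tokenizer(sequence, return_tensors="pt")
    hidden = plm(**inputs).last_hidden_state  # (1, L + 2, hidden_dim)
    return hidden[0, 1:-1]  # strip the special <cls>/<eos> tokens

class PTMSiteHead(nn.Module):
    """Tiny per-residue classifier on top of frozen pLM embeddings (illustrative)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # one logit per residue: modified vs. not
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(emb).squeeze(-1)  # (L,) logits

# Usage: score candidate sites (e.g., serines for phosphorylation) in a toy sequence.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
emb = residue_embeddings(seq)
head = PTMSiteHead(hidden_dim=emb.shape[-1])  # untrained here; weights are random
probs = torch.sigmoid(head(emb))
for i, (aa, p) in enumerate(zip(seq, probs.tolist()), start=1):
    if aa == "S":
        print(f"S{i}: P(modified) = {p:.2f}")
```

In practice the head would be trained on experimentally verified sites, and many of the trends the chapter covers (multi-pLM embedding fusion, codon-aware or structural modalities, graph and Mamba encoders, contrastive pretraining) amount to replacing or enriching the two stages sketched above.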
