Advances in Language-Model-Informed Protein-Nucleic Acid Binding Site Prediction
- PMID: 40601256
- DOI: 10.1007/978-1-0716-4623-6_9
Abstract
Understanding interactions between proteins and nucleic acids is essential for elucidating a wide range of cellular and evolutionary processes. Recent advancements in protein language models (pLMs), trained on vast amounts of protein sequence data, have revolutionized various predictive modeling tasks, offering unprecedented scalability and generalizability. Consequently, a number of pLM-powered computational methods for protein-nucleic acid binding site prediction have been developed in recent years. Among them, we recently developed EquiPNAS, a method that integrates pLM embeddings with E(3)-equivariant deep graph neural networks to improve the accuracy and robustness of protein-DNA and protein-RNA binding site prediction while reducing dependence on evolutionary information. Here we present an overview of recent protein-nucleic acid binding site prediction methods, emphasizing advances in harnessing the potential of pLMs, and provide a detailed description of the EquiPNAS methodology as well as the materials and procedures needed for computational prediction of protein-DNA and protein-RNA binding sites.
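To make the idea of combining pLM embeddings with an E(3)-equivariant graph neural network concrete, here is a minimal NumPy sketch. It is not the EquiPNAS implementation: it is a generic, untrained EGNN-style layer (after Satorras et al., "E(n) Equivariant Graph Neural Networks") stacked under a per-residue sigmoid head. Random arrays stand in for real pLM embeddings and residue coordinates, and every name, size, the contact cutoff, and the tanh bound on coordinate weights are hypothetical choices made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Overflow-free logistic function: sigmoid(z) = (1 + tanh(z/2)) / 2."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer perceptron (untrained, illustrative only)."""
    return (rng.normal(0.0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_hidden, d_out)), np.zeros(d_out))


def mlp(params, z):
    """Two-layer perceptron with SiLU activation; works on any leading shape."""
    w1, b1, w2, b2 = params
    hidden = z @ w1 + b1
    hidden = hidden * sigmoid(hidden)  # SiLU(z) = z * sigmoid(z)
    return hidden @ w2 + b2


def contact_graph(x, cutoff):
    """0/1 adjacency of residues closer than `cutoff` (no self-loops)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adj = (d < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def egnn_layer(params, h, x, adj):
    """One EGNN-style layer: h is updated invariantly, x equivariantly.

    h: (N, F) per-residue features (here: stand-ins for pLM embeddings)
    x: (N, 3) residue coordinates; adj: (N, N) 0/1 adjacency matrix
    """
    phi_e, phi_x, phi_h = params
    n = h.shape[0]
    diff = x[:, None, :] - x[None, :, :]               # (N, N, 3): x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)  # (N, N, 1), invariant
    h_i = np.repeat(h[:, None, :], n, axis=1)
    h_j = np.repeat(h[None, :, :], n, axis=0)
    mask = adj[..., None]                              # (N, N, 1)
    m = mlp(phi_e, np.concatenate([h_i, h_j, dist2], axis=-1)) * mask
    # Coordinates move along relative vectors scaled by invariant (tanh-bounded)
    # weights and normalized by degree, so the update follows rotations/translations.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    x_new = x + np.sum(diff * np.tanh(mlp(phi_x, m)) * mask, axis=1) / deg
    # Features are updated only from invariant quantities (residual form).
    h_new = h + mlp(phi_h, np.concatenate([h, m.sum(axis=1)], axis=-1))
    return h_new, x_new


def predict_binding(layers, w_out, h, x, adj):
    """Per-residue binding probabilities from the final invariant features."""
    for params in layers:
        h, x = egnn_layer(params, h, x, adj)
    return sigmoid(h @ w_out)


rng = np.random.default_rng(0)
N, F, M, H = 8, 16, 8, 32  # hypothetical sizes; real pLM embeddings are much wider
h0 = rng.normal(size=(N, F))             # placeholder for pLM embeddings
x0 = rng.normal(size=(N, 3))             # placeholder coordinates (arbitrary units)
adj = contact_graph(x0, cutoff=2.0)      # hypothetical cutoff
layers = [(init_mlp(rng, 2 * F + 1, H, M),   # phi_e: edge messages
           init_mlp(rng, M, H, 1),           # phi_x: coordinate weights
           init_mlp(rng, F + M, H, F))       # phi_h: feature update
          for _ in range(2)]
w_out = rng.normal(0.0, 0.1, F)
p = predict_binding(layers, w_out, h0, x0, adj)

# Rigidly move the structure (orthogonal Q, possibly a reflection, plus a shift).
# Pairwise distances are unchanged, so the same contact graph applies.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x_moved = x0 @ q.T + np.array([1.0, -2.0, 3.0])
p_moved = predict_binding(layers, w_out, h0, x_moved, adj)
print(p.shape, np.allclose(p, p_moved))  # expected: (8,) True
```

Because the messages depend on geometry only through pairwise distances, and coordinates are updated only along relative vectors, a rotation, reflection, or translation of the input structure leaves the predicted per-residue probabilities unchanged. A network that consumed raw coordinates as ordinary input features would generally lack this property, which is one motivation for equivariant architectures in structure-based binding site prediction.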
Keywords: Graph neural networks; Language models; Protein–DNA binding site prediction; Protein–RNA binding site prediction.
© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.