Methods Mol Biol. 2025:2941:139-151.
doi: 10.1007/978-1-0716-4623-6_9.

Advances in Language-Model-Informed Protein-Nucleic Acid Binding Site Prediction

Sumit Tarafder et al. Methods Mol Biol. 2025.

Abstract

Characterizing interactions between proteins and nucleic acids is essential for understanding a wide range of cellular and evolutionary processes. Recent advances in protein language models (pLMs), trained on vast protein sequence data, have revolutionized various predictive modeling tasks, offering unprecedented scalability and generalizability. Consequently, a number of computational methods powered by pLMs have recently been developed for protein-nucleic acid binding site prediction. To this end, we recently developed the EquiPNAS method, which integrates pLM embeddings with E(3) equivariant deep graph neural networks to improve the accuracy and robustness of protein-DNA and protein-RNA binding site prediction while reducing the dependency on evolutionary information. Here we present an overview of recent protein-nucleic acid binding site prediction methods, emphasizing advances in harnessing the potential of pLMs, and provide a detailed description of the EquiPNAS methodology as well as the materials and procedures needed for the computational prediction of protein-DNA and protein-RNA binding sites.

Keywords: Graph neural networks; Language models; Protein–DNA binding site prediction; Protein–RNA binding site prediction.
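
The abstract names E(3)-equivariant graph neural networks over pLM embeddings but does not describe the architecture. The sketch below is therefore not the EquiPNAS implementation. It is a generic EGNN-style message-passing layer in plain Python, meant only to illustrate the property involved: per-residue features (and hence binding scores) stay unchanged when the structure is rotated or translated, while updated coordinates move with the structure. The function names, the 8.0 distance cutoff, the layer sizes, and the random vectors standing in for pLM embeddings are all assumptions made for illustration.

```python
import math
import random

def linear(layer, vec):
    """Apply a dense layer given as (weights, biases) to a feature vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def silu(vec):
    """Element-wise SiLU activation."""
    return [v / (1.0 + math.exp(-v)) for v in vec]

def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_params(rng, dim, hidden):
    return {
        "edge": init_layer(rng, hidden, 2 * dim + 1),  # message function
        "coord": init_layer(rng, 1, hidden),           # coordinate gate
        "node": init_layer(rng, dim, dim + hidden),    # feature update
        "out": init_layer(rng, 1, dim),                # per-residue score head
    }

def equivariant_layer(h, x, params, cutoff=8.0):
    """One EGNN-style layer: h holds per-residue features, x holds 3D coordinates.

    Messages depend on geometry only through squared distances, so the
    updated features are invariant to rotations and translations. Coordinates
    are updated along relative vectors x_i - x_j, so they are equivariant.
    """
    new_h, new_x = [], []
    for i in range(len(h)):
        agg = [0.0] * len(params["edge"][1])
        shift = [0.0, 0.0, 0.0]
        neighbours = 0
        for j in range(len(h)):
            d2 = sum((p - q) ** 2 for p, q in zip(x[i], x[j]))
            if i == j or d2 > cutoff ** 2:
                continue  # only residues within the (assumed) cutoff are linked
            # Squared distance is rescaled by the cutoff to keep inputs small.
            m = silu(linear(params["edge"], h[i] + h[j] + [d2 / cutoff ** 2]))
            gate = linear(params["coord"], m)[0]
            agg = [a + v for a, v in zip(agg, m)]
            shift = [s + gate * (p - q) for s, p, q in zip(shift, x[i], x[j])]
            neighbours += 1
        scale = 1.0 / max(neighbours, 1)  # mean aggregation; safe if isolated
        update = linear(params["node"], h[i] + [a * scale for a in agg])
        new_h.append([a + u for a, u in zip(h[i], update)])  # residual update
        new_x.append([p + s * scale for p, s in zip(x[i], shift)])
    return new_h, new_x

def predict_binding(h, x, params):
    """Per-residue binding probability from one layer plus a sigmoid head."""
    h, _ = equivariant_layer(h, x, params)
    return [1.0 / (1.0 + math.exp(-linear(params["out"], hi)[0])) for hi in h]

def rigid_transform(x, angle, shift):
    """Rotate coordinates about the z axis by angle, then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * p[0] - s * p[1] + shift[0],
             s * p[0] + c * p[1] + shift[1],
             p[2] + shift[2]] for p in x]

if __name__ == "__main__":
    rng = random.Random(0)
    embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    coords = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(6)]
    params = init_params(rng, dim=4, hidden=5)
    before = predict_binding(embeddings, coords, params)
    moved = rigid_transform(coords, 0.7, [3.0, -2.0, 5.0])
    after = predict_binding(embeddings, moved, params)
    print(max(abs(a - b) for a, b in zip(before, after)))  # rounding-level gap
```

Because the weights are random and untrained, the scores carry no biological meaning. The point of the check at the end is only that the scores for the original and the rigidly moved structure agree up to floating-point rounding, which is the robustness to structure orientation that an equivariant design provides.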

