Review
Brief Bioinform. 2024 Nov 22;26(1):bbaf016.
doi: 10.1093/bib/bbaf016.

Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences

Sushmita Basu et al. Brief Bioinform.

Abstract

Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.

Keywords: DNA-binding residue; RNA-binding residue; deep learning; intrinsic disorder; machine learning; nucleic acid-binding; protein–DNA interaction; protein–RNA interaction; sequence-based prediction.

Figures

Figure 1
Timeline of the release of the 87 sequence-based predictors of nucleic acid-binding residues. The color-coded bars represent methods that target prediction of DNA-binding residues (DBRs; blue), RNA-binding residues (RBRs; orange), and both DBRs and RBRs (green). The major milestones are shown at the bottom in the blue-bordered boxes.
Figure 2
Relation between publication year and predictive performance of the corresponding methods, measured on the same benchmark dataset of 46 DNA-binding proteins that was introduced and applied in refs. [58–60, 74, 75] (top panel) and on the same benchmark dataset of 161 RNA-binding proteins that was used in refs. [60, 62] (bottom panel). Hollow markers denote methods that predict DBRs (top panel) or RBRs (bottom panel), while solid markers denote predictors of both DBRs and RBRs (both panels). The primary (left) y-axis quantifies the AUC values (blue markers) and the secondary (right) y-axis gives the MCC values (green markers). The color-coded dashed lines are moving averages of the corresponding metric, calculated over three consecutive methods ordered by publication year. The numerical values of the AUCs and MCCs are given in Supplementary Tables S2 (top panel) and S3 (bottom panel).
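The two quantities behind the Figure 2 trend lines can be sketched in a few lines of Python. This is only an illustration, not the authors' code. The MCC function uses the standard Matthews correlation coefficient formula on confusion-matrix counts. The moving average assumes a simple unweighted mean over each run of three consecutive methods sorted by year, since the caption does not say how the averages are aligned or weighted. The function names and the example scores are hypothetical placeholders, not values from the paper.

```python
from math import sqrt


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def moving_average(values, window=3):
    """Unweighted mean of every run of `window` consecutive values.

    Assumption: one average per full window; the caption does not
    specify how the averages are positioned along the year axis.
    """
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical (year, score) pairs, NOT taken from the paper.
methods = [(2019, 0.80), (2012, 0.70), (2016, 0.75), (2022, 0.85)]

# Order the methods by publication year before averaging.
scores_by_year = [score for _, score in sorted(methods)]
trend = moving_average(scores_by_year, window=3)
print(trend)
```

With four methods and a window of three, the sketch yields two trend points, one for each run of three year-ordered methods.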

References

    1. Djebali S, Davis CA, Merkel A, et al. Landscape of transcription in human cells. Nature 2012;489:101–8. 10.1038/nature11233.
    2. Evande R, Rana A, Biswas-Fiss EE, et al. Protein–DNA interactions regulate human papillomavirus DNA replication, transcription, and oncogenesis. Int J Mol Sci 2023;24. 10.3390/ijms24108493.
    3. Cozzolino F, Iacobucci I, Monaco V, et al. Protein–DNA/RNA interactions: an overview of investigation methods in the omics era. J Proteome Res 2021;20:3018–30. 10.1021/acs.jproteome.1c00074.
    4. Oyejobi GK, Yan X, Sliz P, et al. Regulating protein–RNA interactions: advances in targeting the LIN28/Let-7 pathway. Int J Mol Sci 2024;25. 10.3390/ijms25073585.
    5. Peng Z, Oldfield CJ, Xue B, et al. A creature with a hundred waggly tails: intrinsically disordered proteins in the ribosome. Cell Mol Life Sci 2014;71:1477–504. 10.1007/s00018-013-1446-6.