Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function
- PMID: 20525822
- PMCID: PMC2905551
- DOI: 10.1093/bioinformatics/btq295
Abstract
Motivation: Template-based prediction of DNA-binding proteins requires not only structural similarity between the target and template structures but also a prediction of the binding affinity between the target and DNA, to confirm that binding actually occurs. Here, we propose to predict protein-DNA binding affinity by introducing a new volume-fraction correction to a statistical energy function based on a distance-scaled, finite, ideal-gas reference (DFIRE) state.
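For orientation, a DFIRE-type potential compares the observed count of an atom-type pair in a distance bin with a reference count. That reference is the count at the cutoff distance, scaled by (r/r_cut)^α (Δr/Δr_cut); the original DFIRE potential uses α = 1.61. The minimal Python sketch below computes such a pair energy. It omits the volume-fraction correction introduced in this work, whose exact form the abstract does not give. The function name, the reuse of α = 1.61, and the example counts are illustrative assumptions.

```python
import math

# Exponent of the distance-scaled reference state; 1.61 is the value used in
# the original DFIRE potential (assumption: reused here unchanged).
ALPHA = 1.61


def dfire_pair_energy(n_obs_r, n_obs_cut, r, r_cut, dr, dr_cut, rt=1.0, alpha=ALPHA):
    """DFIRE-style energy (in units of rt) for one atom-type pair in one distance bin.

    n_obs_r    -- observed pair count in the bin centred at distance r
    n_obs_cut  -- observed pair count in the bin at the cutoff distance r_cut
    dr, dr_cut -- widths of those two bins
    Hypothetical helper: the paper's volume-fraction correction is NOT included.
    """
    if n_obs_r <= 0 or n_obs_cut <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    # Expected count under a finite ideal-gas reference that scales as r**alpha
    n_ref = (r / r_cut) ** alpha * (dr / dr_cut) * n_obs_cut
    # Negative energy: pair observed more often than the reference predicts
    return -rt * math.log(n_obs_r / n_ref)


# Hypothetical counts for one atom-type pair at 4 A with a 14.5 A cutoff
print(dfire_pair_energy(50, 200, r=4.0, r_cut=14.5, dr=1.0, dr_cut=1.0))
```

By construction the energy is zero at the cutoff bin. It becomes negative (attractive) wherever contacts are enriched relative to the distance-scaled reference.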
Results: We show that this energy function, combined with the structural alignment program TM-align, achieves a Matthews correlation coefficient (MCC) of 0.76, with an accuracy of 98%, a precision of 93% and a sensitivity of 64%, in predicting DNA-binding proteins on a benchmark of 179 DNA-binding proteins and 3797 non-binding proteins. This MCC is substantially higher than the best value of 0.69 achieved by previous methods. Applied to 2235 structural genomics targets, the method predicted 37 to be DNA-binding proteins. Of these, 27 (73%) are putatively DNA-binding, only 1 has annotated functions that do not include DNA binding, and the remaining 9 have unknown function. The method thus provides a highly accurate and sensitive technique for structure-based prediction of DNA-binding proteins.
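The four reported figures are standard confusion-matrix statistics. As a consistency check, the Python sketch below computes them. The counts passed in are hypothetical and are not reported in the abstract. TP = 115 is roughly 64% of the 179 positives, and FP = 9 was chosen to give about 93% precision.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity (recall) and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "mcc": mcc}


# Hypothetical split of the 179 binding / 3797 non-binding benchmark proteins
tp, fn = 115, 179 - 115
fp, tn = 9, 3797 - 9
m = binary_metrics(tp, fp, tn, fn)
print({k: round(v, 2) for k, v in m.items()})
# -> {'accuracy': 0.98, 'precision': 0.93, 'sensitivity': 0.64, 'mcc': 0.76}
```

The same function shows why MCC is the headline number on such an imbalanced benchmark. A predictor that labels every protein as non-binding would already reach about 95% accuracy (3797/3976) but an MCC of 0.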
Availability: The method is implemented as a part of the Structure-based function-Prediction On-line Tools (SPOT) package available at http://sparks.informatics.iupui.edu/spot
Similar articles
- Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One. 2014 May 2;9(5):e96694. doi: 10.1371/journal.pone.0096694. PMID: 24792350.
- Highly accurate and high-resolution function prediction of RNA binding proteins by fold recognition and binding affinity prediction. RNA Biol. 2011 Nov-Dec;8(6):988-96. doi: 10.4161/rna.8.6.17813. PMID: 21955494.
- Carbohydrate-binding protein identification by coupling structural similarity searching with binding affinity prediction. J Comput Chem. 2014 Nov 15;35(30):2177-83. doi: 10.1002/jcc.23730. PMID: 25220682.
- Recognition of specific DNA sequences. Mol Cell. 2001 Nov;8(5):937-46. doi: 10.1016/s1097-2765(01)00392-6. PMID: 11741530. Review.
- MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics. 2020 Jul 6;21(Suppl 4):246. doi: 10.1186/s12859-020-3383-3. PMID: 32631256. Review.
Cited by
- Conformational elasticity can facilitate TALE-DNA recognition. Adv Protein Chem Struct Biol. 2014;94:347-64. doi: 10.1016/B978-0-12-800168-4.00009-3. PMID: 24629191. Review.
- Improved detection of DNA-binding proteins via compression technology on PSSM information. PLoS One. 2017 Sep 29;12(9):e0185587. doi: 10.1371/journal.pone.0185587. PMID: 28961273.
- DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network. NAR Genom Bioinform. 2025 May 19;7(2):lqaf058. doi: 10.1093/nargab/lqaf058. PMID: 40391089.
- EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Res. 2024 Mar 21;52(5):e27. doi: 10.1093/nar/gkae039. PMID: 38281252.
- Template-based structure prediction and classification of transcription factors in Arabidopsis thaliana. Protein Sci. 2012 Jun;21(6):828-38. doi: 10.1002/pro.2066. PMID: 22549903.