PLoS One. 2014 May 2;9(5):e96694.
doi: 10.1371/journal.pone.0096694. eCollection 2014.

Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome

Huiying Zhao et al. PLoS One.

Abstract

As increasingly inexpensive sequencing techniques uncover more and more protein sequences, determining their functions has become an urgent task. This work presents a highly reliable computational technique that predicts DNA-binding function at the level of protein-DNA complex structures, rather than making the low-resolution two-state (binding or non-binding) prediction that most existing techniques provide. The method first predicts the protein-DNA complex structure with the template-based structure prediction technique HHblits, and then predicts binding affinity with a knowledge-based energy function (distance-scaled finite ideal-gas reference state for protein-DNA interactions). In leave-one-out cross-validation on 179 DNA-binding and 3797 non-binding protein domains, the method achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides reasonably accurate predictions of DNA-binding residues based on the predicted DNA-binding complex structures. Its application to the human proteome yields more than 300 novel predicted DNA-binding proteins; some of these predicted structures were validated against known structures of homologous proteins in apo forms. The method [SPOT-Seq (DNA)] is available as an online server at http://sparks-lab.org.
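
As a rough illustration of how the three reported metrics relate, the sketch below computes precision, sensitivity, and MCC from a two-class confusion matrix using their standard definitions. The counts are hypothetical: they were back-calculated from the rounded percentages and the dataset sizes given above (179 binding and 3797 non-binding domains) and are not taken from the paper. The function name `binary_metrics` is likewise an assumption made for this example.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, mcc) for a two-class confusion matrix.

    Standard definitions:
      precision   = TP / (TP + FP)
      sensitivity = TP / (TP + FN)
      MCC         = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, mcc


# Hypothetical counts (an assumption, not reported in the abstract),
# chosen so that TP + FN = 179 binding and FP + TN = 3797 non-binding domains.
TP, FN = 116, 63
FP, TN = 7, 3790

precision, sensitivity, mcc = binary_metrics(TP, FP, TN, FN)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f} MCC={mcc:.2f}")
```

With a class imbalance of roughly 1 binding domain per 21 non-binding domains, MCC is a more informative summary than accuracy, because it uses all four cells of the confusion matrix.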

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Performance of various methods for DNA-binding protein prediction (leave-one-out cross validation).
Figure 2. Matthews correlation coefficient for predicted binding residues versus the structural similarity SP-score between predicted and known structures of 116 targets. The correlation coefficient is 0.38.
Figure 3. Comparison of predicted (red) and native (green) structures of target 1yfjD (DAM). The native structure and its DNA are shown in green and orange, respectively; the predicted structure and its DNA are shown in red and grey. Predicted and native binding sites are shown in cyan and yellow, respectively.
