iDNA-Prot: identification of DNA binding proteins using random forest with grey model

Wei-Zhong Lin¹, Jian-An Fang, Xuan Xiao, Kuo-Chen Chou

PMID: 21935457
PMCID: PMC3174210
DOI: 10.1371/journal.pone.0024756

PLoS One. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756. Epub 2011 Sep 15.

Affiliation

¹ Information Science and Technology School, Donghua University, Shanghai, China.

Abstract

DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools that identify DNA-binding proteins rapidly and effectively is one of the major challenges in genome annotation. Although many efforts have been made in this regard, further work is needed to improve prediction power. We propose a new predictor, iDNA-Prot, that classifies uncharacterized proteins as DNA-binding or non-DNA-binding from their amino acid sequence information alone. It incorporates features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition and uses random forest as the operation engine. In jackknife tests, iDNA-Prot achieved an overall success rate of 83.96% on a newly constructed stringent benchmark dataset in which no protein has ≥25% pairwise sequence identity to any other in the same subset. In addition to this high success rate, the computational time of iDNA-Prot is remarkably shorter than that of the relevant existing predictors. It is therefore anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. iDNA-Prot is freely accessible as a user-friendly web server at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. For the convenience of experimental scientists, a step-by-step guide explains how to use the web server to obtain the desired results.
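The two ideas named in the abstract, grey-model features and jackknife evaluation, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest grey model, GM(1,1), which may differ from the variant used in iDNA-Prot, and it assumes the protein sequence has already been converted to a numeric series by some encoding that the abstract does not specify. The function names are hypothetical, and a nearest-neighbour rule stands in for the random forest so that the example needs only the standard library.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def gm11_coefficients(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in the GM(1,1) equation x0[k] + a*z1[k] = b.

    `series` is assumed to be a numeric encoding of a protein sequence; the
    encoding itself is outside the scope of this sketch.
    """
    if len(series) < 3:
        raise ValueError("GM(1,1) needs at least three values")
    x1 = list(accumulate(series))  # accumulated generating series
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(series[1:])
    n = len(z)
    # Ordinary least squares for y = slope*z + intercept, with slope = -a, intercept = b.
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    denom = n * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate series: background values are all equal")
    slope = (n * szy - sz * sy) / denom
    intercept = (sy - slope * sz) / n
    return -slope, intercept


def nearest_neighbour(train_x: List[Vector], train_y: List[int], query: Vector) -> int:
    """Stand-in classifier (NOT a random forest): label of the closest training vector."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_success_rate(
    xs: Sequence[Vector],
    ys: Sequence[int],
    classify: Callable[[List[Vector], List[int], Vector], int] = nearest_neighbour,
) -> float:
    """Leave each sample out once, predict it from the rest, return the fraction correct."""
    xs, ys = list(xs), list(ys)
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += classify(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


if __name__ == "__main__":
    # Toy numeric series, purely illustrative.
    print(gm11_coefficients([1.0, 2.0, 4.0, 8.0]))
    # Toy feature vectors with labels 1 (DNA-binding) and 0 (non-DNA-binding).
    toy_x = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.9), (0.9, 1.0)]
    toy_y = [0, 0, 1, 1]
    print(jackknife_success_rate(toy_x, toy_y))
```

In a fuller pipeline the pair (a, b) would be appended to other sequence-derived features, and `classify` would be replaced by a wrapper around a random forest implementation. The jackknife loop itself would stay unchanged.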

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.
