Comparative Study
Nucleic Acids Res. 2010 Jun;38(10):3149-58.
doi: 10.1093/nar/gkq061. Epub 2010 Feb 15.

Boosting the prediction and understanding of DNA-binding domains from sequence

Robert E Langlois et al. Nucleic Acids Res. 2010 Jun.

Abstract

DNA-binding proteins perform vital functions related to transcription, repair and replication. We have developed a new sequence-based machine learning protocol to identify DNA-binding proteins. We compare our method with an extensive benchmark of previously published structure-based machine learning methods as well as a standard sequence alignment technique, BLAST. Furthermore, we elucidate important feature interactions found in a learned model and analyze how specific rules capture general mechanisms that extend across DNA-binding motifs. This analysis is carried out using the malibu machine learning workbench available at http://proteomics.bioengr.uic.edu/malibu and the corresponding data sets and features are available at http://proteomics.bioengr.uic.edu/dna.

Figures

Figure 1. Illustration of the calculation of local environment amino acid composition.
Figure 2. A ROC comparison of the new sequence-based feature representation and Boosted Trees with the JMB06 structure-based protocol and BLAST.
Figure 3. An ADTree built over the JMB06 data set. The square nodes in the model hold the name of the feature and the order in which it was learned. The round nodes hold the weighted vote, where a positive number predicts DNA binding. Below each square node is the prediction threshold: if the feature value exceeds this number, the right path is taken; otherwise, the left path is taken. Below each path in the tree is a set of numbers in the format counted/total, each prefixed by the DNA-binding subgroup it refers to.
Figure 4. Several example protein structures bound to DNA. (a) 3PVI illustrating turn leucine residues in contact with DNA. (b) 3PVI illustrating turn histidine residues in contact with DNA. (c) 1ECR illustrating sheet arginine residues in contact with DNA.
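
The reading rule given in the Figure 3 caption can be sketched in code. This is a minimal illustration, not the malibu implementation. It assumes the usual alternating decision tree convention: the votes of every round node that is reached are summed, and a positive total predicts DNA binding. The class names, feature names, thresholds and votes below are all hypothetical and are not taken from the paper's learned model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredictionNode:
    """Round node: a weighted vote plus any decision nodes attached to it."""
    vote: float
    splitters: List["DecisionNode"] = field(default_factory=list)


@dataclass
class DecisionNode:
    """Square node: compares one feature against a threshold."""
    feature: str
    threshold: float
    left: PredictionNode   # taken when the value does not exceed the threshold
    right: PredictionNode  # taken when the value exceeds the threshold


def adtree_score(node: PredictionNode, features: Dict[str, float]) -> float:
    """Sum the votes of every round node reachable for this feature vector."""
    total = node.vote
    for splitter in node.splitters:
        exceeded = features[splitter.feature] > splitter.threshold
        branch = splitter.right if exceeded else splitter.left
        total += adtree_score(branch, features)
    return total


def predicts_dna_binding(root: PredictionNode, features: Dict[str, float]) -> bool:
    """A positive summed vote is read as a DNA-binding prediction."""
    return adtree_score(root, features) > 0


# Hypothetical tree: feature names, thresholds and votes are invented.
example_tree = PredictionNode(
    vote=-0.25,
    splitters=[
        DecisionNode(
            feature="turn_leucine",
            threshold=0.05,
            left=PredictionNode(-0.5),
            right=PredictionNode(
                0.75,
                [DecisionNode("sheet_arginine", 0.10,
                              PredictionNode(-0.125), PredictionNode(0.5))],
            ),
        ),
        DecisionNode("turn_histidine", 0.03,
                     PredictionNode(-0.25), PredictionNode(0.5)),
    ],
)

# Hypothetical feature vector for one protein sequence.
example_features = {
    "turn_leucine": 0.08,
    "sheet_arginine": 0.12,
    "turn_histidine": 0.01,
}

if __name__ == "__main__":
    print(adtree_score(example_tree, example_features),
          predicts_dna_binding(example_tree, example_features))
```

Unlike an ordinary decision tree, a round node may carry several square nodes, so one input can follow several paths at once and each path contributes its vote to the total. In the hypothetical example the two turn_leucine and sheet_arginine tests are exceeded and the turn_histidine test is not, so the total is -0.25 + 0.75 + 0.5 - 0.25 = 0.75, which is read as a DNA-binding prediction.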

