Bioinformatics. 2009 Jan 1;25(1):22-9.
doi: 10.1093/bioinformatics/btn580. Epub 2008 Nov 13.

Predicting DNA recognition by Cys2His2 zinc finger proteins



Anton V Persikov et al. Bioinformatics. 2009.

Abstract

Motivation: Cys(2)His(2) zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain.
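To make the canonical model concrete, the sketch below turns one finger and its 4-bp subsite into the four amino acid-nucleotide contact pairs and a simple one-hot feature vector (20 amino acids x 4 bases per contact, 320 features in total), the kind of input a linear classifier could score. It assumes the commonly cited contact assignment, in which helix positions -1, 3 and 6 contact bases 3, 2 and 1 of the primary strand and position 2 contacts base 4' on the complementary strand. The seven-residue input format (positions -1 and 1 to 6, with no position 0) and all function names are hypothetical; the paper's actual feature encoding may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed canonical contacts: (helix position, base number 1-4, on complementary strand?)
CANONICAL_CONTACTS = [(-1, 3, False), (2, 4, True), (3, 2, False), (6, 1, False)]


def helix_index(position):
    """Map helix positions -1, 1..6 to indices 0..6 of a 7-residue string (no position 0)."""
    return 0 if position == -1 else position


def contact_pairs(helix, site):
    """helix: 7 residues for positions -1, 1..6 (hypothetical input format).
    site: 4 bases of the primary strand, numbered 1..4 from 5' to 3'."""
    pairs = []
    for position, base_number, complementary in CANONICAL_CONTACTS:
        base = site[base_number - 1]
        if complementary:
            # base 4' is the partner of base 4 on the complementary strand
            base = COMPLEMENT[base]
        pairs.append((helix[helix_index(position)], base))
    return pairs


def one_hot_features(helix, site):
    """One 20x4 indicator block per contact, concatenated: 4 * 80 = 320 features."""
    features = []
    for amino_acid, base in contact_pairs(helix, site):
        block = [0] * (len(AMINO_ACIDS) * len(BASES))
        block[AMINO_ACIDS.index(amino_acid) * len(BASES) + BASES.index(base)] = 1
        features.extend(block)
    return features
```

For an illustrative helix "RSDELTR" and site "GCGT", `contact_pairs` returns `[('R', 'G'), ('D', 'A'), ('E', 'C'), ('R', 'G')]`, and the feature vector has exactly four nonzero entries, one per contact.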

Results: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality literature-derived experimental database of ZF-DNA binding examples and utilize it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may result in further improved performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions.
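A minimal sketch of how binding, non-binding and relative-affinity information can enter a single linear SVM objective: each binding (+1) or non-binding (-1) pair contributes a hinge term on y(w·x + b), and each pair (i, j) in which i is known to bind more strongly than j contributes a hinge term on w·x_i - w·x_j, where the bias cancels. The primal subgradient solver, its step schedule and the parameter names (`lam`, `epochs`, `lr`, `rank_pairs`) are illustrative assumptions, not the authors' implementation. The polynomial kernel is shown only as a function; a kernel SVM would normally be trained in its dual form.

```python
import numpy as np


def train_linear_svm(X, y, rank_pairs, lam=0.01, epochs=500, lr=0.1):
    """Subgradient descent on a regularized hinge loss (illustrative solver).
    X: (n, d) features; y: +1 (binds) or -1 (does not bind).
    rank_pairs: list of (i, j) meaning example i binds more strongly than j."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        step = lr / np.sqrt(t)
        grad_w, grad_b = lam * w, 0.0
        # classification constraints: y_i * (w.x_i + b) >= 1
        violated = y * (X @ w + b) < 1
        grad_w = grad_w - (y[violated][:, None] * X[violated]).sum(axis=0) / n
        grad_b -= y[violated].sum() / n
        # relative-affinity constraints: w.x_i - w.x_j >= 1 (the bias cancels)
        if rank_pairs:
            diffs = np.array([X[i] - X[j] for i, j in rank_pairs])
            rank_violated = diffs @ w < 1
            grad_w = grad_w - diffs[rank_violated].sum(axis=0) / len(rank_pairs)
        w = w - step * grad_w
        b = b - step * grad_b
    return w, b


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """k(x, z) = (x.z + coef0) ** degree."""
    return (np.dot(x, z) + coef0) ** degree
```

For one-hot contact features, a degree-2 kernel implicitly includes products of pairs of contact indicators, which is one way a polynomial SVM can pick up dependencies between contacts that a linear model over independent contacts cannot.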

Availability: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.


Figures

Fig. 1.
Schematic representation of the canonical binding model. Amino acids are numbered sequentially, with amino acid 1 as the first residue of the ZF helical domain. Bases are numbered sequentially from 5′ to 3′ on the primary DNA chain; bases on the complementary chain are marked with a prime.
Fig. 2.
ROC curves for cross-validation analysis. (A) Testing on new data and (B) holdout cross-validation. Key: linear SVM (red), polynomial SVM (black), BLS02 (green), MGM98 (magenta), SBGY95 (cyan) and KFM05 (blue).
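The ROC curves in Fig. 2 plot the true-positive rate against the false-positive rate as the decision threshold on a predictor's score is lowered. A minimal sketch of how such a curve and the area under it can be computed from predicted scores follows; it assumes binary labels (1 = binding pair, 0 = non-binding pair) with both classes present, groups tied scores into a single threshold, and is not the authors' evaluation code.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists, sweeping the threshold from the highest score down.
    labels: 1 for binding pairs, 0 for non-binding pairs (both must occur)."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # consume every example tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))
```

A predictor that ranks every binding pair above every non-binding pair reaches an area of 1.0, while random scoring gives about 0.5; the area also equals the fraction of (binding, non-binding) pairs that the predictor orders correctly.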


References

    1. Benos PV, et al. SAMIE: statistical algorithm for modeling interaction energies. Pac. Symp. Biocomput. 2001;6:115–126.
    1. Benos PV, et al. Probabilistic code for DNA recognition by proteins of the EGR family. J. Mol. Biol. 2002;323:701–727.
    1. Berman HM, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
    1. Blancafort P, et al. Scanning the human genome with combinatorial transcription factor libraries. Nat. Biotechnol. 2003;21:269–274.
    1. Bulyk ML, et al. Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA. 2001;98:7158–7163.
