Comparative Study
Bioinformatics. 2009 Apr 15;25(8):1012-8.
doi: 10.1093/bioinformatics/btn645. Epub 2008 Dec 16.

Predicting the binding preference of transcription factors to individual DNA k-mers

Trevis M Alleyne et al. Bioinformatics.

Abstract

Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.

Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
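
The nearest-neighbour idea described in the Results can be illustrated with a minimal sketch under leave-one-out cross-validation. It assumes that distance is the number of mismatches over a chosen set of residue positions and that tied nearest neighbours are averaged; both choices, along with the function names, toy residue strings and toy Z-scores, are hypothetical and are not taken from the authors' implementation or data.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length residue strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nn_predict(query: str,
               residues: Dict[str, str],
               profiles: Dict[str, List[float]]) -> List[float]:
    """Predict a k-mer Z-score profile from the nearest neighbour(s) of `query`.

    The query protein itself is left out (leave-one-out). Tied nearest
    neighbours are averaged, which is an assumption of this sketch.
    """
    others = [name for name in residues if name != query]
    dists = {name: hamming(residues[query], residues[name]) for name in others}
    best = min(dists.values())
    nearest = [name for name in others if dists[name] == best]
    n_kmers = len(profiles[nearest[0]])
    return [sum(profiles[n][i] for n in nearest) / len(nearest)
            for i in range(n_kmers)]


# Hypothetical toy data: six selected residues per protein, Z-scores for 3 k-mers.
toy_residues = {"A": "RKQNRV", "B": "RKQNRI", "C": "AKKNRV", "D": "RKQNRM"}
toy_profiles = {"A": [5.0, 1.0, 0.0], "B": [4.0, 2.0, 0.0],
                "C": [0.0, 0.0, 6.0], "D": [6.0, 0.0, 0.0]}

# One prediction per protein, each made without access to its own profile.
loo = {name: nn_predict(name, toy_residues, toy_profiles)
       for name in toy_residues}
```

In this toy set, protein C has no close relative, so its predicted profile is borrowed from a protein with a quite different measured profile. That is the situation in which a nearest-neighbour approach would be expected to perform worst.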

Figures

Fig. 1.
2D clustergram of Z-scores for 2042 8mers and 75 mouse homeodomains, as observed in either real PBM data (left) or NN predictions (right), with some of the established classes of homeodomains labelled. NN predictions were made using 6AA positions and leave-one-out cross-validation. The 2042 8mers were selected because they comprise the top 100 8mers by Z-score over the DBDs shown.
Fig. 2.
Comparison of the accuracy of NN predictions versus experimental replicates. Scatterplots show the measured Z-scores for all 32 896 non-redundant eight-base DNA sequences from one PBM versus a second PBM for the same DBD (top) or versus the Z-score predicted using NN (6AA variant; bottom). Median performance metrics are given. Evx1 has a single NN (Hoxa2); Irx2 has a single NN (Irx3); Lhx1 has two NN (Alx3 and Lhx3).
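
The count of 32 896 is what one obtains by treating an 8mer and its reverse complement as the same sequence: of the 4^8 = 65 536 possible 8mers, 4^4 = 256 are their own reverse complement, so (65 536 + 256)/2 = 32 896 remain. The short check below enumerates this directly. The function names are illustrative, and the reading of "non-redundant" as reverse-complement collapsing is an assumption consistent with that number.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> set:
    """Collect one representative per {k-mer, reverse complement} pair."""
    result = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # Keep the lexicographically smaller of the two strands.
        result.add(min(kmer, reverse_complement(kmer)))
    return result


n_8mers = len(canonical_kmers(8))
```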
Fig. 3.
Node purity importance scores for 57 homeodomain amino acid positions for 75 rounds of leave-one-out cross-validation, sorted by median value (purple).
Fig. 4.
Association between top-100 overlap scores for pairs of 8mer profile inference methods. Scatterplots show the top-100 overlap values for 75 homeodomains when Z-score profiles are predicted using one inference method versus another method for the same proteins. All axes range from 0 to 100. The names on the diagonal label the axes. Predictions are made using the 15 homeodomain DNA-contacting residues. Homeodomains are coloured according to whether they have ≥5 (red), 3–4 (blue) or 1–2 (green) mismatches to their nearest sequence neighbour.
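
One plausible reading of a top-100 overlap score, given that the axes run from 0 to 100, is the number of 8mers shared between the 100 highest-scoring 8mers of the measured profile and the 100 highest-scoring 8mers of the predicted profile. The sketch below implements that reading. The exact definition used in the paper is not stated in this record, so the function names, the tie-breaking by index order and the toy scores are assumptions.

```python
from typing import Sequence, Set


def top_n_indices(scores: Sequence[float], n: int) -> Set[int]:
    """Indices of the n highest scores (ties broken by original order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n])


def top_n_overlap(measured: Sequence[float],
                  predicted: Sequence[float],
                  n: int = 100) -> int:
    """Number of k-mers found in the top n of both profiles (0 to n)."""
    if len(measured) != len(predicted):
        raise ValueError("profiles must cover the same k-mers in the same order")
    return len(top_n_indices(measured, n) & top_n_indices(predicted, n))


# Hypothetical toy profiles over five k-mers, compared at n = 3.
toy_measured = [9.0, 8.0, 7.0, 1.0, 0.0]
toy_predicted = [9.0, 1.0, 8.0, 7.0, 0.0]
toy_overlap = top_n_overlap(toy_measured, toy_predicted, n=3)
```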
