Comparative Study
Gene. 2012 Sep 10;506(1):125-34.
doi: 10.1016/j.gene.2012.06.005. Epub 2012 Jun 10.

Differences in local genomic context of bound and unbound motifs

Loren Hansen et al. Gene.

Abstract

Understanding gene regulation is a major objective of molecular biology research. Transcription is frequently driven by transcription factors (TFs) that bind to specific DNA sequences. These motifs are usually short and degenerate, so many copies are likely to occur throughout the genome by chance alone. Yet TFs bind only a small subset of these sites, which prompted us to investigate the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that the local sequence context around bound motifs can differ strikingly from that around unbound motifs. We conclude that multiple combinations of genomic features characterize bound or unbound motifs.
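To make the point about chance occurrences concrete, the Python sketch below estimates how many matches a motif would be expected to have in random sequence. It assumes a uniform, independent base composition (each base with probability 0.25) and uses the common rough approximation that a site matches with probability 2^(-I) for a motif carrying I bits of information, which is exact only for a single fully specified consensus. The function name and the 12-Mb sequence length (roughly the size of the S. cerevisiae genome) are illustrative assumptions, not values from the study.

```python
def expected_chance_matches(genome_length: int, info_bits: float, motif_width: int) -> float:
    """Approximate number of chance motif matches on both strands of a random sequence.

    Assumes uniform, independent bases, so each candidate site matches with
    probability ~2**(-info_bits). For an exact k-mer, info_bits = 2 * k.
    """
    if motif_width > genome_length:
        return 0.0
    positions_per_strand = genome_length - motif_width + 1
    return 2 * positions_per_strand * 2.0 ** (-info_bits)


if __name__ == "__main__":
    # A fully specified 6-mer carries 12 bits; 12 Mb is roughly a yeast-sized genome.
    print(round(expected_chance_matches(12_000_000, 12.0, 6)))
```

Under these assumptions even a fully specified 6-mer is expected to appear several thousand times by chance, and a degenerate motif with lower information content would appear more often still.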


Figures

Figure 1. Correlation between motif strength and p-value of binding
(a) Mean p-value of binding for intergenic regions whose average motif strength was >80% of the maximum possible log-likelihood score. The p-value of binding was obtained from Harbison et al. (2004). The number above each bar is the information content of the given motif in bits; the lower the information content, the more likely the motif is to occur by chance in a sequence. (b) For every motif, the average p-value of binding in intergenic regions containing high-scoring motifs, calculated as in (a) (y-axis), plotted against the information content of the motif in bits (x-axis). (c and d) P-value of binding versus motif strength for (c) ABF1 and (d) SUT1. The x-axis denotes the motif strength of a given TF as a percentage of the maximum possible PWM log-likelihood score; higher motif strength indicates a closer match to the consensus sequence. The y-axis gives the average p-value of binding for the intergenic regions that met each motif-strength threshold. ABF1 and SUT1 are shown because they represent the two extremes.
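The caption expresses motif strength as a percentage of the maximum possible PWM log-likelihood score and summarizes each motif by its information content. The Python sketch below shows one common way to compute both from a count matrix. The pseudocount of 0.5, the uniform 0.25 background, the toy counts, and all function names are assumptions for illustration; the study's exact scoring details may differ. Because the score is divided by the maximum alone, as the caption describes, poor matches can receive negative percentages.

```python
import math
from typing import Dict, List

BASES = "ACGT"

# Hypothetical counts from 10 aligned sites of a 4-bp motif (consensus CACG).
TOY_COUNTS: List[Dict[str, int]] = [
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 8, "T": 1},
]


def counts_to_probs(counts: List[Dict[str, int]], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Convert per-column base counts to probabilities, adding a pseudocount to every base."""
    probs = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        if total == 0:
            raise ValueError("column has no counts and no pseudocount")
        probs.append({b: (col.get(b, 0) + pseudocount) / total for b in BASES})
    return probs


def log_likelihood(seq: str, probs: List[Dict[str, float]], background: float = 0.25) -> float:
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(seq) != len(probs):
        raise ValueError("sequence length must equal motif width")
    return sum(math.log2(col[base] / background) for base, col in zip(seq.upper(), probs))


def max_score(probs: List[Dict[str, float]], background: float = 0.25) -> float:
    """Best achievable score: the most probable base in every column."""
    return sum(math.log2(max(col.values()) / background) for col in probs)


def motif_strength_percent(seq: str, probs: List[Dict[str, float]]) -> float:
    """Score of `seq` as a percentage of the maximum possible log-likelihood score."""
    best = max_score(probs)
    if best <= 0:
        raise ValueError("maximum score must be positive")
    return 100.0 * log_likelihood(seq, probs) / best


def information_content(probs: List[Dict[str, float]]) -> float:
    """Total information in bits relative to a uniform background (at most 2 bits per column)."""
    return sum(2.0 + sum(p * math.log2(p) for p in col.values() if p > 0) for col in probs)


if __name__ == "__main__":
    probs = counts_to_probs(TOY_COUNTS)
    print(f"information content: {information_content(probs):.2f} bits")
    for site in ("CACG", "GACG", "TTTT"):
        print(site, f"{motif_strength_percent(site, probs):.1f}%")
```

With this definition the consensus always scores 100%, and a threshold such as the >80% used in panel (a) keeps only sites close to the consensus.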
Figure 2. TA dinucleotide content around bound or unbound motifs
Motifs classified as bound or unbound were aligned, and the TA dinucleotide content was binned in 50-bp windows moving upstream and downstream from the motif. Zero on the x-axis represents the center of the aligned motif. Black: the average TA content, defined as the percentage of dinucleotides within each 50-bp window that are TA. Green: the background TA content, calculated by randomly selecting locations in intergenic regions and repeating the same procedure.
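The windowed TA profile described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes motif centers are given as 0-based positions on a single plus-strand sequence, counts overlapping dinucleotides, reports fractions rather than percentages, and uses random positions within the same sequence as a stand-in for the intergenic background. The function names and parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable


def ta_fraction(window: str) -> float:
    """Fraction of the overlapping dinucleotides in `window` that are exactly 'TA'."""
    window = window.upper()
    n_dinucs = len(window) - 1
    if n_dinucs <= 0:
        return float("nan")
    return sum(1 for i in range(n_dinucs) if window[i:i + 2] == "TA") / n_dinucs


def ta_profile(sequence: str, centers: Iterable[int], window: int = 50,
               n_windows: int = 4) -> Dict[int, float]:
    """Mean TA fraction per bin, keyed by the bin's start offset from the motif center.

    Bin k covers [center + k*window, center + (k+1)*window) for k in
    [-n_windows, n_windows). Bins that run off either end of `sequence` are skipped.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for center in centers:
        for k in range(-n_windows, n_windows):
            start = center + k * window
            end = start + window
            if start < 0 or end > len(sequence):
                continue
            sums[k * window] += ta_fraction(sequence[start:end])
            counts[k * window] += 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(counts)}


def background_profile(sequence: str, n_samples: int, window: int = 50,
                       n_windows: int = 4, seed: int = 0) -> Dict[int, float]:
    """The same profile around randomly chosen positions, as a background estimate."""
    span = window * n_windows
    lo, hi = span, len(sequence) - span
    if hi < lo:
        raise ValueError("sequence too short for the requested windows")
    rng = random.Random(seed)
    centers = [rng.randint(lo, hi) for _ in range(n_samples)]
    return ta_profile(sequence, centers, window, n_windows)


if __name__ == "__main__":
    # Hypothetical toy sequence: G-only on the left, TA repeats on the right.
    seq = "G" * 200 + "TA" * 100
    print(ta_profile(seq, [200], window=50, n_windows=2))
```

Because TA is its own reverse complement, the count in a window does not depend on strand, although the upstream and downstream labels would flip for motifs on the minus strand.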
Figure 3. Motifs enriched near bound or unbound motifs
The fraction of bound (red) or unbound (blue) motifs that have at least one of the labeled motifs within 100 bp is plotted for the nine TFs shown. P-values were calculated using the z-test for two proportions and corrected for multiple testing using the Benjamini, Hochberg, and Yekutieli correction (Benjamini and Yekutieli, 2001). Comparisons with a q-value < 0.05 are marked with an asterisk.
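The two statistical steps named in this caption, a pooled two-proportion z-test followed by Benjamini-Yekutieli adjustment, can be sketched in plain Python as below. The function names and the counts in the usage example are hypothetical; the adjustment follows the standard step-up procedure, scaling by m times the harmonic sum c(m) so that it remains valid under arbitrary dependence between tests.

```python
import math
from typing import List, Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided pooled z-test for H0: p1 == p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both groups all-0 or all-1: no evidence of a difference
    z = (p1 - p2) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


def benjamini_yekutieli(pvalues: List[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        q[idx] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical counts: (bound with co-motif, bound total, unbound with co-motif, unbound total)
    tables = {"co-motif A": (60, 100, 40, 100), "co-motif B": (30, 100, 28, 100)}
    pvals = [two_proportion_z_test(*t)[1] for t in tables.values()]
    for name, q in zip(tables, benjamini_yekutieli(pvals)):
        print(name, f"q = {q:.3g}", "*" if q < 0.05 else "")
```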
Figure 4. Histone modification-based features
Histone modification-based features are plotted for the eight TFs for which a histone modification feature was selected as important. Red bars represent the average log ratio of the given histone modification within a 200-bp window centered at bound sites; blue bars represent the same quantity within a 200-bp window centered at unbound sites. P-values were calculated using the Wilcoxon rank-sum test and corrected for multiple testing using the Benjamini, Hochberg, and Yekutieli correction (q-values) (Benjamini and Yekutieli, 2001). Comparisons with a q-value < 0.05 are marked with an asterisk.
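A minimal sketch of the Wilcoxon rank-sum comparison (equivalent to the Mann-Whitney U test) between values at bound and unbound sites is shown below in plain Python. It uses the normal approximation with the standard tie correction and no continuity correction, so p-values for small samples will differ somewhat from exact implementations. The function names and the log ratios in the usage example are hypothetical; the resulting p-values would then be adjusted with the Benjamini-Yekutieli procedure named in the caption.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation. Returns (U, p_value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for sample x
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # every value tied: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical log ratios in 200-bp windows around bound and unbound sites.
    bound = [0.8, 1.1, 0.9, 1.4, 0.7, 1.2]
    unbound = [0.1, -0.3, 0.4, 0.0, 0.2, -0.1]
    u, p = rank_sum_test(bound, unbound)
    print(f"U = {u}, p = {p:.3g}")
```

A rank-based test suits these features because it compares the two groups without assuming the log ratios are normally distributed.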

References

    1. Andrews BJ, Moore LA. Interaction of the yeast Swi4 and Swi6 cell cycle regulatory proteins in vitro. Proceedings of the National Academy of Sciences, USA. 1992;89:11852–6.
    2. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL, Gebbia M, Talukder S, Yang A, Mnaimneh S, Terterov D, Coburn D, Li Yeo A, Yeo ZX, Clarke ND, Lieb JD, Ansari AZ, Nislow C, Hughes TR. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular Cell. 2008;32:878–87.
    3. Bauer AL, Hlavacek WS, Unkefer PJ, Mu F. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites. PLoS Computational Biology. 2010;6:e1001007.
    4. Bean JM, Siggia ED, Cross FR. High functional overlap between MluI cell-cycle box binding factor and Swi4/6 cell-cycle box binding factor in the G1/S transcriptional program in Saccharomyces cerevisiae. Genetics. 2005;171:49–61.
    5. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics. 2001;29:1165–1188.
