Comparative Study

Gene. 2012 Sep 10;506(1):125-34.

doi: 10.1016/j.gene.2012.06.005. Epub 2012 Jun 10.

Differences in local genomic context of bound and unbound motifs

Loren Hansen¹, Leonardo Mariño-Ramírez, David Landsman

Affiliation

¹ Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8900 Rockville Pike, Bethesda, MD 20894, USA.

PMID: 22692006
PMCID: PMC3412921
DOI: 10.1016/j.gene.2012.06.005

Loren Hansen et al. Gene. 2012.

Abstract

Understanding gene regulation is a major objective in molecular biology research. Frequently, transcription is driven by transcription factors (TFs) that bind to specific DNA sequences. These motifs are usually short and degenerate, so multiple copies are likely to occur throughout the genome by chance alone. Despite this, TFs bind only a small subset of these sites, which prompted us to investigate the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can be used to discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that local sequence context can be strikingly different around motifs that are bound compared to motifs that are unbound. We concluded that there are multiple combinations of genomic features that characterize bound or unbound motifs.

Published by Elsevier B.V.

Figures

**Figure 1. Correlation between motif strength and p-value of binding**
(a) The mean p-value of binding is plotted for intergenic regions whose average motif strength was >80% of the maximum possible log-likelihood score. The p-values of binding were obtained from Harbison et al. (2004). The number above each bar is the information content of the given motif in bits. The smaller the information content, the more likely that motif is to occur by random chance in a sequence. (b) For every motif, the average p-value of binding in intergenic regions containing high-scoring motifs was calculated as described above (y-axis). The x-axis is the information content of the motifs in bits. (c and d) Plots of the p-value of binding versus motif strength for (c) ABF1 and (d) SUT1. The x-axis denotes the motif strength of a given TF as a percentage of the maximum possible PWM log-likelihood score. Higher motif strength corresponds to closer proximity to the consensus sequence. The average p-value of binding for the collected intergenic regions that met the given motif strength threshold was calculated (y-axis). ABF1 and SUT1 were plotted because they represent the two extremes.
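The two quantities used in this caption, information content in bits and motif strength as a percentage of the maximum PWM log-likelihood score, can be illustrated with a minimal sketch. It assumes a uniform background (0.25 per base), a position weight matrix stored as a list of per-position probability dictionaries with no zero entries, and a simple score/maximum normalization; the caption does not specify the exact formula the authors used, and `TOY_PWM` is a hypothetical matrix, not one from the study.

```python
import math

def information_content(pwm):
    """Total information content in bits, assuming a uniform background."""
    total = 0.0
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet
    return total

def log_likelihood(pwm, site, background=0.25):
    """Log2 likelihood ratio of `site` under the PWM versus the background."""
    if len(site) != len(pwm):
        raise ValueError("site and PWM must have the same length")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))

def max_log_likelihood(pwm, background=0.25):
    """Score of the consensus site, i.e. the best base at every position."""
    return sum(math.log2(max(col.values()) / background) for col in pwm)

def motif_strength(pwm, site, background=0.25):
    """Score of `site` as a percentage of the maximum possible score."""
    return 100.0 * log_likelihood(pwm, site, background) / max_log_likelihood(pwm, background)

# Hypothetical 3-position matrix, for illustration only.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

if __name__ == "__main__":
    print(round(information_content(TOY_PWM), 3))
    print(round(motif_strength(TOY_PWM, "AGA"), 1))
```

Under this normalization a site matching the consensus scores 100%, and sites that score worse than background come out negative; a threshold such as the >80% used in panel (a) would keep only sites close to the consensus.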

**Figure 2. TA dinucleotide content around bound or unbound motifs**
Motifs classified as bound or unbound were aligned. The TA dinucleotide content was binned in 50-bp windows moving upstream and downstream from the motif. Zero on the x-axis represents the center of the aligned motif. Black: The average percentage of TA, which is defined as the fraction of dinucleotides that are TA within each 50 bp window. Green: The background TA content calculated by randomly selecting locations in intergenic regions and repeating the procedure as described.
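The windowing described in this caption can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes overlapping dinucleotides are counted (49 per 50-bp window), that windows are tiled outward from the motif center, and that windows running past the end of the sequence are skipped. The function names and the `n_windows` parameter are hypothetical.

```python
def ta_fraction(window):
    """Fraction of overlapping dinucleotides in `window` that are 'TA'."""
    window = window.upper()
    n_dinucs = len(window) - 1
    if n_dinucs <= 0:
        return 0.0
    hits = sum(1 for i in range(n_dinucs) if window[i:i + 2] == "TA")
    return hits / n_dinucs

def ta_profile(sequence, center, window=50, n_windows=4):
    """Return (offset, TA fraction) pairs for windows tiled around `center`.

    `offset` is the window start relative to the motif center; negative
    offsets are upstream. Windows that fall off the sequence are skipped.
    """
    profile = []
    for k in range(-n_windows, n_windows):
        start = center + k * window
        end = start + window
        if start < 0 or end > len(sequence):
            continue
        profile.append((k * window, ta_fraction(sequence[start:end])))
    return profile

def mean_profile(profiles):
    """Average the TA fraction per offset across many aligned motifs."""
    sums, counts = {}, {}
    for profile in profiles:
        for offset, value in profile:
            sums[offset] = sums.get(offset, 0.0) + value
            counts[offset] = counts.get(offset, 0) + 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(sums)}
```

Applying `ta_profile` at every bound (or unbound) motif center and passing the results to `mean_profile` would give a curve comparable to the black line; running the same procedure at randomly chosen intergenic positions would give the background.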

**Figure 3. Motifs enriched near bound or unbound motifs**
The fraction of bound (red) or unbound (blue) motifs that exhibit at least one of the labeled motifs within 100 bp is plotted for the nine TFs shown. p-values were calculated using the z-test for two proportions, and corrected for multiple testing using Benjamini, Hochberg, and Yekutieli correction (Benjamini and Yekutieli, 2001). Comparisons with a q-value < 0.05 are marked with an asterisk.
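The statistics named in this caption can be sketched with the standard library alone. The sketch assumes the pooled, two-sided form of the two-proportion z-test and the Benjamini-Yekutieli step-up adjustment, with one p-value per comparison; the caption does not say which software or test variant was used, so treat the details as assumptions.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups are all hits or all misses
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli adjusted q-values, returned in input order."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        adjusted = p_values[idx] * m * c_m / rank
        running_min = min(running_min, adjusted)
        q_values[idx] = running_min
    return q_values
```

For each co-occurring motif, `hits_a` and `n_a` would be the number of bound motifs with a neighbor within 100 bp and the total number of bound motifs, with the unbound counts as the second group. The resulting p-values would then be adjusted together, and comparisons with q < 0.05 flagged.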

**Figure 4. Histone modification-based features**
Histone modification-based features are plotted for the eight TFs for which a histone modification feature was selected as important. Red bars represent the average log ratio of the given histone modification within a 200-bp window centered at bound sites. Blue bars represent the average value of the given nucleosome-based feature within a 200-bp window centered at unbound sites. P-values were calculated using the Wilcoxon rank-sum test, and corrected for multiple testing using the Benjamini, Hochberg, and Yekutieli correction (q-values) (Benjamini and Yekutieli, 2001). Comparisons with a q-value < 0.05 are marked with an asterisk.
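The bound-versus-unbound comparison in this caption relies on a rank-sum test. A minimal sketch is given below, assuming the two-sided normal approximation with midranks and a tie correction but no continuity correction. The caption does not state which variant or software was used, and for small samples an exact test would be preferable.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value). Ties receive midranks and the
    variance is corrected for them.
    """
    n_x, n_y = len(x), len(y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both samples must be non-empty")
    n = n_x + n_y
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    var_u = n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # all values identical, no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Here `x` and `y` would hold the per-site log ratios of one histone modification in 200-bp windows at bound and unbound sites, respectively; the p-values across TFs and modifications would then go through the same multiple-testing correction described in the caption.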

References

1. Andrews BJ, Moore LA. Interaction of the yeast Swi4 and Swi6 cell cycle regulatory proteins in vitro. Proceedings of the National Academy of Sciences, USA. 1992;89:11852–6.
2. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL, Gebbia M, Talukder S, Yang A, Mnaimneh S, Terterov D, Coburn D, Li Yeo A, Yeo ZX, Clarke ND, Lieb JD, Ansari AZ, Nislow C, Hughes TR. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular Cell. 2008;32:878–87.
3. Bauer AL, Hlavacek WS, Unkefer PJ, Mu F. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites. PLoS Computational Biology. 2010;6:e1001007.
4. Bean JM, Siggia ED, Cross FR. High functional overlap between MluI cell-cycle box binding factor and Swi4/6 cell-cycle box binding factor in the G1/S transcriptional program in Saccharomyces cerevisiae. Genetics. 2005;171:49–61.
5. Benjamini Y, Yekutieli D. The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics. 2001;29:1165–1188.
