BMC Genomics. 2014 Mar 19;15:208.
doi: 10.1186/1471-2164-15-208.

TACO: a general-purpose tool for predicting cell-type-specific transcription factor dimers

Aleksander Jankowski et al. BMC Genomics. 2014.

Abstract

Background: Cooperative binding of transcription factor (TF) dimers to DNA is increasingly recognized as a major contributor to binding specificity. However, it is likely that the set of known TF dimers is highly incomplete, given that they were discovered using ad hoc approaches, or through computational analyses of limited datasets.

Results: Here, we present TACO (Transcription factor Association from Complex Overrepresentation), a general-purpose standalone software tool that takes as input any genome-wide set of regulatory elements and predicts cell-type-specific TF dimers based on enrichment of motif complexes. TACO is the first tool that can accommodate motif complexes composed of overlapping motifs, a characteristic feature of many known TF dimers. Our method comprehensively outperforms existing tools when benchmarked on a reference set of 29 known dimers. We demonstrate the utility and consistency of TACO by applying it to 152 DNase-seq datasets and 94 ChIP-seq datasets.

Conclusions: Based on these results, we uncover a general principle governing the structure of TF-TF-DNA ternary complexes, namely that the flexibility of the complex is correlated with, and most likely a consequence of, inter-motif spacing.

Figures

Figure 1
Known dimeric DNA-binding transcription factor complexes, manually compiled from the existing biochemical literature. For complexes predicted in UW DNase-seq data (Figure 2B), the sequence motifs identified by TACO are shown. The remaining motifs were compiled as spacing alterations of TACO predictions or as juxtaposed TRANSFAC monomers.
Figure 2
Data sources, comparison of TF dimer predictions and dimer prediction algorithms. (A) DNase-seq data sources. (B) Comparison of TF dimer predictions obtained using UW and Duke DNase-seq data. The Venn diagram illustrates the overlap between the two sets and also the set of known DNA-binding TF dimers manually compiled from the existing biochemical literature (Figure 1). (C) Comparison of dimer prediction algorithms. SpaMo and iTFs were evaluated both with and without motif trimming. Note that TACO does not require motif trimming. Sensitivity is shown as a function of false positive rate; Area Under Curve (AUC) is indicated.
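Panel C summarizes each algorithm by a curve of sensitivity against false positive rate and by its AUC. The following is a minimal sketch of how such a curve and its area could be computed from scored predictions. The function names, scores and labels are hypothetical, and this is not the evaluation code used in the paper.

```python
def roc_points(scores, labels):
    """Sweep a threshold over scores and return (false positive rate, sensitivity) points.

    scores: higher means a more confident prediction (hypothetical values).
    labels: True if the prediction matches a known dimer, False otherwise.
    """
    pos = sum(1 for label in labels if label)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Predictions with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An AUC of 1.0 corresponds to ranking every known dimer above every other prediction, while 0.5 is what a random ranking would give on average.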
Figure 3
Top 10 predicted motif dimers in Duke DNase-seq data, ranked by p-value. Left column: for each prediction, the enriched cell type, number of motif complex instances in cell-type-specific hypersensitive sites and p-value are indicated. Middle column: below each dimer motif, binding sites for individual motifs are indicated. Only the structure of the cluster seed is shown. For clarity, we have manually interpreted the motif annotations. Right column: literature citation on predicted TF dimer.
Figure 4
Top 10 predicted motif dimers in K562 ChIP-seq peaks, ranked by p-value. Left column: for each prediction, the names of enriched ChIP-seq datasets, followed by the number of motif complex instances and the p-value in the most significantly enriched dataset. Right column: below each dimer motif, the locations and names of underlying individual motifs are indicated for the top 5 overrepresented motif complexes. Red motifs correspond to the TF immunoprecipitated in an enriched ChIP-seq dataset, whereas blue motifs originate from TRANSFAC or other ChIP-seq datasets. For clarity, the red lines were drawn only once if the corresponding motif was shared across all 5 complexes.
Figure 5
Dynamic landscape of predicted TF dimers across cell types. Each column of the heatmap represents a motif dimer predicted in UW DNase-seq data in more than one cell type. Dimers predicted in only a single cell type are not shown. Color intensity indicates the motif complex enrichment p-value in the given cell type. Rows and columns were clustered using the complete linkage method with a binary metric.
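The clustering named in this caption can be illustrated with a small sketch. It assumes that the "binary metric" is the asymmetric binary distance (among positions where at least one of two vectors is non-zero, the proportion where exactly one is), and it uses a naive complete-linkage procedure. The function names and the toy presence/absence vectors are hypothetical, and this is not the authors' code.

```python
def binary_distance(a, b):
    """Proportion of positions where exactly one vector is 'on',
    among positions where at least one is 'on'."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        # Both vectors are all-zero; returning 0.0 here is a choice of this sketch.
        return 0.0
    only_one = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return only_one / either


def complete_linkage(vectors):
    """Naive agglomerative clustering.

    Returns the merges in order, each as (members_a, members_b, distance),
    where the distance between two clusters is the largest pairwise distance
    between their members.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(
                    binary_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy example: each vector marks the cell types (1 = predicted) for one dimer.
toy = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
merges = complete_linkage(toy)
```

With this distance, two dimers are close when they are predicted in largely the same cell types, and shared absences do not count toward similarity.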
Figure 6
Wide range of motif spacings for TF dimers predicted in K562 cells. Predicted dimers that varied only in their spacing (same motif pair and orientation) were grouped together and ranked by the p-value of the most significant spacing. For each such group of dimer predictions in K562 ChIP-seq peaks, we show the motif complex enrichment p-value as a function of motif spacing. Spacings to the left of the red line correspond to overlapping motifs.
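The grouping and ranking described in this caption can be sketched as follows. The tuple layout, the function name and the example records are hypothetical placeholders, not TACO's actual output format or real predictions.

```python
from collections import defaultdict


def group_by_pair(predictions):
    """Group predictions that differ only in spacing and rank the groups.

    predictions: iterable of (motif_a, motif_b, orientation, spacing, p_value).
    Returns a list of ((motif_a, motif_b, orientation), {spacing: p_value}),
    ordered by the most significant (smallest) p-value within each group.
    """
    groups = defaultdict(dict)
    for motif_a, motif_b, orientation, spacing, p_value in predictions:
        groups[(motif_a, motif_b, orientation)][spacing] = p_value
    return sorted(groups.items(), key=lambda item: min(item[1].values()))


# Placeholder records, for illustration only.
example = [
    ("motifX", "motifY", "same-strand", 3, 1e-5),
    ("motifX", "motifY", "same-strand", 4, 1e-9),
    ("motifP", "motifQ", "opposite-strand", 10, 1e-7),
]
ranked = group_by_pair(example)
```

Each group's spacing-to-p-value mapping corresponds to one curve of enrichment p-value against motif spacing.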
Figure 7
Positive association between average motif spacing and flexibility of motif dimers. Left column: predictions in K562 ChIP-seq peaks; right column: combined predictions from UW and Duke DNase-seq data. Upper row: sunflower plots show the number of predicted motif spacings for a group of dimer predictions as a function of the average of their motif spacings. When a data point occurs more than once, its count is indicated by the number of petals (orange lines). Lower row: sunflower plots show the standard deviation of predicted motif spacings as a function of average motif spacing. The Pearson correlation coefficients are shown for all plots.
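The quantities plotted here (number of spacings, average spacing and standard deviation per group, and their Pearson correlation) can be computed as in the sketch below. The function names are hypothetical, and the use of the population standard deviation is an assumption; the caption does not say which estimator was used.

```python
import math
from statistics import mean, pstdev


def spacing_summary(spacings):
    """Return (count, average, standard deviation) of one group's predicted spacings.

    Uses the population standard deviation, which is an assumption of this sketch.
    """
    return len(spacings), mean(spacings), pstdev(spacings)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, given one `spacing_summary` per dimer group, `pearson` applied to the averages and the standard deviations gives the kind of coefficient reported in the lower row.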
