Nucleic Acids Res. 2012 Sep 1;40(17):e128.
doi: 10.1093/nar/gks433. Epub 2012 May 18.

Inferring direct DNA binding from ChIP-seq

Timothy L Bailey et al. Nucleic Acids Res.

Abstract

Genome-wide binding data from transcription factor ChIP-seq experiments are the best source of information for inferring the relative DNA-binding affinity of these proteins in vivo. However, standard motif enrichment analysis and motif discovery approaches sometimes fail to correctly identify the binding motif for the ChIP-ed factor. To overcome this problem, we propose 'central motif enrichment analysis' (CMEA), which is based on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal, well centered and maximal in the precise center of the ChIP-seq peak regions. We describe a novel visualization and statistical analysis tool, CentriMo, that identifies the region of maximum central enrichment in a set of ChIP-seq peak regions and displays the positional distributions of predicted sites. Using CentriMo for motif enrichment analysis, we provide evidence that one transcription factor (Nanog) has different binding affinity in vivo than in vitro and that another (E2f1) binds DNA cooperatively, and we confirm the in vivo affinity of NFIC, rescuing a difficult ChIP-seq data set. In another data set, CentriMo strongly suggests that there is no evidence of direct DNA binding by the ChIP-ed factor (Smad1). CentriMo is now part of the MEME Suite software package available at http://meme.nbcr.net. All data and output files presented here are available at http://research.imb.uq.edu.au/t.bailey/sd/Bailey2011a.
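The abstract describes central enrichment only in words. The following is a minimal sketch of one way to score it, not CentriMo's actual code. It assumes that each sequence is summarized by the centre of its best site, that under the null hypothesis this centre is uniform over all valid positions, that a central window is scored with a one-tailed binomial test, and that the smallest p-value over the tested widths is Bonferroni-adjusted for the number of windows tried. The names `binom_sf` and `central_enrichment` are hypothetical.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    log_norm = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_norm - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_pmf)
    return min(1.0, total)


def central_enrichment(site_centres, n_positions, widths):
    """Find the central window with the most significant excess of best sites.

    site_centres: integer centre of the best site in each sequence that has
        one, in positions 0 .. n_positions - 1 (hypothetical input format).
    n_positions: number of possible site centres per sequence.
    widths: candidate central window widths (in positions) to test.
    Returns (adjusted p-value, best width, sites in best window).
    """
    n = len(site_centres)
    mid = (n_positions - 1) / 2
    best = None
    for w in widths:
        half = (w - 1) / 2
        # Count the positions the window actually covers so the null
        # probability matches it whether n_positions is odd or even.
        m = sum(1 for pos in range(n_positions) if abs(pos - mid) <= half)
        k = sum(1 for c in site_centres if abs(c - mid) <= half)
        pval = binom_sf(k, n, m / n_positions)  # uniform positional null
        if best is None or pval < best[0]:
            best = (pval, w, k)
    adjusted = min(1.0, best[0] * len(widths))  # Bonferroni over windows
    return adjusted, best[1], best[2]
```

Working in log space keeps the binomial terms finite for peak sets of tens of thousands of regions, where exact integer binomial coefficients would overflow a float.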


Figures

Figure 1.
Confirming the in vivo DNA-binding affinity of NFIC. The top five CentriMo results for all JASPAR CORE and UniPROBE mouse motifs (a) and their sequence logos (b) are shown. Each curve is the density (averaged over bins of width 10 bp) of the best strong site (score ≥5 bits) for the named motif at each position in the NFIC ChIP-seq (500 bp) peak regions from mouse EF cells. The legend shows the motif, its central enrichment p-value, the width of the most enriched central region (w), and the number of ChIP-seq regions (n out of 39807) that contain a motif site. JASPAR motifs MA0119.1 and MA0161.1 are known NFIC and NFIC half-site motifs, respectively.
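The captions count only the "best strong site" (score ≥5 bits) in each peak region. A minimal sketch of that step, assuming log2-odds scoring against a uniform background, a small pseudocount to avoid log(0), and scanning of both strands, might look as follows. The function names and the pseudocount value are hypothetical, not CentriMo's.

```python
from math import log2

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_matrix(probs, pseudo=0.01, background=None):
    """Turn per-column base probabilities into log2-odds scores (bits).

    A small pseudocount keeps zero-probability bases finite.
    """
    bg = background or {b: 0.25 for b in BASES}
    lom = []
    for col in probs:
        total = sum(col[b] for b in BASES) + 4 * pseudo
        lom.append({b: log2(((col[b] + pseudo) / total) / bg[b]) for b in BASES})
    return lom


def revcomp(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def best_strong_site(seq, lom, min_score=5.0):
    """Best-scoring site on either strand as (score, centre), or None if
    no site reaches min_score bits."""
    w = len(lom)
    best = None
    for start in range(len(seq) - w + 1):
        word = seq[start:start + w].upper()
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguity codes
        for strand_word in (word, revcomp(word)):
            score = sum(col[b] for col, b in zip(lom, strand_word))
            if best is None or score > best[0]:
                best = (score, start + (w - 1) / 2)
    if best is None or best[0] < min_score:
        return None
    return best
```

Sequences whose best score falls below the threshold contribute no site, which is why each legend reports n, the number of regions that contain a site, separately from the total number of regions.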
Figure 2.
Inferring the DNA-binding affinity of Nanog in mouse ES cells. The top five CentriMo results for motifs discovered by DREME (a) and for consensus motifs similar to the SELEX-derived motif (b) are shown. Each curve shows the density (averaged over bins of width 10 bp) of the best strong site (score ≥5 bits) for the named motif at each position in the (500 bp) Nanog ChIP-seq peak regions. The legend shows the motif, its central enrichment p-value, the width of the most enriched central region (w), and the number of peaks (n out of 10343) that contain a motif site. The two CVATYA motifs (shown as sequence logos in the insets) differ slightly because the one in (a) is the PWM motif found by DREME, and the one in (b) is based solely on the consensus sequence.
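The caption attributes the difference between the two CVATYA motifs to one being a DREME PWM and the other being built from the consensus alone. As an illustration, one simple way to turn an IUPAC consensus into a PWM is to spread each column's probability evenly over the bases its code allows. This is an assumption about the conversion; real converters may add pseudocounts or background weighting. The helper names are hypothetical.

```python
from math import log2

BASES = "ACGT"
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_pwm(consensus):
    """Spread each column's probability evenly over the allowed bases."""
    pwm = []
    for code in consensus.upper():
        allowed = IUPAC[code]
        pwm.append({b: (1.0 / len(allowed) if b in allowed else 0.0) for b in BASES})
    return pwm


def information_content(pwm):
    """Total information content in bits against a uniform background."""
    return sum(p * log2(p / 0.25) for col in pwm for p in col.values() if p > 0)
```

For CVATYA this gives 2 bits for each fully specified column, 1 bit for Y and log2(4/3) bits for V. A PWM estimated from observed sites would instead carry unequal weights within the V and Y columns, which is one reason the two logos can differ.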
Figure 3.
Central enrichment of two novel Nanog motifs (and two variants) in mouse ES cells. CentriMo results for the CVATYA motif discovered by DREME, the novel Nanog motif reported by He et al. (24), and two variants of the DREME motif are shown. The inset shows the aligned sequence logos of the two novel motifs. Each curve shows the density (averaged over bins of width 10 bp) of the best strong site (score ≥5 bits) for the named motif at each position in the (500 bp) Nanog ChIP-seq peak regions. The legend shows the motif, its central enrichment p-value, the width of the most enriched central region (w), and the number of peaks (n out of 10343) that contain a motif site. Note that the CVATYA motif used is based on the consensus sequence, not the PWM reported by DREME, and is the same motif as used in Figure 2b.
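Each curve in these figures is a per-position site density averaged over 10 bp bins. A minimal sketch of that computation, assuming the density at a position is the fraction of all peak regions whose best strong site is centred there (the choice of denominator is an assumption), is shown here; `site_density` is a hypothetical name.

```python
def site_density(site_centres, n_sequences, n_positions, bin_width=10):
    """Per-position probability of the best strong site, averaged per bin.

    site_centres: best-site centre positions (0-based) of sequences that
        have a strong site; n_sequences: all sequences, with or without one.
    """
    n_bins = -(-n_positions // bin_width)  # ceiling division
    counts = [0] * n_bins
    for c in site_centres:
        counts[int(c) // bin_width] += 1
    curve = []
    for i, count in enumerate(counts):
        width = min(bin_width, n_positions - i * bin_width)  # last bin may be short
        curve.append(count / (n_sequences * width))
    return curve
```

Dividing each bin count by the bin width keeps the curve on a per-position scale, so its values do not change meaning if a different bin width is chosen.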
Figure 4.
Central enrichment of E2f-family motifs and other motifs in mouse ES cells. CentriMo results for the most enriched motif and the most enriched E2F-family motif in JASPAR/UniPROBE (a), and centrally enriched JASPAR motifs with narrow enrichment regions (b). Each curve in (a) shows the density (averaged over bins of width 10 bp) of the best strong site (score ≥5 bits) for the named motif at each position in the (500 bp) E2F1 ChIP-seq peak regions. The legend shows the motif, its central enrichment p-value, the width of the most enriched central region (w), and the number of peaks (n out of 20 699) that contain a motif site. YY1 and E2f1 rank first and second in terms of central enrichment among significant (P < 0.05) JASPAR/UniPROBE motifs for which CentriMo predicts a central enrichment window narrower than 125 bp.
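The last sentence of this caption applies a two-part filter (P < 0.05 and an enrichment window narrower than 125 bp) and then ranks the survivors. A minimal sketch of that selection, assuming that ranking "in terms of central enrichment" means sorting by the central enrichment p-value, follows. The record type and function name are hypothetical, and the test uses made-up motifs rather than results from the paper.

```python
from typing import List, NamedTuple


class CentralResult(NamedTuple):
    motif: str
    p_value: float  # central enrichment p-value
    width: int      # width (bp) of the most enriched central region


def rank_narrow(results: List[CentralResult], alpha: float = 0.05,
                max_width: int = 125) -> List[CentralResult]:
    """Keep significant motifs whose enriched window is narrower than
    max_width, ranked from most to least centrally enriched."""
    kept = [r for r in results if r.p_value < alpha and r.width < max_width]
    return sorted(kept, key=lambda r: r.p_value)
```

A motif with a very small p-value but a broad window is dropped by this filter, which is how a narrow, sharply centred motif can rank first here even if it is not the most enriched motif overall.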
Figure 5.
No evidence of direct binding by Smad1 in mouse ES cells. Results of CentriMo motif enrichment analysis (MEA) using the JASPAR+UniPROBE motifs (a) and using just the in vitro Smad3 motifs from UniPROBE (b) on the Chen et al. (15) ChIP-seq data. Each curve shows the density (averaged over bins of width 10 bp) of the best strong site (score ≥5 bits) for the named motif at each position in the (500 bp) Smad1 ChIP-seq peak regions. The legend shows the motif, its central enrichment p-value, the width of the most enriched central region (w), and the number of peaks (n out of 1126) that contain a motif site. JASPAR motifs MA0142.1 and MA0143.1 are for transcription factors Pou5f1 (Oct4) and Sox2, respectively.
Figure 6.
Comparison of CentriMo and AME for MEA of mouse ES cell ChIP-seq data. The table shows the results of applying CentriMo and AME to 10 ChIP-seq data sets from Chen et al. (15) using all 532 JASPAR CORE vertebrate and UniPROBE mouse motifs. Rows show: (a) the name of the ChIP-ed TF, (b) the CentriMo site-probability curves for the five most centrally enriched motifs, and the most enriched motif according to (c) CentriMo or (d) AME. JASPAR motifs are given as TF name (JASPAR ID). Note that the second-ranking CentriMo motif for Oct4 is Pou2f3_3986, an in vitro Oct motif; the second-ranking CentriMo motif for Sox2 is Sox11_primary, an in vitro Sox motif; the second-ranking AME motif for STAT3 is STAT3 (MA0144.1); and the second-ranking AME motif for Zfx is Zfx (MA0146.1).

References

Wilbanks EG, Facciotti MT. Evaluation of algorithm performance in ChIP-seq peak detection. PLoS One. 2010;5:e11471.
Whitington T, Frith MC, Johnson J, Bailey TL. Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011;39:e98.
Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–1659.
Stormo GD. Information content and free energy in DNA–protein interactions. J. Theor. Biol. 1998;195:135–137.
Jothi R, Cuddapah S, Barski A, Cui K, Zhao K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008;36:5221–5231.
