BMC Genomics. 2014 Jan 15;15:27.
doi: 10.1186/1471-2164-15-27.

T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets

Yuanyuan Li et al. BMC Genomics. 2014.

Abstract

Background: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however.

Results: To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TFs) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful.

Conclusions: T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic "hot spots" where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.

Figures

Figure 1
An example of one constitutive and one non-constitutive CTCF binding site in a restricted region on chromosome X. Each track (for the cell line indicated on the left) displays the CTCF binding profile in the region using UCSC big wiggle format. The locus to the right of center where a CTCF binding peak appears in all cell lines displayed would be declared constitutive whereas the locus to the left of center where a binding peak is present less often would be non-constitutive.
Figure 2
A schematic overview of T-KDE. As input, T-KDE uses the locations of peak centers (defined by chromosome and coordinate), not the sequence reads. Step 1: order the peak centers for a TF from all cell lines together and partition them into subsets (terminal nodes) using a binary range tree algorithm. Solid circles indicate terminal nodes. Step 2: apply KDE to estimate a density function for each terminal node. Horizontal lines represent ChIP-seq peaks with dots indicating their centers. The blue curve is the estimated density function. Step 3: apply a mode finding algorithm to each terminal node’s density estimate to identify the modal regions associated with each local maximum. The density function shown has four local maxima (the rightmost two almost coincide); a horizontal red bar marks the constitutive modal region and seven vertical lines mark boundaries of the modal regions.
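The three steps in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the partition rule (split recursively at the widest gap until no gap exceeds `max_gap`) is a simplified stand-in for the binary range tree, and the Gaussian kernel, the grid-based mode finder, and the `min_fraction` threshold for calling a modal region constitutive are all assumed. Every function and parameter name is hypothetical, and the input is taken to be peak centers on a single chromosome.

```python
import numpy as np

def terminal_nodes(pos, max_gap):
    """Step 1 (simplified): split sorted positions into index ranges.

    Assumption: recursive split at the widest gap, used here as a
    stand-in for the binary range tree described in the caption.
    """
    def rec(lo, hi):
        if hi - lo < 2:
            return [(lo, hi)]
        gaps = np.diff(pos[lo:hi])
        i = int(np.argmax(gaps))
        if gaps[i] <= max_gap:
            return [(lo, hi)]
        mid = lo + i + 1
        return rec(lo, mid) + rec(mid, hi)
    return rec(0, len(pos))

def kde_on_grid(pos, bandwidth, step=1.0):
    """Step 2: Gaussian kernel density estimate evaluated on a regular grid."""
    grid = np.arange(pos.min() - 3 * bandwidth, pos.max() + 3 * bandwidth + step, step)
    z = (grid[:, None] - pos[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(pos) * bandwidth * np.sqrt(2 * np.pi))
    return grid, dens

def modal_regions(grid, dens):
    """Step 3: local maxima, each bounded by the nearest local minima (or grid ends)."""
    d = np.diff(dens)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    bounds = np.concatenate(([0], minima, [len(grid) - 1]))
    regions = []
    for m in maxima:
        left = bounds[bounds < m].max()
        right = bounds[bounds > m].min()
        regions.append((grid[left], grid[m], grid[right]))
    return regions

def tkde_sketch(centers, cell_ids, bandwidth=100.0, max_gap=1000.0, min_fraction=0.9):
    """Return (left, mode, right, fraction) for modal regions seen in most cell lines.

    Assumption: a region is called constitutive when at least `min_fraction`
    of all cell lines contribute a peak center inside it.
    """
    centers = np.asarray(centers, dtype=float)
    cell_ids = np.asarray(cell_ids)
    order = np.argsort(centers)
    pos, ids = centers[order], cell_ids[order]
    n_cells = len(np.unique(ids))
    sites = []
    for lo, hi in terminal_nodes(pos, max_gap):
        node_pos, node_ids = pos[lo:hi], ids[lo:hi]
        grid, dens = kde_on_grid(node_pos, bandwidth)
        for left, mode, right in modal_regions(grid, dens):
            inside = (node_pos >= left) & (node_pos <= right)
            frac = len(np.unique(node_ids[inside])) / n_cells
            if frac >= min_fraction:
                sites.append((left, mode, right, frac))
    return sites
```

Working from peak centers rather than reads keeps the input small, and partitioning first means each density estimate only has to cover one local cluster of peaks instead of a whole chromosome.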
Figure 3
Performance of T-KDE and binning. (A) Proportion of T-KDE-declared constitutive CTCF binding sites whose distance from the nearest motif-based constitutive CTCF binding site on 23 chromosomes is less than distance d, plotted as a function of d for various bandwidths. (B) Proportion of binning-declared constitutive CTCF binding sites whose distance from the nearest motif-based constitutive CTCF binding site on 23 chromosomes is less than distance d, plotted as a function of d for various bin widths.
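The quantity plotted in both panels is a cumulative proportion: for each declared site, find the nearest motif-based site, then report the share of declared sites closer than d. The sketch below shows one way to compute it for a single chromosome. The function names and the toy coordinates in the usage note are hypothetical and are not data from the paper.

```python
import numpy as np

def nearest_distance(query, reference):
    """Distance from each query coordinate to its nearest reference coordinate."""
    ref = np.sort(np.asarray(reference, dtype=float))
    q = np.asarray(query, dtype=float)
    idx = np.searchsorted(ref, q)
    # Candidate neighbours on either side, clipped to valid indices.
    left = ref[np.clip(idx - 1, 0, len(ref) - 1)]
    right = ref[np.clip(idx, 0, len(ref) - 1)]
    return np.minimum(np.abs(q - left), np.abs(q - right))

def proportion_within(query, reference, d_values):
    """Proportion of query sites whose nearest reference site is closer than d."""
    dist = nearest_distance(query, reference)
    return np.array([(dist < d).mean() for d in d_values])
```

For example, `proportion_within([100, 250, 1000], [110, 200, 2000], [20, 100, 1000])` evaluates the curve at three distances for three made-up declared sites. A curve that rises faster at small d means the declared sites sit closer to the motif-based ones.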
Figure 4
Proportion of T-KDE-declared versus binning-declared constitutive CTCF binding sites in the entire genome whose distance from the nearest motif-based constitutive CTCF binding site is less than distance d, plotted as a function of d. Separate curves are shown for T-KDE with a bandwidth of 100 bp and for binning with a bin size of 400 bp.
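For comparison, a fixed-width binning baseline might look like the sketch below. The source does not spell out the binning procedure, so this is an assumption about its general shape: peak centers are assigned to non-overlapping bins of `bin_width` bp, and a bin is called constitutive when at least `min_fraction` of cell lines contribute a peak center to it. The function name, the threshold, and the choice to report the bin midpoint are hypothetical.

```python
import numpy as np

def binning_constitutive(centers, cell_ids, bin_width=400, min_fraction=0.9):
    """Return (bin midpoint, fraction of cell lines) for bins seen in most cell lines."""
    centers = np.asarray(centers, dtype=float)
    cell_ids = np.asarray(cell_ids)
    n_cells = len(np.unique(cell_ids))
    # Index of the fixed-width bin that each peak center falls into.
    bins = np.floor(centers / bin_width).astype(int)
    sites = []
    for b in np.unique(bins):
        frac = len(np.unique(cell_ids[bins == b])) / n_cells
        if frac >= min_fraction:
            sites.append((float(b * bin_width + bin_width / 2), frac))
    return sites
```

Because the bin edges are fixed in advance, a tight cluster of peak centers that happens to straddle an edge is split between two bins and may fall below the threshold in both. The reported location is also only as precise as the bin width.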
Figure 5
Motif logos of CTCF motif variants in comparison to the canonical core CTCF motif. The regions that are aligned to the core CTCF motif are highlighted in red and blue boxes.
