Genome Res. 2012 Sep;22(9):1735-47.
doi: 10.1101/gr.136366.111.

Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements

Anshul Kundaje et al. Genome Res. 2012 Sep.

Abstract

Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

Figures

Figure 1.
Schematic of the steps followed by CAGT in order to group the signal profiles around a set of genomic features into distinct and coherent clusters. The steps are illustrated using H3K27ac signal profiles around CTCF binding sites in the K562 cell line. (1) We start by extracting the H3K27ac signal intensity profiles in a window (±500 bp) around each feature (CTCF binding site) and aligning all signals at the core of the feature (summit of the CTCF peak). The grayscale plot at the bottom is a traditional aggregation plot obtained by averaging all signal profiles. The bold line is the mean intensity, while the shaded area around it corresponds to the 10th and 90th percentiles of the signal. (2) The sites are divided into high and low signals based on the peak intensity of each H3K27ac signal profile around each site. (3) High signal sites are standardized to zero mean and unit standard deviation and clustered with the k-medians algorithm. This step typically leads to a large number of compact clusters, some of which may be redundant with similar average patterns. (4) In the final step, similar clusters, as well as clusters that are mirror images of each other, are merged using hierarchical agglomerative clustering, resulting in a small number of distinct, nonredundant, compact clusters (see Methods for details).
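The four steps in this caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the published CAGT implementation: the function names, the quantile used to split high from low signal, the number of clusters `k`, and the correlation cutoff `min_corr` are all hypothetical, and a simple greedy agglomerative merge on cluster medians stands in for the merging procedure described in the Methods.

```python
import numpy as np

def standardize(profiles):
    """Scale each profile (row) to zero mean and unit standard deviation."""
    mu = profiles.mean(axis=1, keepdims=True)
    sd = profiles.std(axis=1, keepdims=True)
    return (profiles - mu) / np.where(sd == 0, 1.0, sd)

def k_medians(X, k, n_iter=50, seed=0):
    """Plain k-medians with L1 distance; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(X[labels == j], axis=0)
    return labels, centers

def merge_clusters(X, labels, min_corr=0.8):
    """Greedily merge clusters with similar medians, allowing mirror images.

    When a cluster matches another only after reversal, its member profiles
    are flipped so the merged cluster has one consistent orientation.
    """
    X = X.copy()
    groups = {j: np.flatnonzero(labels == j) for j in np.unique(labels)}
    while len(groups) > 1:
        keys = list(groups)
        best = (-np.inf, None, None, False)
        for i, a in enumerate(keys):
            ca = np.median(X[groups[a]], axis=0)
            for b in keys[i + 1:]:
                cb = np.median(X[groups[b]], axis=0)
                for flip in (False, True):
                    r = np.corrcoef(ca, cb[::-1] if flip else cb)[0, 1]
                    if r > best[0]:
                        best = (r, a, b, flip)
        r, a, b, flip = best
        if r < min_corr:
            break  # no remaining pair is similar enough
        if flip:
            X[groups[b]] = X[groups[b], ::-1]
        groups[a] = np.concatenate([groups[a], groups.pop(b)])
    return X, list(groups.values())

def cagt_sketch(profiles, k=8, high_quantile=0.5, min_corr=0.8):
    """Steps 2-4: split by peak intensity, standardize, cluster, merge."""
    peak = profiles.max(axis=1)
    high = peak >= np.quantile(peak, high_quantile)  # assumed threshold rule
    X = standardize(profiles[high])
    labels, _ = k_medians(X, k)
    X, groups = merge_clusters(X, labels, min_corr)
    return high, X, groups
```

Here `profiles` is assumed to be a sites-by-positions array already aligned at the feature core (step 1), for example 1001 columns for a ±500 bp window. Each entry of `groups` holds row indices into the high-signal subset.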
Figure 2.
(A) Nucleosome positioning patterns around TSSs in K562. The first panel is a traditional aggregation plot of the nucleosome positioning signal in a window of size 1001 bp centered on each of 15,736 GENCODE TSSs. The bold line is the mean signal across all TSSs, while the shaded area around it corresponds to the 10th and 90th percentiles. The rest of the panels show the patterns uncovered by CAGT, ordered by the percentage of TSSs that follow each pattern. Patterns corresponding to <2% of TSSs are not shown. All TSSs are reoriented so that the direction of transcription is from left to right. Plots are colored according to the third quartile of the expression of TSSs in the corresponding cluster, as measured by CAGE tags. (B) Box-plots of the expression of TSSs following each of the patterns shown in A.
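The traditional aggregation plot described here reduces to a per-position mean with a 10th to 90th percentile band, computed after flipping minus-strand TSSs so that transcription runs left to right. A minimal sketch follows; the function names and the `"+"`/`"-"` strand encoding are assumptions made for illustration.

```python
import numpy as np

def reorient(profiles, strands):
    """Reverse profiles on the minus strand so transcription runs left to right."""
    out = np.array(profiles, copy=True)
    minus = np.asarray(strands) == "-"
    out[minus] = out[minus, ::-1]
    return out

def aggregation_summary(profiles):
    """Per-position mean plus 10th and 90th percentiles over all sites."""
    mean = profiles.mean(axis=0)
    p10, p90 = np.percentile(profiles, [10, 90], axis=0)
    return mean, p10, p90
```

Plotting `mean` as the bold line and filling between `p10` and `p90` gives the shaded band described in the caption.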
Figure 3.
(A) Distribution of distances between the TF binding site and the closest nucleosome for all TFs assayed in both GM12878 and K562. For each TF in each cell line, we used the median signal of the clusters to compute the distance between the TF binding site and the closest nucleosome positioning peak. The area of each dot is proportional to the fraction of peaks of the TF with the given distance between the binding site and the closest nucleosome dyad. The vertical line extends from the first to the third quartile of distances for each TF. (B) Nucleosome positioning patterns uncovered by CAGT around REST binding sites in K562. (Top, left) A traditional aggregation plot, averaging the signal over all 14,144 REST sites. The rest of the panels show the CAGT clusters in order of prevalence, with the percentage of REST peaks in each shown in the header. Two clusters containing <2% of REST peaks each are omitted from the figure. Note the large diversity of nucleosome positioning shapes, with distances between the binding site and the closest nucleosome positioning peak varying widely from 10 bp (P_17) to 300 bp (P_4).
Figure 4.
Examples of nucleosome positioning clusters around TFBSs and relationship to GC content. For each TF, the first panel of the top row is a traditional aggregation plot, where the signal is averaged over all sites. The total number of sites is shown in the header. The remaining panels of the top row show the mean nucleosome positioning signal in the five largest clusters discovered by CAGT, with the fraction of peaks in each cluster shown in the header. Each panel in the second row shows the mean GC content of all sites used in the panel above it. If a site was “flipped” during the last step of CAGT (see Fig. 1), then the corresponding GC signal was also flipped accordingly. GC content was computed using a sliding window of 21 bp. The small arrows indicate container sites. (A) SPI1 in GM12878; (B) TCF12 in GM12878; (C) EGR1 in K562.
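The 21-bp sliding-window GC content can be computed as a moving average over a G/C indicator vector. The sketch below is illustrative only: the function name is hypothetical, and returning values only for windows that fit entirely within the sequence is an assumption, since the caption does not say how window edges were handled.

```python
import numpy as np

def gc_content(seq, window=21):
    """Fraction of G/C bases in each full window of the given width."""
    is_gc = np.array([base in "GCgc" for base in seq], dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows fully inside the sequence
    return np.convolve(is_gc, kernel, mode="valid")
```

For a site that was flipped in the final CAGT step, the matching GC track would be reversed in the same way, for example `gc_content(seq)[::-1]`.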
Figure 5.
Widespread asymmetry of chromatin marks around TFBSs. (A–F) Fraction of TF peaks with asymmetric patterns for each chromatin mark. For each combination of TF and mark, we computed the fraction of high signal binding sites in asymmetric CAGT clusters. Results were averaged over all available data sets for the same TF and mark in all cell lines. Some examples for factors that contribute to the specific data point are shown, with arrows pointing to the asymmetry fraction of the factor. For example, in ∼85% of NRF1 binding sites with high H3K9ac signal, the shape of the modification is asymmetric around the binding site. (A) DNase and nucleosome positioning and their contrasting asymmetry frequency distributions. (B) Gene body marks. (C) Promoter-associated marks. (D) Enhancer-associated marks. (E) Repressive marks that exhibited moderate signal around binding sites. (F) Repressive marks that exhibited generally weak signal around binding sites. (G) For each combination of TF and mark, we computed the number of proximal and distal binding sites in symmetric and asymmetric CAGT clusters and identified which one of the four groups, symmetric proximal, symmetric distal, asymmetric proximal, and asymmetric distal, contained the largest number of binding sites. Results were averaged over all available data sets for the same TF and mark in all cell lines. The height of each bar shows the number of TFs for which the corresponding group was the most prevalent. The “Missing” part corresponds to the TFs that were not assayed for that mark.
Figure 6.
(A) Asymmetric nucleosome positioning meta-clusters across all TFs in GM12878 and K562. Clusters are numbered according to their size, and labeled with the approximate distance between the binding site and the center of the nearest well-positioned nucleosome. Each panel shows the mean signal over all binding sites (for all TFs and for both cell lines) that were assigned to that cluster. (B) The two symmetric nucleosome meta-clusters not shown in A. For each of these two clusters, we also show the mean signal of other chromatin marks averaged over the binding sites in that cluster. Sites in cluster 12 exhibit remarkably higher signals of active marks. For both clusters 7 and 12, the signal of the associated chromatin marks appears highly symmetric, but this is an artifact of aggregating the chromatin mark signal according to the clustering and orientation of the nucleosome signal. (C) For each TF that was enriched in either cluster 7 or cluster 12 (P < 0.001), we computed the fraction of binding sites in each of these clusters. Cluster 7 is enriched for the members of the CTCF/cohesin complex, while cluster 12 is enriched for enhancer-associated TFs.
Figure 7.
CAGT meta-clusters for all histone modifications across all binding proteins in all Tier 1 and Tier 2 ENCODE cell lines. Each row contains the clusters discovered by CAGT in the merged data sets for the corresponding modification. The clusters for each mark are numbered according to their size, with cluster 1 for each mark containing the most TFBSs (see the numbers at the top, left corner of each shape plot). Clusters for different modifications are arranged to bring similar shapes in the same column. Five columns containing three or fewer shapes are not shown.
Figure 8.
(A,B) The top row shows the most prevalent nucleosome positioning clusters around SIN3A and SP1 sites, respectively, in GM12878. The remaining rows show the signal of histone modifications, averaged over all sites in the corresponding clusters. TSS-proximal TFs, such as SIN3A, exhibit correlated nucleosome positioning and histone modification patterns. Such correlations, however, are not evident for TFs that tend to bind more distally from TSSs (e.g., SP1). (C) Clusters of H3K4me1 signal around POLR2A sites in HepG2 and the corresponding H3K4me3 signal. There is a clear anticorrelation between the two histone marks. (D) For all CAGT runs around TFBSs, we considered all TSS-proximal sites that were assigned to asymmetric clusters, and counted how many times the direction of transcription of the TSS closest to a site agreed with (configuration (1)) or opposed (configuration (2)) the direction of the asymmetry pattern (from low to high signal) of the cluster to which the site was assigned. We are showing the log10-ratio of the two counts, aggregated over all CAGT runs for the same mark. Values >0 (corresponding to ratios >1) imply that the mark tends to increase in the direction of transcription, while values <0 imply that the mark tends to increase in the opposite direction.
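The statistic in panel D is the log10 of the ratio between agreeing and opposing site counts. A small sketch of that arithmetic follows, assuming each TSS-proximal site carries a transcription direction and an asymmetry direction encoded as +1 or -1. The encoding and the function name are hypothetical, and the caption does not say how zero counts were treated, so this version simply refuses them.

```python
import math

def direction_log_ratio(tss_dirs, asym_dirs):
    """log10(#agree / #oppose) between transcription and asymmetry directions."""
    if len(tss_dirs) != len(asym_dirs):
        raise ValueError("inputs must have the same length")
    agree = sum(t == a for t, a in zip(tss_dirs, asym_dirs))
    oppose = len(tss_dirs) - agree
    if agree == 0 or oppose == 0:
        raise ValueError("ratio undefined when either count is zero")
    return math.log10(agree / oppose)
```

A positive result means the mark tends to increase in the direction of transcription, and a negative result means it tends to increase in the opposite direction.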
