Genome Res. 2019 Oct;29(10):1733-1743.
doi: 10.1101/gr.248658.119. Epub 2019 Sep 18.

Identifying clusters of cis-regulatory elements underpinning TAD structures and lineage-specific regulatory networks

Seyed Ali Madani Tonekaboni et al. Genome Res. 2019 Oct.

Abstract

Cellular identity relies on cell-type-specific gene expression controlled at the transcriptional level by cis-regulatory elements (CREs). CREs are unevenly distributed across the genome, giving rise to individual CREs and clusters of CREs (COREs). Technical and biological features hinder CORE identification. We addressed these issues by developing an unsupervised machine learning approach termed clustering of genomic regions analysis method (CREAM). CREAM automates CORE detection from chromatin accessibility profiles that are enriched in CREs strongly bound by master transcription regulators, proximal to highly expressed and essential genes, and discriminating cell identity. Although COREs share similarities with super-enhancers, we highlight differences in terms of the genomic distribution and structure of these cis-regulatory units. We further show the enhanced value of COREs over super-enhancers to identify master transcription regulators, highly expressed and essential genes defining cell identity. COREs enrich at topologically associated domain (TAD) boundaries. They are also preferentially bound by the chromatin looping factors CTCF and cohesin, in contrast to super-enhancers, forming clusters of CTCF and cohesin binding regions and defining homotypic clusters of transcription regulator binding regions (HCTs). Finally, we show the clinical utility of CREAM to identify COREs across chromatin accessibility profiles to stratify more than 400 tumor samples according to their cancer type and to delineate cancer type-specific active biological pathways. Collectively, our results support the utility of CREAM to delineate COREs underlying, with greater accuracy than individual CREs or super-enhancers, the cell-type-specific biological underpinning across a wide range of normal and cancer cell types.

Figures

Figure 1.
Schematic representation of the five main steps of the clustering of genomic regions analysis method (CREAM). For step 1, CREAM identifies all groups of two, three, four, and more neighboring CREs. The total number of CREs in a group defines its “Order.” Step 2 is identification of the maximum window size (MWS) between two neighboring CREs in a group for each Order. The MWS corresponds to the greatest distance allowed between two neighboring CREs in a given cluster. Step 3 is identification of the maximum Order limit of COREs from a given data set. Step 4 is CORE reporting according to the criteria set in step 3 from the highest to the lowest Order. Step 5 is identification of the minimum Order limit of COREs based on the identified COREs in step 4.
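Steps 1 and 2 of the caption can be illustrated with a minimal sketch. It is not the CREAM implementation: the function names are hypothetical, CREs are assumed to be non-overlapping (start, end) intervals on a single chromosome, and the window size of a group is taken to be the largest gap between neighboring CREs in that group. How the MWS threshold is derived from these window sizes is not described in the caption and is left out.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) of one CRE on a single chromosome


def groups_of_order(cres: List[Interval], order: int) -> List[List[Interval]]:
    """Step 1 (sketch): every run of `order` consecutive CREs, sorted by start."""
    ordered = sorted(cres)
    return [ordered[i:i + order] for i in range(len(ordered) - order + 1)]


def window_size(group: List[Interval]) -> int:
    """Largest gap between two neighboring CREs in a group (assumed definition)."""
    return max(nxt[0] - cur[1] for cur, nxt in zip(group, group[1:]))


def window_sizes_by_order(cres: List[Interval], max_order: int) -> Dict[int, List[int]]:
    """Step 2 input (sketch): window sizes of all groups, collected per Order."""
    return {
        order: [window_size(g) for g in groups_of_order(cres, order)]
        for order in range(2, max_order + 1)
    }


# Toy coordinates, invented for illustration only.
toy_cres = [(100, 200), (250, 300), (1000, 1100), (1150, 1200)]
sizes = window_sizes_by_order(toy_cres, max_order=4)
print(sizes)
```

In this toy example the two tight pairs give small window sizes at Order 2, while every group of Order 3 or 4 has to span the large gap in the middle, which is the kind of signal a per-Order MWS cutoff would act on.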
Figure 2.
Comparison of genomic characteristics of the COREs identified by CREAM versus individual CREs in the GM12878, K562, and H1-hESC cell lines. (A) Distribution of DNase I signal intensity in individual CREs and COREs (signal per base pair). (B) Expression level of genes in 10-kb proximity of individual CREs or COREs. (****) P-value <0.0001. (C) Median expression of genes according to distance to the closest individual CRE (gray) or CORE (red). (D) Volcano plot of significance (FDR) and effect size (essentiality score) of genes in proximity of CREAM-identified COREs in the K562 cell line (red indicates significant fold change; gray, insignificant fold change). (E) Essentiality score from K562, KBM-7, Jiyoye, and Raji cell lines for genes proximal (±10 kb) to COREs identified by CREAM in the K562 cell line. (****) P-value <0.0001 using Wilcoxon signed-rank test. (F) Expression level of essential genes associated with individual CREs versus COREs. (**) P-value <0.01.
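The "10-kb proximity" criterion used in panels B and E can be sketched as an interval test. This is an illustration under stated assumptions, not the authors' code: genes are represented by a single (start, end) interval on the same chromosome as the elements, proximity is measured as the gap between intervals, and the function and gene names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def interval_gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, b[0] - a[1], a[0] - b[1])


def proximal_genes(genes: Dict[str, Interval],
                   elements: List[Interval],
                   flank: int = 10_000) -> List[str]:
    """Names of genes lying within `flank` bp of at least one element."""
    return sorted(
        name for name, gene in genes.items()
        if any(interval_gap(gene, element) <= flank for element in elements)
    )


# Toy coordinates, invented for illustration only.
toy_genes = {
    "geneA": (50_000, 60_000),
    "geneB": (200_000, 210_000),
    "geneC": (75_000, 80_000),
}
toy_elements = [(65_000, 66_000)]
print(proximal_genes(toy_genes, toy_elements))
```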
Figure 3.
Transcription regulator (TR) binding intensity in individual CREs and COREs. (A) Enrichment of TR binding intensity from ChIP-seq data in COREs identified by CREAM versus individual CREs from DNase-seq in the GM12878, K562, or H1-hESC cell lines. Volcano plots represent −log10(FDR) versus log2(fold change [FC]) in ChIP-seq signal intensities. Each dot is one TR (colored indicates significant FC; gray, insignificant FC). The barplots show how many TRs have higher signal intensity in COREs or individual CREs (FDR < 0.001 and log2[FC] > 1). FC is defined as the ratio between the average signal per base pair in COREs versus individual CREs. (B) Distribution of ChIP-seq signal intensity at COREs and individual CREs for TCF3 and EBF1 as examples of master TRs in GM12878, for GABPA and CREB1 as examples of master TRs in the K562 cell line, and for NANOG and MYC as examples of master TRs in the H1-hESC cell line. (C) Examples of genomic regions with COREs (with different coverage) occupied by TRs presented in B.
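The fold change defined in panel A, and the thresholds quoted for the barplots, can be written out as a short sketch. The helper names are hypothetical, the per-element signal values are assumed to be already normalized per base pair, and the FDR is taken as an input because the caption does not say how it was computed.

```python
import math
from statistics import mean
from typing import Sequence


def log2_fold_change(core_signal: Sequence[float],
                     cre_signal: Sequence[float]) -> float:
    """log2 of (average signal per bp in COREs) / (average in individual CREs)."""
    return math.log2(mean(core_signal) / mean(cre_signal))


def enriched_in_cores(log2_fc: float, fdr: float,
                      fdr_cutoff: float = 0.001,
                      log2_fc_cutoff: float = 1.0) -> bool:
    """Thresholds quoted in the caption: FDR < 0.001 and log2(FC) > 1."""
    return fdr < fdr_cutoff and log2_fc > log2_fc_cutoff


# Toy signal values, invented for illustration only.
lfc = log2_fold_change([4.0, 4.0], [1.0, 1.0])
print(lfc, enriched_in_cores(lfc, fdr=1e-4))
```

A TR with higher signal in individual CREs would presumably be called with the mirrored condition (log2(FC) below -1); that symmetry is an assumption and is not stated in the caption.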
Figure 4.
Arrangement of COREs and individual CREs with respect to TAD boundaries. (A) Schematic representation of TAD boundaries and intra-TAD regions (25-kb Hi-C resolution). (B) Comparison of fraction of COREs and individual CREs from DNase-seq that lie at TAD boundaries with increasing distance from TAD-boundary cutoffs in the GM12878 and K562 cell lines. (C) Enrichment of TR binding intensities within COREs over individual CREs that lie in proximity of TAD boundaries (±10 kb) versus COREs and CREs farther away from TAD boundaries (intra-TAD elements) in the GM12878 or K562 cell line. (D) Enrichment of TR binding intensity in COREs proximal to TAD boundaries (±10 kb) versus intra-TAD domains. (E) Fraction of HCTs (purple) and individual TR binding regions (gray) at TAD boundaries (±10 kb). The total number of individual binding regions for each TR in the GM12878 and K562 cell lines is also reported (orange). (F) Examples of HCTs for CTCF, RAD21, SMC3, and ZNF143 at the TAD boundary for the MYC and BCL6 genes (10-kb Hi-C resolution).
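The quantity plotted in panel B, the fraction of elements lying within a given distance of a TAD boundary, can be sketched as follows. This is an illustration only: boundaries are simplified to single positions on one chromosome, distance is measured from the edge of each element, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]


def distance_to_nearest_boundary(element: Interval,
                                 sorted_boundaries: List[int]) -> int:
    """Distance in bp from an element to the closest boundary (0 if it spans one).

    `sorted_boundaries` must be non-empty and sorted in ascending order.
    """
    start, end = element
    i = bisect_left(sorted_boundaries, start)
    candidates = []
    if i < len(sorted_boundaries):  # first boundary at or after the element start
        right = sorted_boundaries[i]
        candidates.append(0 if right <= end else right - end)
    if i > 0:  # last boundary before the element start
        candidates.append(start - sorted_boundaries[i - 1])
    return min(candidates)


def fraction_at_boundaries(elements: List[Interval],
                           boundaries: List[int],
                           cutoff: int) -> float:
    """Fraction of elements within `cutoff` bp of any boundary."""
    ordered = sorted(boundaries)
    hits = sum(distance_to_nearest_boundary(e, ordered) <= cutoff for e in elements)
    return hits / len(elements)


# Toy coordinates, invented for illustration only.
toy_boundaries = [100_000, 500_000]
toy_elements = [(95_000, 96_000), (300_000, 301_000),
                (499_000, 501_000), (120_000, 121_000)]
for cutoff in (10_000, 25_000):
    print(cutoff, fraction_at_boundaries(toy_elements, toy_boundaries, cutoff))
```

Running the same function separately on COREs and on individual CREs over a range of cutoffs would give the two curves compared in the panel.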
Figure 5.
Comparison of CREAM-identified COREs and super-enhancers of the GM12878, K562, and H1-hESC cell lines. (A) Similarity of COREs and super-enhancers based on their genomic loci overlap. (B) Top five enriched biological pathways using genes in 10-kb proximity of the identified COREs and super-enhancers in each one of the GM12878, K562, and H1-hESC cell lines. (C) Percentage of COREs and super-enhancers containing two or more individual CREs. (D) Expression of genes in 10-kb proximity of both COREs and super-enhancers or exclusively in proximity of COREs or super-enhancers. (E) Enrichment of essential genes among genes in proximity of both COREs and super-enhancers or exclusively in proximity of COREs or super-enhancers. (F) Enrichment of TR binding intensity from ChIP-seq data in COREs identified by CREAM versus super-enhancers. Volcano plots represent −log10(FDR) versus log2(FC) in ChIP-seq signal intensities. Each dot is one TR (colored indicates significant FC; gray, insignificant FC). The barplots show how many TRs have higher signal intensity in COREs or super-enhancers (FDR < 0.001 and log2[FC] > 1). FC is defined as the ratio between the average signal per base pair in COREs versus super-enhancers. (G) Distribution of ChIP-seq signal intensity of CTCF at COREs and super-enhancers in 10-kb proximity of TAD boundaries.
Figure 6.
Biology of COREs in human tumor samples. (A) Balanced accuracy for classification of TCGA tumor samples based on their tissue of origin using CREAM-identified COREs. (B) Enrichment of highly expressed genes in proximity (±10 kb) of CREAM-identified COREs versus individual CREs for TCGA tumor samples. Boxplots show the null distribution corresponding to expression of randomly selected genes, and each dot corresponds to the expression of proximal genes to COREs for each tumor sample in TCGA. (C) Enrichment of hallmark gene sets relying on genes in proximity (±10 kb) of COREs versus genes in proximity (±10 kb) of individual CREs for TCGA tumor samples.
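Panel A reports balanced accuracy. As a reminder of what that metric measures, the sketch below uses the common multi-class definition, the mean of per-class recall. Whether the authors used this variant or a per-class one-vs-rest variant is not stated in the caption, so treat the choice as an assumption; the labels and predictions are toy values.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable],
                      y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so every class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels, invented for illustration only.
y_true = ["BRCA", "BRCA", "BRCA", "BRCA", "LUAD", "LUAD"]
y_pred = ["BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "BRCA"]
print(balanced_accuracy(y_true, y_pred))
```

Plain accuracy on the same toy data would be 4/6, which is inflated by the larger class; averaging recall per class removes that effect, which matters when cancer types have very different sample counts.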
