Cell Stem Cell. 2018 Aug 2;23(2):276-288.e8.
doi: 10.1016/j.stem.2018.06.014. Epub 2018 Jul 19.

Functional Dissection of the Enhancer Repertoire in Human Embryonic Stem Cells

Tahsin Stefan Barakat et al. Cell Stem Cell.

Abstract

Enhancers are genetic elements that regulate spatiotemporal gene expression. Enhancer function requires transcription factor (TF) binding and correlates with histone modifications. However, the extent to which TF binding and histone modifications functionally define active enhancers remains unclear. Here, we combine chromatin immunoprecipitation with a massively parallel reporter assay (ChIP-STARR-seq) to identify functional enhancers in human embryonic stem cells (ESCs) genome-wide in a quantitative unbiased manner. Although active enhancers associate with TFs, only a minority of regions marked by NANOG, OCT4, H3K27ac, and H3K4me1 function as enhancers, with activity markedly changing under naive versus primed culture conditions. We identify an enhancer set associated with functions extending to non-ESC-specific processes. Moreover, although transposable elements associate with putative enhancers, only some exhibit activity. Similarly, within super-enhancers, large tracts are non-functional, with activity restricted to small sub-domains. This catalog of validated enhancers provides a valuable resource for further functional dissection of the regulatory genome.

Keywords: ChIP-STARR-seq; H3K27ac; H3K4me1; NANOG; OCT4; genome-wide functional enhancer map; naive pluripotency; super-enhancers; transposable elements.

Figures

Graphical abstract
Figure 1
ChIP-STARR-Seq in Human Embryonic Stem Cells. (A) Outline of the ChIP-STARR-seq approach combining antibodies against TFs or histone modifications (colored balls) with the STARR-seq plasmid (Arnold et al., 2013). (B) ChIP-STARR-seq for NANOG in H9. Scatterplots compare normalized read count (reads per million) per peak between datasets, obtained from ChIP-seq or DNA-seq of plasmid libraries pre- or post-transfection/recovery from ESCs (n = 2); r, Pearson correlation. (C) Genomic distribution of peaks called for ChIP-seq (outer chart) and corresponding plasmid libraries (inner chart). TSSs, transcription start sites. (D) FACS plots of single DAPI-negative ESCs. Left: untransfected cells; right: cells transfected with a NANOG ChIP-STARR-seq plasmid library. (E) Scatterplot (as in B) comparing the NANOG plasmid library and corresponding ChIP-STARR-seq RNA. The dense cluster of points in the lower left corresponds to library plasmids that did not produce RNAs. RPM, reads per million. (F) Genome browser plot of SOX2 showing tracks for ChIP-seq, DNA-seq of plasmid libraries pre- and post-transfection, and RNA-seq of GFP+ cells transfected with the indicated libraries. Bottom: combination (maximum) of all STARR-seq RNA-seq tracks and ratio of normalized RNA-seq/plasmid reads. (G) Genome browser shots of KLF15, LEFTY, and the HOXB cluster, illustrating the broad variety of enhancers profiled in this functional enhancer catalog.
Figure 2
Activity Levels Define Functional Classes of Enhancers. (A) Luciferase activities of 68 genomic sequences in primed ESCs grouped by ChIP-STARR-seq activity. Boxes are interquartile range (IQR); line is median; and whiskers span the 10th to the 90th percentile. ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001; Mann-Whitney test; n = 2. (B) Distribution of expression values (Takashima et al., 2014) of genes associated with enhancers grouped by activity level. Boxes are IQR; line is median; whiskers extend to 1.5× the IQR; and dots are outliers. ∗∗p < 0.01; ∗∗∗p < 0.001; unpaired t test. (C) Plot showing enhancer activity (enrichment of ChIP-STARR-seq RNA over plasmids; log2) ranked from lowest to highest across all measured enhancers (union of all peak calls). Enhancers were distinguished based on activity; dashed lines indicate thresholds (θ). (D) Distribution of active (RPP ≥ 138) and inactive sequences (RPP < 138) in peaks called for the indicated factors. (E) qRT-PCR analysis of wild-type (WT) and enhancer-deleted heterozygous (+/−) or homozygous (−/−) ESC clones. Indicated mRNAs are normalized to TBP (WT = 1), and the average results for the indicated deletions are plotted relative to wild-type; n = number of cell lines per genotype (see STAR Methods for further details). ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001 (two-way ANOVA, Bonferroni post-test). Error bars represent SD. (F) Relative enrichment of H9 chromatin segment overlaps (Kundaje et al., 2015) between regions with ChIP-STARR-seq activity and inactive regions (see C). (G) Relative LOLA enrichment of TFs from CODEX (Sánchez-Castillo et al., 2015) in inactive regions and active enhancers. Odds ratios between observed frequencies of enhancers overlapping binding sites for the eight most enriched TFs in the respective groups relative to the percentage in the entire region set are shown, ranked by mean odds ratio. Each dot represents a TF ChIP-seq dataset. ChIP-seq datasets from non-ESCs are shown as crosses. (H) Smooth line plots of the proportion of active plasmids (RPP ≥ 138) around the peak center for the indicated ChIP-seq binding sites.
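The active/inactive split used in panels (D) and (H) is a fixed cutoff at RPP ≥ 138. A minimal sketch of applying that cutoff per factor follows. The data layout (a dictionary of RPP values keyed by factor) and the function name are hypothetical; only the threshold value comes from the caption.

```python
ACTIVITY_THRESHOLD = 138  # RPP cutoff quoted in the caption


def fraction_active(rpp_by_factor, threshold=ACTIVITY_THRESHOLD):
    """Return, per factor, the fraction of sequences with RPP >= threshold.

    rpp_by_factor is a hypothetical mapping such as {"NANOG": [rpp, ...]}.
    Factors with no measured sequences are skipped.
    """
    fractions = {}
    for factor, values in rpp_by_factor.items():
        if not values:
            continue
        active = sum(1 for v in values if v >= threshold)
        fractions[factor] = active / len(values)
    return fractions
```

A sequence with RPP exactly equal to 138 counts as active here, matching the "≥" in the caption.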
Figure 3
Sequence Determinants of Enhancer Activity. (A) Receiver operating characteristic (ROC) curve of the random forest classifier performance. AUC, area under the curve. (B) The top-40 sequence features used to distinguish active and inactive regions ordered by variable importance. HOCOMOCO motif IDs were shortened (Kulakovskiy et al., 2016). (C) Line plots of the percentage of regions containing one of the top-3 motifs from HOCOMOCO as a function of enhancer activity. Each point is the fraction of regions with activity of at least the given log2(RPP+1) value that also contain the respective motif.
Figure 4
Active Enhancers Include Core and Extended ESC-Enhancer Modules. (A) The overlap between published putative enhancers (Hawkins et al., 2011, Rada-Iglesias et al., 2011, Xie et al., 2013) (light blue) and regions assessed by ChIP-STARR-seq (white) or called active (RPP ≥ 138; blue). We refer to ChIP-STARR-seq enhancers overlapping published putative enhancers as the “core module” and non-overlapping regions as the “extended module.” (B) Kernel density plots of the distribution of enrichment values in ESCs for the indicated factor for peaks associated with the core or extended modules or for inactive regions. (C) RPP values for all assessed genomic regions compared to enhancers from the core or extended modules. Boxes are IQR; line is median; and whiskers extend to 1.5× the IQR. (D) RNA-seq in H9 (Takashima et al., 2014) for all genes compared to genes associated with either core or extended enhancer modules. Boxes as in (C). RPKM, reads per kilobase million. ∗p < 0.05; ∗∗∗p < 0.001 (t test). (E) Gene expression in tissues from the RNA-seq Atlas (Krupp et al., 2012) for all genes linked to the core or extended modules. Housekeeping (Eisenberg and Levanon, 2013) and tissue-specific genes (Lachmann et al., 2018) are also shown. Tissue-specific genes are split into the one indicated (same; x axis) or “other tissues.” As no tissue-specific gene set was available for hypothalamus, whole-brain-specific genes were used. Boxes as in (D). (F) Enrichment analysis (Enrichr) testing genes associated with the core (top) and extended (bottom) modules. Top-10 results for TF binding sites from ENCODE and ChEA (left) and genes downregulated (middle) or upregulated (right) upon single-gene perturbations from GEO. (G) Relative enrichment (log-odds ratio in ESCs compared to all) of H9 chromatin segments (Kundaje et al., 2015) in core and extended module enhancers. (H) Kernel density plot of the distance to associated genes for core and extended module enhancers. The shortest distance from either enhancer region boundary was recorded.
Figure 5
Changes in Enhancer Activity upon Induction of Naive Pluripotency. (A) Overview of primed to naive conversion and ChIP-STARR-seq cross-over design. (B) Relative enrichment of TFs from CODEX (Sánchez-Castillo et al., 2015) in inactive regions and active enhancers in naive hESCs. Plots as in Figure 2G. (C) Table of relative changes in enhancer activity between primed and naive ESCs. (D) Enrichment analysis (Enrichr) to test genes near enhancers active in both primed and naive ESCs against GO assignments (left) or binding sites from ENCODE and ChEA ChIP-seq (right). (E) Scatterplot contrasting average changes in enhancer activity with changes in associated gene expression. Genes with strong concordant changes in enhancer activity and gene expression are shown using the thresholds: |max(ΔRPP)| ≥ 5, |mean(ΔmRNA)| ≥ 1. (F) Visualization of enhancer activity in ChIP-STARR-seq regions near selected genes (boxes in E; TSS ± 40 kb) with differential expression in primed and naive ESCs. Bars indicate enhancer activity (RPP) in primed (blue) and naive (red) ESCs. Grey dashed bars indicate the activity threshold for active enhancers. Active enhancers are highlighted with asterisks. Gene name color shows the state in which the gene is most highly expressed. (G) Scatterplot of scaled variable importance of sequence features used to discriminate active and inactive regions in primed and naive ESCs. In both cases, a random forest classifier was trained.
Figure 6
Distinct Transposable Elements Are Associated with Enhancers of Differing Activity in ESCs. (A) Enrichment ratios for the occurrence of TE families (LTR, DNA, SINE, and LINE) in high activity ChIP-STARR-seq enhancers (RPP ≥ 138). (B) Top-25 most enriched TE families in active enhancers. (C) Enrichment ratio versus activity level for distinct TE families. (D) As in (C), but for the top-10 most enriched families of TEs in (B). (E) Comparison of the enrichment ratios in primed and naive ESCs. Each repeat element is shown by a dot with the size proportional to the number of overlaps with ChIP-STARR-seq regions. Elements with O/E ≥ 3 in naive or primed or with strong differences between both (O/E ≥ 2 and Δlog2(O/E) ≥ 2) are labeled. (F) Relative enrichment of selected TEs (from E) in primed (blue) and naive (red) ESCs as a function of enhancer activity level (RPP). (G) Kernel density plots of coverage (ChIP-seq/input) in ESCs for the indicated factor for all TEs overrepresented (O/E > 2) in active enhancers.
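The labeling rule in panel (E) can be written out as a small predicate. This is a sketch of one reading of the caption, not the authors' code. It assumes that "O/E ≥ 2" applies to the larger of the two states, that Δlog2(O/E) is taken as an absolute difference, and that an element with O/E of zero in one state counts as infinitely different. The function name is hypothetical, and how O/E itself is computed is not covered here.

```python
import math


def should_label(oe_primed, oe_naive):
    """Decide whether a repeat element is labeled in a primed-vs-naive plot.

    Rule from the caption: O/E >= 3 in either state, or a strong difference
    (O/E >= 2 and delta log2(O/E) >= 2). The interpretation of the second
    clause is an assumption, as described in the text above.
    """
    strongest = max(oe_primed, oe_naive)
    if strongest >= 3:
        return True
    if strongest >= 2:
        if min(oe_primed, oe_naive) <= 0:
            # log2 is undefined at zero; treat the difference as infinite.
            return True
        delta = abs(math.log2(oe_naive) - math.log2(oe_primed))
        return delta >= 2
    return False
```

For example, an element with O/E of 2.5 in primed and 0.5 in naive ESCs differs by about 2.3 log2 units and would be labeled under this reading, while 2.5 versus 1.0 (about 1.3 log2 units) would not.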
Figure 7
ChIP-STARR-Seq Dissects Super-Enhancers into Functional Elements. (A) SEs were called from H3K27ac ChIP-seq data using ROSE (Whyte et al., 2013). (B) Scatterplot of SE intensity (H3K27ac enrichment over input) with ChIP-STARR-seq activity. r, Pearson correlation; blue line indicates a generalized additive model fit. (C) SE overlapping FGFR1, with ChIP-seq tracks for the indicated factors in primed/naive ESCs. Top plot: SE locus; bottom plot: zoom into the second intron. Shown are the positions of regions assessed by ChIP-STARR-seq (gray) and active enhancers (blue) from this study and coordinates of luciferase constructs matching selected enhancers (labeled A–H). Enhancer activities are concentrated at small regions. (D) Luciferase assays of DNA sequences depicted in green in (C); n = 2. Error bars represent SD. (E) Violin plots of the proportion of active plasmids (RPP ≥ 138) for 1,369 SEs compared to normal enhancers (NE). (F) Sketch of the active subspace (covered by plasmids with RPP ≥ 138) of the entire SE space (all plasmids within SEs). (G) Table of the percentage of ChIP-STARR-seq plasmids representing regions within SEs and NEs active in primed and naive ESCs (RPP ≥ 138). Groups of enhancers that were called SEs in both, in one, or in neither state are distinguished.

References

    1. Afgan E., Baker D., van den Beek M., Blankenberg D., Bouvier D., Čech M., Chilton J., Clements D., Coraor N., Eberhard C. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 2016;44(W1):W3–W10.
    2. Arnold C.D., Gerlach D., Stelzer C., Boryń L.M., Rath M., Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339:1074–1077.
    3. Bailey T.L., Machanick P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 2012;40:e128.
    4. Banerji J., Rusconi S., Schaffner W. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell. 1981;27:299–308.
    5. Barakat T.S., Ghazvini M., de Hoon B., Li T., Eussen B., Douben H., van der Linden R., van der Stap N., Boter M., Laven J.S. Stable X chromosome reactivation in female human induced pluripotent stem cells. Stem Cell Reports. 2015;4:199–208.