Nat Biotechnol. 2017 Feb;35(2):145-153.
doi: 10.1038/nbt.3754. Epub 2016 Dec 26.

Genome-wide mapping of autonomous promoter activity in human cells

Joris van Arensbergen et al. Nat Biotechnol. 2017 Feb.

Abstract

Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of the sequences that could be tested. Here we present 'survey of regulatory elements' (SuRE), a method that assays more than 10^8 DNA fragments, each 0.2-2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library of random genomic fragments upstream of a 20-bp barcode is constructed, and decoded by paired-end sequencing. This library is used to transfect cells, and barcodes in transcribed RNA are quantified by high-throughput sequencing. When applied to the human genome, we achieve 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide in K562 cells. By computational modeling we delineate subregions within promoters that are relevant for their activity. We show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites.
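
The barcode quantification step can be illustrated with a minimal Python sketch. It assumes that the barcode occupies the first 20 bases of each read and that the signal is the barcode's frequency in cDNA divided by its frequency in the plasmid input. The function names and the normalization are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def count_barcodes(reads, barcode_len=20):
    """Count barcodes, assuming each barcode is the first `barcode_len` bases of a read."""
    return Counter(r[:barcode_len] for r in reads if len(r) >= barcode_len)


def fold_enrichment(cdna_counts, input_counts):
    """Return, per barcode, the cDNA frequency divided by the plasmid-input frequency.

    Only barcodes seen in the input library are scored, so the denominator
    is never zero. Barcodes absent from cDNA get an enrichment of 0.0.
    """
    cdna_total = sum(cdna_counts.values())
    input_total = sum(input_counts.values())
    enrichment = {}
    for barcode, n_input in input_counts.items():
        input_freq = n_input / input_total
        cdna_freq = cdna_counts.get(barcode, 0) / cdna_total if cdna_total else 0.0
        enrichment[barcode] = cdna_freq / input_freq
    return enrichment
```

In a real analysis each barcode would first be linked to its genomic fragment through the paired-end decoding step described above; that mapping is omitted here.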

Conflict of interest statement

Competing Financial Interests Statement

The authors declare no competing financial interest.

Figures

Figure 1. SuRE provides a genome-wide map of autonomous promoter activity
a. Schematic representation of the SuRE experimental strategy. ORF, open reading frame; PAS, polyadenylation signal. Colors indicate different barcodes. b. Representative ~1Mb genomic region showing histone modifications H3K27ac and H3K4me3 that mostly mark active TSSs, and SuRE signals divided into plus and minus orientation. SuRE signal represents fold enrichment over input. c. Relative enrichment (compared to random) of SuRE peaks among the major types of chromatin. d. Correlation between endogenous promoter activity (measured by GRO-cap) and SuRE enrichment at TSSs. The density plots show the data distribution over each axis. nd, not detected. e. Correlation between relative promoter autonomy (log10(SuRE enrichment/GRO-cap)) and tissue specificity (number of cell types and tissues in which each TSS is active, out of 889 tested). Grey line shows linear fit. f. Correlation between relative promoter autonomy and the total number of enhancers that are found in a fixed window of 5–50 kb from the TSS (regardless of the position of neighboring genes). The y-axis scale is the same as in e.
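
The relative promoter autonomy used in panels e and f is defined in the caption as log10(SuRE enrichment/GRO-cap). A minimal Python sketch of that ratio follows. The function name is hypothetical, and returning `None` for values that are not detected is an assumption about how such cases might be handled.

```python
import math


def relative_autonomy(sure_enrichment, gro_cap):
    """log10 of SuRE enrichment over GRO-cap signal for one TSS.

    Returns None when either value is zero or negative (assumed here to
    correspond to 'not detected'), because the log ratio is undefined.
    """
    if sure_enrichment <= 0 or gro_cap <= 0:
        return None
    return math.log10(sure_enrichment / gro_cap)
```

A positive value means the fragment is more active on the plasmid than the endogenous signal would suggest; a negative value means the promoter is less active when tested autonomously.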
Figure 2. Autonomous divergent promoter activity
a. Mean SuRE enrichment at all TSSs and their 5kb flanking regions. b, c. SuRE enrichment aligned to all TSSs in the sense (b) and antisense (c) orientation, sorted by sense signal intensity. d. Distribution of SuRE enrichment levels at all TSSs; nd, not detected.
Figure 3. Partially overlapping query fragments allow for delineation of regions that drive promoter activity
a. Top tracks: GRO-cap expression, SuRE enrichment and alternative transcripts; bottom panel: SuRE expression of individual genomic fragments around the NUP214 TSS in the sense orientation. The y-axis indicates the log10-transformed number of reads for each genomic fragment; a random value between −0.2 and +0.2 was added to avoid overlap of fragments. The 5′ end of each element is indicated by a black vertical bar. b. Contribution to autonomous promoter activity across the region surrounding the NUP214 TSS, estimated using an elastic net Poisson regression model that uses fragment overlap with 50 bp genomic sequence bins to predict expression in a multiplicative manner. The model fit was repeated using shifted versions of the same bins to avoid artefacts due to breakpoint choice. Shown are the exponentiated per-base mean coefficients for all possible shifts. c. Mean SuRE expression of genomic fragments with a similar start and end position (binned in 100 bp windows) relative to the nearest TSS. For example, the leftmost colored arrows mark all fragments starting at −500 ± 50 bp and the rightmost colored arrows mark all fragments ending at the TSS ± 50 bp; the square at the intersection shows the mean SuRE expression of all fragments that match both criteria. NA: fewer than 50 fragments in bin. d. Same as (b) but for all TSSs. e. Same as (a) but for antisense orientation. Here the 3′ end of each element is indicated by a black vertical bar. f. Same as (b) but for antisense orientation. g. The same model used in (d) was applied to a subset of sense-antisense TSS pairs, using 50 bp regions centered on the sense TSS (right) in one model and the antisense TSS (left) in a second. Expected fold-changes in sense (above) and antisense expression (below) are shown for the 50 bp region centered on the corresponding TSS. Error bars indicate standard error of Poisson regression coefficients. h–j. GRO-cap expression and alternative transcripts (top panels) and contribution to autonomous promoter activity as in (b) (bottom panels) for the genes SLC50A1 (h), WDR47 (i) and HIST1H2BD (j). In all panels, sense orientation is depicted in blue and antisense orientation in red.
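
The multiplicative bin model in panel b can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the covariate for each 50 bp bin is taken to be the fraction of the bin covered by the fragment, the function names are hypothetical, and the elastic net fitting step is omitted so that only the design row and the prediction are shown.

```python
import math


def overlap_fractions(start, end, region_start, region_end, bin_size=50, shift=0):
    """Fraction of each bin covered by the fragment [start, end).

    Bins tile the region from `region_start + shift`; changing `shift`
    moves the bin breakpoints, as in the repeated fits with shifted bins.
    """
    fractions = []
    bin_start = region_start + shift
    while bin_start < region_end:
        bin_end = bin_start + bin_size
        overlap = max(0, min(end, bin_end) - max(start, bin_start))
        fractions.append(overlap / bin_size)
        bin_start = bin_end
    return fractions


def expected_count(fractions, intercept, coefficients):
    """Poisson mean with a log link: exp(intercept + sum of bin terms).

    Because the terms are summed on the log scale, each covered bin
    multiplies the expected read count by exp(coefficient).
    """
    linear = intercept + sum(f * c for f, c in zip(fractions, coefficients))
    return math.exp(linear)
```

For example, a fragment spanning two bins whose coefficients are log 2 and log 3 has an expected count six times the baseline, which is what "in a multiplicative manner" amounts to under a log link.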
Figure 4. Relationship between CpG islands and gene expression
a. Distribution of all mappable SuRE fragments, regardless of their expression level, in terms of their CpG characteristics. Only fragments that overlap an annotated TSS were included. The color scale indicates the number of fragments belonging to each hexagon bin. The lines denote when the observed CpG density per base pair equals 100% (solid) or 50% (dashed) of the value expected based on C+G content. b. Relationship between expression level and CpG characteristics. The color scale indicates the average cDNA read count per fragment in each hexagon bin. Lines are the same as in a.
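
The observed-versus-expected CpG comparison behind the solid and dashed lines can be sketched in Python. The caption states only that the expectation is "based on C+G content"; the formula below, (C+G fraction / 2)^2 per base pair, is one common convention and is an assumption here, as is the function name.

```python
def cpg_observed_over_expected(sequence):
    """Ratio of observed CpG density to the density expected from C+G content.

    Observed: CpG dinucleotides per base pair.
    Expected (assumed convention): ((C + G fraction) / 2) ** 2 per base pair.
    Returns None for empty sequences or sequences without C or G.
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        return None
    observed = seq.count("CG") / length
    gc_fraction = (seq.count("C") + seq.count("G")) / length
    expected = (gc_fraction / 2) ** 2
    if expected == 0:
        return None
    return observed / expected
```

Under this convention a ratio of 1.0 corresponds to the solid line (100% of expected) and 0.5 to the dashed line.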
Figure 5. Autonomous transcription from enhancers
a. SuRE data indicate that three of the five DNase hypersensitive sites (DHS) in the β-globin locus control region show autonomous transcription activity. b. SuRE signals (plus and minus strand combined) aligned to enhancers (‘Enh’), weak enhancers (‘EnhW’) and quiescent parts of the genome (‘Quies’), each sorted by SuRE signal intensity. c. Average profiles of data in b. d. Distribution of SuRE enrichments as shown in b compared to TSSs. nd, not detected. e. Correlation between SuRE expression and H3K27ac signal for enhancers. Grey line shows linear fit. f. Correlation between enhancer strength of ~130 bp fragments from selected enhancers and the mean SuRE expression in a 1 kb window around the center of these (n=189). Grey line shows linear fit. g. Expression levels of 4 genes of the alpha-globin region and a negative control gene (ACTB) after 24 hours of induction with hemin or the solvent control. Expression levels were normalized to TBP and visualized as fold-change relative to solvent control. Error bars indicate the SEM of 3 biological replicates. h. Genomic region of the alpha-globin locus. The top track indicates conserved enhancers. The track below shows the DHS-seq signal. The bottom 4 tracks show SuRE enrichment before and after hemin induction for the plus strand (blue) and minus strand (red).
Figure 6. Autonomous transcription from specific repeat elements
a. Enrichment of SuRE peaks among the major repeat families. Asterisks: significant enrichment or depletion (p < 0.01 after multiple testing correction). b. Mean SuRE enrichment of subfamilies LTR12C (left panel; n = 2,600) and MER41B (right panel; n = 2,764) in the sense (blue) and antisense (red) direction. c. Distribution of SuRE enrichment levels (plus and minus strand combined) of LTR12C and MER41B repeats compared to enhancers and TSSs. nd, not detected. d. Contribution of LTR12C sequences to autonomous promoter activity, as in Fig. 3b, relative to previously annotated U3, promoter (P), enhancer (E), transcribed (R) and U5 elements. e. Average endogenous run-on transcription levels in the sense orientation at indicated distances upstream or downstream of LTR12C repeats. High and low activity refer to the top 50% and bottom 50% in SuRE enrichment.

Comment in

  • Core promoters across the genome.
    Cvetesic N, Lenhard B. Nat Biotechnol. 2017 Feb 8;35(2):123-124. doi: 10.1038/nbt.3788. PMID: 28178253. No abstract available.
