Genome Res. 2019 Feb;29(2):171-183.
doi: 10.1101/gr.236075.118. Epub 2019 Jan 8.

Systematic interrogation of human promoters

Shira Weingarten-Gabbay et al. Genome Res. 2019 Feb.

Abstract

Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.

Figures

Figure 1.
Construction and measurements of 15,753 designed oligonucleotides for promoter activity using site-specific integration technology. (A) Illustration of the design of the main sets composing the synthetic library. (B) We synthesized 15,753 designed ssDNA oligos 200 nt in length on Agilent programmable arrays and harvested them as a single pool. Oligos were amplified by PCR using constant primers and cloned into pZDonor plasmid upstream of eGFP. The plasmid pool was conucleofected with mRNAs encoding zinc finger nucleases (ZFNs) targeting the AAVS1 site into a modified K562 cell line containing only two (of three) copies of the AAVS1 site (see Methods). mCherry expression driven from a constitutive EF1alpha promoter was used to select cells with a single integration by FACS. Cells were then sorted into 16 bins according to eGFP/mCherry ratio. Oligos were amplified from each bin and submitted for deep sequencing. Finally, the distribution among expression bins was determined for each oligo, and mean expression and noise were computed. (CV) Coefficient of variation. (C,D) Accuracy of expression measurements. Twenty-one clones, each expressing a single oligo, were isolated from the library pool and identified by Sanger sequencing. eGFP/mCherry ratio was measured for each clone individually by flow cytometry. Shown are comparisons between these isolated measurements and those calculated from the pooled expression measurements for mean expression (C; R = 0.98, Pearson's correlation, P < 10^−15) and noise (D; R = 0.94, Pearson's correlation, P < 10^−10). (E) Detection of autonomous core promoter activity. Sequences of four full-length promoters were partitioned in silico into 153-nt fragments with a large overlap of 103 nt between oligos. The positions of the annotated transcription start sites (TSSs) from the literature are denoted, and the positions on the x-axis are relative to the TSSs. Dashed lines represent the activity threshold determined by the empty vector measurements (Methods).
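The pooled measurement in panel B reduces, for each oligo, to summary statistics over its read distribution across the sorting bins. The following is a minimal sketch of that step, not the authors' implementation: it assumes the mean is a read-weighted average of one representative eGFP/mCherry value per bin and that noise is the coefficient of variation (CV) of the same distribution. The function name `mean_and_cv`, the bin values, and the read counts are hypothetical, and the paper's Methods may define the computation differently.

```python
import math


def mean_and_cv(read_counts, bin_values):
    """Read-weighted mean and coefficient of variation across sorting bins.

    read_counts: reads of one oligo recovered from each bin (hypothetical data).
    bin_values:  representative eGFP/mCherry value of each bin (assumed known).
    """
    if len(read_counts) != len(bin_values):
        raise ValueError("one read count per bin is required")
    total = sum(read_counts)
    if total == 0:
        raise ValueError("oligo has no reads in any bin")
    # Fraction of this oligo's reads that fall in each bin.
    weights = [count / total for count in read_counts]
    mean = sum(w * v for w, v in zip(weights, bin_values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, bin_values))
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return mean, math.sqrt(variance) / mean


# Toy example with 4 bins instead of the 16 used in the assay.
mean, cv = mean_and_cv([0, 30, 60, 10], [1.0, 2.0, 3.0, 4.0])
print(f"mean={mean:.2f}, CV={cv:.2f}")
```

Working with read fractions rather than raw counts keeps the statistics comparable between oligos that were sequenced to different depths.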
Figure 2.
Functional measurements of autonomous core promoter activity of PIC binding sequences from promoters and enhancers. (A) Illustration of the designed sequences matching 508 PIC binding regions in promoters and enhancers that were identified by ChIP-exo measurements in K562 cells (Venters and Pugh 2013). (B) Comparison between core promoter activity of sequences with different PIC binding levels (TFIIB ChIP-exo). Data were binned into four groups according to the number of ChIP-exo reads, and expression measurements were compared between bins (P < 10^−15, Kruskal-Wallis test). (C) Comparison between the fraction of positive core promoters for PIC binding sequences from promoters and enhancers (left; P < 10^−5, two-proportion z-test) and the activity levels of positive sequences from both groups (middle; P < 0.03, Wilcoxon rank-sum test). To avoid biases in activity stemming from different PIC binding levels, sequences with the same number of ChIP-exo reads were selected in the design process (right; P > 0.1, Wilcoxon rank-sum test). (D,E) Comparison between core promoter activity of PIC binding sequences from promoters (D) and enhancers (E) in two orientations. Each dot represents a distinct PIC binding site that presented positive activity in at least one orientation. Expression measurements of designed sequences are shown for the stronger and weaker orientations of each pair of sequences. The horizontal dashed line represents the activity threshold as determined by empty vector measurements; the diagonal dashed line, a theoretical x = y line expected for promoters with equal expression in the two orientations.
Figure 3.
Systematic investigation of core promoter elements in synthetic configurations and native core promoters from the human genome. (A) The relationship between GC content and promoter activity in 1875 native core promoters from the human genome. (Cyan) Sequences with no promoter activity as defined by empty vector measurements; (orange) sequences with positive promoter activity. (B) Three hundred twenty synthetic oligos representing all possible combinations of six core promoter elements on five different backgrounds were designed. Each line in the heatmap (left) represents a single designed oligo, and each column represents one of the six elements tested. The configurations were sorted according to the expression measurements (right). (C) Comparison between the expression of all the designed sequences with and without each of the six core promoter elements. Each measurement was normalized by the expression levels of the matching background sequence. Wilcoxon rank-sum tests were performed to determine significant differences in expression, and P-values are denoted. (D) The effect of TATA-box in native human core promoters. (Top) Expression measurements from our functional assay of native core promoters from the human genome with and without a consensus TATA-box. Elevated expression is observed in promoters with a TATA element (P < 10^−4, two-sample t-test). (Bottom) CAGE-seq measurements in K562 cells for the same promoters from ENCODE (The ENCODE Project Consortium 2012). No significant difference was detected between the two groups (P > 0.5, two-sample t-test). (E) Noise measurements of 990 native core promoters from the human genome as a function of mean expression. A linear fit was performed on oligos with positive core promoter activity as described before (Bar-Even et al. 2006). (F) Comparison of noise measurements of native core promoters with and without a TATA-box.
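The count of 320 oligos in panel B follows from a full factorial design: each of the six elements is either present or absent (2^6 = 64 configurations), repeated on five backgrounds. The sketch below enumerates such a design to make the arithmetic concrete. The element and background labels are placeholders, because the caption names neither, and `enumerate_designs` is a hypothetical helper rather than the authors' design code.

```python
from itertools import product

# Placeholder labels: the caption does not name the six elements
# or the five background sequences.
ELEMENTS = [f"element_{i}" for i in range(1, 7)]
BACKGROUNDS = [f"background_{i}" for i in range(1, 6)]


def enumerate_designs(elements, backgrounds):
    """Return every (background, included elements) pair of a full factorial design."""
    designs = []
    for background in backgrounds:
        # Each element is independently absent (False) or present (True).
        for presence in product([False, True], repeat=len(elements)):
            included = tuple(e for e, present in zip(elements, presence) if present)
            designs.append((background, included))
    return designs


designs = enumerate_designs(ELEMENTS, BACKGROUNDS)
print(len(designs))  # 2**6 * 5 = 320
```

The configuration with no elements corresponds to the unmodified background, which is the baseline that panel C normalizes against.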
Figure 4.
The effect on expression of TATA-box and Initiator (Inr) combinations and relative distances in different backgrounds. (A) Comparison of expression levels of synthetic oligos with TATA to those containing both TATA and Inr. Each dot represents a pair of sequences with either TATA or TATA+Inr elements. An increase in expression is observed when adding Inr (P < 10^−3, Wilcoxon signed-rank test). (B) Testing for synergy between TATA and Inr elements. Each dot represents a pair of expression values. On the x-axis, expression was computed as the sum of the expression of separate oligos with either TATA or Inr. The y-axis represents expression measurements of oligos that contain the two elements. (C–E) Comparison of oligos with either TATA, Inr, or TATA+Inr in three different promoter backgrounds. Presented P-values were computed by Wilcoxon rank-sum test (n = 16 in each group). (F–H) Testing the effect of the distance between the TATA and the Inr in three different backgrounds. We designed oligos in which we placed the Inr in its consensus position and systematically changed the location of the TATA (2- to 3-nt increments). Each blue dot represents the expression level at a single position. The consensus position of the TATA (−31) is denoted.
Figure 5.
TF activity screen for 133 binding sites and the effect of nucleosome disfavoring sequence on expression. (A) Illustration of designed oligos for TF activity screen. One hundred thirty-three binding sites for 70 TFs were placed in four copies in either the forward or the reverse orientation in two backgrounds. (B) Expression measurements of oligos containing forward TF binding sites in two different backgrounds. Each bar represents a single binding site. Activity threshold determined by the empty vector is denoted. (C) TF activity measurements of expressed and unexpressed TFs as determined by ENCODE RNA-seq in K562 cells. Low activity is obtained for unexpressed TFs (P < 10^−12, Wilcoxon rank-sum test). (D) Comparison between expression measurements of binding sites in two orientations. Each dot represents a pair of sequences for the same binding site placed in the forward or the reverse orientation (R = 0.81, P < 10^−20, Pearson's correlation). (E) Comparison between expression measurements of binding sites in different backgrounds. Each dot represents a pair of sequences for the same binding site placed in the ACTB or the CMV backgrounds (R = 0.72, P < 10^−20, Pearson's correlation). (F) Testing the effect on expression of adding two TF binding sites. Each dot represents a pair of designed promoters with either two or four sites for one of the 70 TFs tested in the CMV background. An increase in expression is observed for most TFs (P < 10^−3, Wilcoxon signed-rank test). (G) Testing the effect on expression of nucleosome disfavoring sequence. A 25-mer poly(dA:dT) tract was added upstream of two binding sites for 70 TFs. An increase in expression is observed for most TFs (P < 10^−6, Wilcoxon signed-rank test). (H) Systematic scanning mutagenesis to identify cis-regulatory elements in the CMV promoter. Eleven mutated oligos were designed; each contains a 14-nt window in which all nucleotides were mutated. Each dot represents expression of one mutated oligo. No elevation in expression is observed when mutating the sequences in which the poly(dA:dT) was inserted.
Figure 6.
Systematic interrogation of the effect of homotypic TF binding site numbers on expression. (A) Illustration of different expression functions when adding homotypic binding sites for different TFs. (B) The design of 1024 synthetic oligos to systematically investigate the effect of site numbers on expression. Four different TF binding sites were planted in all possible combinations of one to seven sites in seven predefined positions within two different background sequences. (C) Shown is the number of homotypic clusters for TF binding sites (HCTs) of different TFs in the human genome. Data were taken from Gotea et al. (2010). Each gray bar represents a single TF. Denoted are the four TFs chosen for the design of the synthetic oligos representing different numbers of HCTs. (D–G) Expression measurements of oligos with increasing number of sites for SP1 (D), ETS1 (E), YY1 (F), and CREB (G) in the ACTB background. Each dot represents a single oligo in the library. A logistic function was fitted (Methods), and the correlations between the expression measurements and the fitted values are shown for each TF. Missing data points in panel D are oligos with NaN value (fewer than 100 reads; see Methods). (H) A summary plot of the four expression curves that were computed in D through G for direct comparison between TFs.
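Panels D–G fit a logistic function to expression as a function of the number of binding sites. The exact parametrization is defined in the paper's Methods, which are not reproduced on this page. As an illustration only, a common four-parameter logistic form is given below. Every symbol in it is an assumption: n is the number of sites, E_min and E_max are the lower and upper expression plateaus, k is the steepness, and n_{1/2} is the number of sites at which expression is halfway between the plateaus.

```latex
E(n) = E_{\min} + \frac{E_{\max} - E_{\min}}{1 + e^{-k\,(n - n_{1/2})}}
```

Under this form, a TF whose expression saturates after a few sites has a small n_{1/2} and a large k, whereas a TF whose expression keeps rising across all seven positions has n_{1/2} near or beyond the tested range.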
