Biochemistry. 2019 Mar 19;58(11):1539-1551.
doi: 10.1021/acs.biochem.7b01069. Epub 2018 Dec 21.

Systematic Dissection of Sequence Elements Controlling σ70 Promoters Using a Genomically Encoded Multiplexed Reporter Assay in Escherichia coli

Guillaume Urtecho et al. Biochemistry.

Abstract

Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In Escherichia coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight −35 elements, eight −10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the −35 and −10 sequence elements can explain approximately 74% of the variance in promoter strength within our data set using a simple log-linear statistical model. Simple neural network models explain >95% of the variance in our data set by capturing nonlinear interactions with the spacer, background, and UP elements.

Figures

Figure 1. Recombination-mediated cassette exchange (RMCE) allows for high-efficiency genomic integration.
A) We developed a Cre-lox-based RMCE system that uses a combination of asymmetric (lox66 and lox71) and incompatible (lox and loxm2) loxP sites. We tracked the cell population with flow cytometry during RMCE. Left: Population of cells containing the mCherry landing pad engineered in the nth-ydgR locus prior to RMCE. Center: After transformation and RMCE of a constitutively expressed sfGFP library, but prior to selection, exchanged and unexchanged populations coexist, showing that an estimated two-thirds of the cells undergo RMCE. Right: After selection, 94.3% of the resulting population contains the cassette (as measured by constitutive sfGFP expression) and has lost the original landing pad mCherry expression. B) Expression of mCherry landing pads at six previously characterized locations spanning the E. coli genome. Arrows indicate the landing pad orientation. C) Comparison of mCherry expression from the landing pad in both orientations at the nth-ydgR locus.
Figure 2. High-throughput quantification of σ70 promoter strength.
A) We designed and constructed a σ70 promoter library using an oligonucleotide microarray, and cloned the library into a custom-made reporter construct. The reporter contains a promoter to be tested, a RiboJ self-cleaving ribozyme sequence to standardize the reporter 5’ UTR, and an sfGFP coding sequence followed by a 20 nt barcode in the 3’ UTR that identifies the promoter variant. The exchange cassette also includes a constitutive kanamycin resistance marker downstream of the reporter for selection purposes. B) Pooled promoters are uniquely barcoded using PCR, cloned into the exchange vector, and integrated into the E. coli nth-ydgR locus as a library. C) Pre-integration barcodes are identified during the mapping stage, and integrated barcodes are identified when quantifying promoter strength using RNA-Seq and DNA-Seq. We found that 90.5% of the barcodes observed in the mapping stage (blue histogram) were later observed in the integrated library (red histogram), and the overall distributions remained similar. D) Expression of each promoter is calculated as the sum of all RNA counts divided by the sum of all DNA counts for all barcodes mapped to a given promoter. E) Promoter strength measurements are highly correlated (R² = 0.952, p < 2.2 × 10⁻¹⁶) between technical replicates and discriminate between negative controls and promoters with consensus core elements.
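The calculation described in panel 2D can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `promoter_expression`, the input layout, the `min_barcodes` parameter, and the example barcodes and counts are all hypothetical.

```python
from collections import defaultdict

def promoter_expression(barcode_counts, barcode_to_promoter, min_barcodes=1):
    """Return {promoter: sum(RNA counts) / sum(DNA counts)}.

    barcode_counts:      {barcode: (rna_count, dna_count)}  (assumed layout)
    barcode_to_promoter: {barcode: promoter_id} from the mapping stage
    min_barcodes:        hypothetical filter on barcodes per promoter
    """
    rna = defaultdict(int)
    dna = defaultdict(int)
    n_barcodes = defaultdict(int)
    for barcode, (rna_count, dna_count) in barcode_counts.items():
        promoter = barcode_to_promoter.get(barcode)
        if promoter is None:
            continue  # barcode was never mapped to a promoter; skip it
        rna[promoter] += rna_count
        dna[promoter] += dna_count
        n_barcodes[promoter] += 1
    return {
        p: rna[p] / dna[p]
        for p in dna
        if dna[p] > 0 and n_barcodes[p] >= min_barcodes
    }

# Made-up counts for illustration only.
barcode_counts = {
    "AAAC": (30, 10),
    "AAAG": (10, 10),
    "CCCT": (5, 20),
    "GGGA": (7, 3),   # not in the barcode map, so it is ignored
}
barcode_to_promoter = {"AAAC": "promA", "AAAG": "promA", "CCCT": "promB"}

expression = promoter_expression(barcode_counts, barcode_to_promoter)
```

Note that counts are summed across barcodes before dividing, so barcodes with more reads carry more weight than they would in an average of per-barcode ratios.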
Figure 3. Expression levels for thousands of promoters.
A) We plot the expression of all measured promoters in the library that contain consensus −10 and −35 elements (red to blue is an estimated 100-fold decrease in measured expression). Each block of 48 squares displays six different backgrounds vertically against eight different spacer sequences horizontally. The three blocks represent the UP element choices used. To save space, we did not display two backgrounds, which also contained the most missing data; they are included in the supplement (Figure S3). The expression levels vary up to 29.9-fold based on different background, spacer, and UP element choices. B) We plot expression of 3,072 promoters with the 136x UP element in blocks of 48 measurements (as in 3A), but now with all −10 (horizontal) and −35 (vertical) choices we measured in our assay. Expression generally increases as the −10 and −35 elements approach the consensus, yet, as with the consensus promoters, there is variance amongst promoters with the same −10 and −35 elements. Promoter variants for which we could not detect more than four unique barcodes were omitted from our analysis and are displayed as grey squares.
Figure 4. Predictive modeling of σ70 promoter strength.
A) We trained a log-linear model on 50% of the data, and the resultant predictions on the remaining data explain approximately 80% of the variance in expression within our dataset. B) We analyzed the model by ANOVA and found that approximately 73.7% of variance in promoter expression can be explained by the −10 and −35 elements (and their interaction). C) We also trained a simple neural network model and found that the resultant predictions captured an estimated 95.5% of the variance in promoter expression, indicating that these models are better able to capture more complex interactions between sequence elements. D) We trained the same neural network models with 10-fold cross-validation and show that we can effectively predict promoter expression when trained on as little as 5% of the data. In 4A, 4C, and 4D, R² is the coefficient of determination between predicted and actual expression values on the held-out datasets.
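To make the idea of a log-linear model with a held-out R² concrete, here is a minimal sketch on synthetic data. It is an assumption-laden illustration, not the authors' model: it uses only additive one-hot terms for the −35 and −10 elements (no interaction, spacer, background, or UP terms), the element effects and noise level are invented, and all names are hypothetical.

```python
import numpy as np

def one_hot(labels, n_levels):
    """Encode integer labels 0..n_levels-1 as a one-hot matrix."""
    m = np.zeros((len(labels), n_levels))
    m[np.arange(len(labels)), labels] = 1.0
    return m

def design_matrix(minus35, minus10, n_levels=8):
    """Intercept plus additive one-hot terms for each element identity."""
    intercept = np.ones((len(minus35), 1))
    return np.hstack([intercept, one_hot(minus35, n_levels), one_hot(minus10, n_levels)])

def r_squared(y_true, y_pred):
    """Coefficient of determination between actual and predicted values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: invented per-element effects on log expression plus noise.
rng = np.random.default_rng(0)
n = 2000
minus35 = rng.integers(0, 8, size=n)
minus10 = rng.integers(0, 8, size=n)
effect35 = np.linspace(-2.0, 0.0, 8)   # hypothetical effects, not measured values
effect10 = np.linspace(-2.0, 0.0, 8)
log_expr = effect35[minus35] + effect10[minus10] + rng.normal(0.0, 0.3, size=n)

X = design_matrix(minus35, minus10)
train = np.arange(n) < n // 2          # fit on the first half
test = ~train                          # evaluate on the held-out half

# Least squares on log expression; lstsq tolerates the redundant one-hot columns.
coef, *_ = np.linalg.lstsq(X[train], log_expr[train], rcond=None)
r2 = r_squared(log_expr[test], X[test] @ coef)
```

Fitting on log expression means each element contributes a multiplicative factor to expression on the original scale, which is what makes the model log-linear.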
Figure 5. Identification of nonlinear interactions among promoter elements with direct RNAP interactions.
A) We plot all promoters split by −10 element and colored by −35 element. The overall promoter expression increases approaching the consensus −10 and −35, yet the strongest expressing promoters with a consensus −10 tend not to be those with a consensus −35. B) The median expression of all promoters as a function of the −10 and −35 identity shows a similar general trend towards increased expression as the −10 and −35 elements get closer to consensus. However, median expression of promoters containing a combination of consensus and mutant −10 and −35 elements is higher than that of promoters containing both consensus sequences. C) We plot the fold-change increase in expression due to the addition of the 326x UP element as a function of the expression of the promoter without the UP element. Weaker promoters have the greatest increase in expression upon addition of the consensus UP element. D) We show the median log2 fold-change in expression for all −10 and −35 element combinations upon addition of the 326x UP element. On average, expression of promoters containing consensus −10 and −35 elements drops by 15%.
Figure 6. Effects of background and spacers on expression.
A) The distribution of expression levels of promoters with different promoter backgrounds (boxplots) is similar, yet consensus promoters (red points) vary drastically across these same contexts. Backgrounds are arranged from left to right by increasing GC content. B) The spacer GC content is negatively correlated with promoter expression. Each point represents the median expression amongst active promoters (RNA/DNA > 0.5) containing the indicated spacer (r = −0.74, p = 0.036).
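The GC-content comparison in panel 6B can be illustrated with a short sketch. The spacer sequences and median expression values below are invented for demonstration and are not data from the study; the sketch computes a Pearson correlation, on the assumption that the reported r is of that kind.

```python
import numpy as np

def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Hypothetical spacers mapped to made-up median expression (RNA/DNA).
median_expression_by_spacer = {
    "ATATATATATATATATA": 4.0,
    "ATGCATATATATATATA": 3.1,
    "GCGCATATGCATATATA": 2.0,
    "GCGCGCGCGCGCATATA": 1.2,
}

gc = np.array([gc_content(s) for s in median_expression_by_spacer])
expr = np.array(list(median_expression_by_spacer.values()))

# Pearson correlation coefficient between spacer GC content and expression.
r = np.corrcoef(gc, expr)[0, 1]
```

A negative `r` here corresponds to the trend described in the caption: higher spacer GC content going with lower expression.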
