Proc Natl Acad Sci U S A. 2010 May 18;107(20):9158-63.
doi: 10.1073/pnas.1004290107. Epub 2010 May 3.

Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence

Justin B Kinney et al. Proc Natl Acad Sci U S A.

Abstract

Cells use protein-DNA and protein-protein interactions to regulate transcription. A biophysical understanding of this process has, however, been limited by the lack of methods for quantitatively characterizing the interactions that occur at specific promoters and enhancers in living cells. Here we show how such biophysical information can be revealed by a simple experiment in which a library of partially mutated regulatory sequences is partitioned according to their in vivo transcriptional activities and then sequenced en masse. Computational analysis of the sequence data produced by this experiment can provide precise quantitative information about how the regulatory proteins at a specific arrangement of binding sites work together to regulate transcription. This ability to reliably extract precise information about regulatory biophysics in the face of experimental noise is made possible by a recently identified relationship between likelihood and mutual information. Applying our experimental and computational techniques to the Escherichia coli lac promoter, we demonstrate the ability to identify regulatory protein binding sites de novo, determine the sequence-dependent binding energy of the proteins that bind these sites, and, importantly, measure the in vivo interaction energy between RNA polymerase and a DNA-bound transcription factor. Our approach provides a generally applicable method for characterizing the biophysical basis of transcriptional regulation by a specified regulatory sequence. The principles of our method can also be applied to a wide range of other problems in molecular biology.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Overview of the experiments. A) We used lac promoters mutagenized in region [-75∶-1] to drive the expression of GFP. B) Plasmids containing mutant lac promoters driving GFP expression were transformed into E. coli. Induced cells were then partitioned using FACS. Deep sequencing of the mutant promoters in each FACS batch yielded a long list of sequences σ with corresponding measurements μ. C) Plasmid pUA66-lacZ (21), a very-low-copy-number plasmid on which the wild-type lac promoter drives the expression of GFP; tick mark spacing is 200 bp. D) Fluorescence distributions of MG1655 cells containing the full-wt plasmid library (orange), the pUA66-lacZ plasmid (black), or a negative control plasmid pJK10 (SI Appendix: Fig. S1) in which region [-75∶-1] of the lac promoter was deleted (gray). In the full-wt experiment, batches B1–B9 received cells from the indicated fluorescence ranges, while batch B0 received cells randomly sampled from the initial library. E) Each PCR amplicon contained a 7 bp DNA barcode indicating the batch μ in which the sequence σ was found. 454 pyrosequencing (18) yielded reads of about 242 bp covering the indicated regions.
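
The barcode step in panel E can be pictured with a minimal sketch that sorts reads into batches by a 7 bp barcode. This is an illustration only: the barcode sequences, the assumption that the barcode sits at the start of each read, and the name `demultiplex` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

BARCODE_LEN = 7  # barcode length stated in the caption

# Hypothetical barcode-to-batch table; the real barcode sequences are not given here.
BARCODE_TO_BATCH = {
    "ACGTACG": 0,
    "TGCATGC": 1,
    "GATCGAT": 2,
}

def demultiplex(reads, barcode_to_batch=BARCODE_TO_BATCH):
    """Group reads by batch, assuming (for illustration) a leading 7 bp barcode."""
    by_batch = defaultdict(list)
    for read in reads:
        barcode, promoter = read[:BARCODE_LEN], read[BARCODE_LEN:]
        batch = barcode_to_batch.get(barcode)
        if batch is None:
            continue  # skip reads whose barcode is not recognized
        by_batch[batch].append(promoter)
    return dict(by_batch)

# Toy reads: barcode followed by a short made-up promoter fragment.
reads = ["ACGTACGTTTACA", "TGCATGCTTGACA", "ACGTACGTTTACC", "NNNNNNNTTTACA"]
batches = demultiplex(reads)
```

Each entry of `batches` then holds the sequences σ observed in one batch μ, which is the paired list that the downstream analysis works from.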
Fig. 2.
Information footprints. A) Footprint from full-wt data, aligned with known protein-DNA contact positions (highlighted). The lower plot is a 20X magnification of the upper plot. Error bars (dark blue lines) indicate uncertainties due to finite sample effects (SI Appendix: Computing mutual information). B) Footprint from the full-0 experiment, in which intracellular CRP was inactive. SI Appendix: Fig. S3 shows information footprints from all six experiments.
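
One way to read an information footprint is as the mutual information between the base at each promoter position and the batch in which the sequence was found. The sketch below computes that quantity with a naive plug-in estimator on toy data. It is an assumption-laden illustration: it omits the finite-sample corrections described in the SI Appendix, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def information_footprint(seqs, batches):
    """Return I(base at position i; batch) for every position i."""
    length = len(seqs[0])
    return [
        mutual_information([(s[i], m) for s, m in zip(seqs, batches)])
        for i in range(length)
    ]

# Toy data: position 0 determines the batch exactly, position 1 only weakly.
seqs = ["AC", "AC", "TC", "TC", "AG", "TG", "AC", "TG"]
batches = [1, 1, 2, 2, 1, 2, 1, 2]
footprint = information_footprint(seqs, batches)
```

In this toy example the first position carries a full bit about the batch and the second carries much less, which is the kind of contrast a footprint makes visible across a binding site.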
Fig. 3.
Models fit to full-wt data. A) The CRP energy matrix fit to [-75∶-49] by maximizing I(εc; μ) on full-wt data. B) The RNAP energy matrix fit to [-41∶-1] by maximizing I(εr; μ) on full-wt data. In A and B, each matrix column lists the energy contributions of the four possible bases at the aligned position within the site. Matrix elements range from 0 to 1 (in arbitrary units) with the lowest element in each column set to zero by convention. SI Appendix: Fig. S4 shows the CRP and RNAP matrices derived from all six of our datasets. C) The thermodynamic model for τ inferred using I(τ; μ) in Eq. 1. Optimal CRP and RNAP energy matrices are shown with elements expressed in kcal/mol (1 kcal/mol = 1.62 kBT at T = 310 K). It is useful to define each wild-type lac promoter site as having zero energy. We therefore add an energy shift, shown below each matrix, when computing εc and εr. Doing this means that Cc represents the intracellular CRP concentration in units of the dissociation constant of the wild-type (zero energy) site. Values quoted for εi and Cc are mean ± rmsd values determined from the parameter ensembles sampled using parallel tempering Monte Carlo.
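
The bookkeeping in this caption (additive energy matrices, a shift that sets the wild-type site to zero energy, and the kcal/mol to kBT conversion at 310 K) can be illustrated with a short sketch. The matrix values, the wild-type site, and all function names are hypothetical. The single-site occupancy formula is a standard equilibrium expression assumed here for illustration; it is not Eq. 1 of the paper, which this record does not reproduce.

```python
import math

R_KCAL = 0.0019872   # gas constant in kcal/(mol*K), i.e. k_B per mole
T_KELVIN = 310.0     # temperature quoted in the caption
BASES = "ACGT"

def kcal_to_kbt(energy_kcal, temperature=T_KELVIN):
    """Convert an energy from kcal/mol to units of k_B*T."""
    return energy_kcal / (R_KCAL * temperature)

# Hypothetical 3-position matrix in kcal/mol; one row per position, columns A, C, G, T.
MATRIX = [
    [0.0, 0.5, 1.0, 0.2],
    [0.8, 0.0, 0.3, 0.6],
    [0.4, 0.9, 0.0, 0.1],
]
WILD_TYPE = "ACT"  # hypothetical wild-type site

def raw_energy(site, matrix=MATRIX):
    """Sum the per-position contributions of the bases in the site."""
    return sum(row[BASES.index(b)] for row, b in zip(matrix, site))

def site_energy(site, matrix=MATRIX, wild_type=WILD_TYPE):
    """Energy relative to wild type, so the wild-type site has zero energy."""
    shift = -raw_energy(wild_type, matrix)
    return raw_energy(site, matrix) + shift

def occupancy(energy_kcal, conc):
    """Occupancy of an isolated site; conc is in units of the wild-type Kd (assumed form)."""
    weight = conc * math.exp(-kcal_to_kbt(energy_kcal))
    return weight / (1.0 + weight)

print(round(kcal_to_kbt(1.0), 2))          # conversion factor at 310 K
print(site_energy(WILD_TYPE))              # zero by construction
print(occupancy(site_energy("ACT"), 1.0))  # half-occupied when conc equals the wild-type Kd
```

With the wild-type shift applied, a concentration of 1 in these units half-saturates the wild-type site, which is what it means for Cc to be expressed in units of the wild-type dissociation constant.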
Fig. 4.
Parameters fit to all six datasets. A) CRP-RNAP interaction energies εi (mean ± rmsd) inferred by fitting τ to all six datasets, using either dataset-specific values for εi (magenta) or a single εi for all six datasets (green). B) CRP concentrations Cc inferred for these same multidataset models. SI Appendix: Fig. S5 shows full ensemble distributions for εi and the six Cc parameters of the final model, together with mean and rmsd values for all the CRP and RNAP matrix elements.
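
The "mean ± rmsd" summaries quoted in these captions can be computed from any ensemble of sampled parameter values. The sketch below shows one plausible reading, taking rmsd to mean the root-mean-square deviation about the ensemble mean; the sample values and the function name are made up for illustration.

```python
import math

def mean_and_rmsd(samples):
    """Return the mean and the root-mean-square deviation about that mean."""
    n = len(samples)
    mean = sum(samples) / n
    rmsd = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    return mean, rmsd

# Hypothetical ensemble of sampled interaction energies (kcal/mol); not real data.
samples = [-3.0, -3.5, -3.2, -3.3]
mean, rmsd = mean_and_rmsd(samples)
print(f"{mean:.2f} ± {rmsd:.2f} kcal/mol")
```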

References

    1. Ren B, et al. Genome-wide location and function of DNA binding proteins. Science. 2000;290:2306–2309.
    2. Johnson D, Mortazavi A, Myers R, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316:1497–1502.
    3. Berg O, von Hippel P. Selection of DNA binding sites by regulatory proteins. II. The binding specificity of cyclic AMP receptor protein to recognition sites. J Mol Biol. 1988;200:709–723.
    4. Tuerk C, Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science. 1990;249:505–510.
    5. Meng X, Brodsky MH, Wolfe SA. A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat Biotechnol. 2005;23:988–994.
