Nature. 2024 Dec;636(8043):745–754.
doi: 10.1038/s41586-024-08219-w. Epub 2024 Nov 20.

Single-molecule states link transcription factor binding to gene expression

Benjamin R Doughty et al. Nature. 2024 Dec.

Abstract

The binding of multiple transcription factors (TFs) to genomic enhancers drives gene expression in mammalian cells¹. However, the molecular details that link enhancer sequence to TF binding, promoter state and transcription levels remain unclear. Here we applied single-molecule footprinting²,³ to measure the simultaneous occupancy of TFs, nucleosomes and other regulatory proteins on engineered enhancer-promoter constructs with variable numbers of TF binding sites for both a synthetic TF and an endogenous TF involved in the type I interferon response. Although TF binding events on nucleosome-free DNA are independent, activation domains recruit cofactors that destabilize nucleosomes, driving observed TF binding cooperativity. Average TF occupancy linearly determines promoter activity, and we decompose TF strength into separable binding and activation terms. Finally, we develop thermodynamic and kinetic models that quantitatively predict both the enhancer binding microstates and gene expression dynamics. This work provides a template for the quantitative dissection of distinct contributors to gene expression, including TF activation domains, concentration, binding affinity, binding site configuration and recruitment of chromatin regulators.

Conflict of interest statement

Competing interests: W.J.G. is a consultant and equity holder for 10x Genomics, Guardant Health, Quantapore and Ultima Genomics, co-founder of Protillion Biosciences and is named on patents describing ATAC–seq. L.B. is a co-founder of Stylus Medicine and a member of its scientific advisory board. All other authors declare no competing interests.

Figures

Figure 1: Single-molecule footprinting reveals TF occupancy and promoter state at an engineered, genome-integrated expression reporter system
A) Variable numbers (0–8) of TetO binding sites (blue) upstream of a minCMV (gray) promoter and Citrine (green) gene were engineered into the AAVS1 locus in K562 cells expressing rTetR-VP48. Binding of rTetR-VP48 is induced by adding doxycycline (dox) to the cell media. Black arrows indicate the primers used to sequence the reporter. B) Mean fluorescence intensity (MFI) of the Citrine reporter measured by flow cytometry after 24 hours of exposure to 1,000 ng/ml dox (black) as a function of the number of TetO transcription factor binding sites (TFBS). C) Schematic of the single-molecule footprinting (SMF) assay for determining the molecular state of regulatory elements or promoters. Accessible GpCs are methylated (white circles), while inaccessible GpCs remain unmethylated (black circles). D) Example SMF molecules with 5 TetO sites, along with the molecular configuration inferred for each molecule. The high state represents a methylated, accessible GpC, while the low state represents an unmethylated, inaccessible GpC. E) Aggregated data for the construct with 5 TetO sites, with (black) and without (gray) dox. Error bars represent the standard error of the mean from 4 biological replicates. F) Summary plots of single molecules (individual rows) observed by SMF for enhancers (left) and promoters (right) of constructs with 5 TetO sites without (top) and with (bottom) dox induction. Each GpC spans an equal width in this representation; methylated (accessible) Cs are white and unmethylated (protected) Cs are black. The bars at the right of each summary plot show the fraction of enhancer molecules with one or more TFs bound (open + TF bound; blue) or fully nucleosomal (gray), and the fraction of promoters that are not bound by a nucleosome (active; green) or bound by a nucleosome (gray).
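The single-molecule readout in panels C, D and F can be illustrated with a toy caller that turns one molecule's GpC methylation calls into a state. This is a sketch under stated assumptions, not the authors' pipeline: the 100 bp nucleosome threshold, the interval format for binding sites and all function names are hypothetical. Runs of consecutive unmethylated (protected) GpCs are grouped; a long run is called a nucleosome, and short runs that overlap a binding site are counted as TF footprints.

```python
from dataclasses import dataclass

# Hypothetical threshold (not from the paper): protected stretches at least
# this long are called nucleosomes; shorter ones are treated as TF footprints.
NUCLEOSOME_MIN_SPAN_BP = 100


@dataclass
class ProtectedRun:
    start: int  # position of the first unmethylated GpC in the run
    end: int    # position of the last unmethylated GpC in the run

    @property
    def span(self) -> int:
        return self.end - self.start


def protected_runs(positions, methylated):
    """Group consecutive unmethylated (protected) GpCs into runs."""
    runs, run_start, last = [], None, None
    for pos, is_meth in zip(positions, methylated):
        if not is_meth:
            if run_start is None:
                run_start = pos
            last = pos
        elif run_start is not None:
            # A methylated (accessible) GpC closes the current protected run.
            runs.append(ProtectedRun(run_start, last))
            run_start = None
    if run_start is not None:
        runs.append(ProtectedRun(run_start, last))
    return runs


def classify_molecule(positions, methylated, sites):
    """Return (state, n_sites_bound) for one enhancer molecule.

    sites: hypothetical list of (start, end) binding-site intervals,
    in the same coordinates as positions.
    """
    runs = protected_runs(positions, methylated)
    if any(r.span >= NUCLEOSOME_MIN_SPAN_BP for r in runs):
        return "nucleosomal", 0
    # A site counts as bound if any short protected run overlaps it.
    bound = sum(
        any(r.start <= s_end and r.end >= s_start for r in runs)
        for s_start, s_end in sites
    )
    return ("TF-bound" if bound else "open"), bound
```

Real SMF callers also have to handle sparse GpC coverage and methylation noise, which this sketch ignores.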
Figure 2: Thermodynamic models reveal rTetR-VP48 competes with nucleosomes with the aid of activation domains
A) Summary plots of enhancer methylation across single molecules observed by SMF for enhancers with 0–8 TetO binding sites after 24 hours of 1,000 ng/ml dox induction. Each row represents methylation data for one molecule, as in 1F. B) Observed average occupancy of rTetR-VP48 for enhancers with increasing numbers of TetO binding sites (four biological replicates), fit to a Simple Competition model (described in F, r² = 0.84) or a Nucleosome Destabilization model (described in G, r² = 0.99). C) Observed occupancy distribution of rTetR-VP48 for the enhancer with 7 TetO sites (four biological replicates), fit to a Simple Competition model (described in F, r² = −0.72) or a Nucleosome Destabilization model (described in G, r² = 0.79). D) Observed distributions of occupied TetO sites on 6xTetO molecules with 2 TFs bound, with and without nucleosomes present, and the predictions from a model that assumes independent binding. E) Likelihood of each TetO binding site being occupied in nucleosome-free molecules containing 6 TetO sites, with position 6 closest to the promoter (two biological replicates). The dashed line indicates the average occupancy. F) Schematic of the Simple Competition model, in which nucleosomes and TFs compete directly for binding to DNA. G) Schematic of the Nucleosome Destabilization model, in which TF-dependent recruitment of a cofactor reduces the binding energy of a nucleosome when one or more TFs are present. H) Best-fit parameters and error (SEM) for binding of rTetR-VP48 using the Nucleosome Destabilization model in (G). I) Full molecular state representations for 10,000 measured (left) or simulated (according to the Nucleosome Destabilization model, right) 6xTetO enhancer molecules, where each column represents a TetO site and each row represents a molecule. Sites are colored by their occupancy status.
J-K) Observed average occupancy (J) and occupancy distribution (K) of rTetR alone (two biological replicates), with a fit from the Simple Competition model (r² = 0.77 and 0.94, respectively). Gray line is rTetR-VP48 data for comparison. L-M) Observed average occupancy (L) and occupancy distribution (M) of rTetR-VP48 in the presence of the BAF inhibitor BRM014 (two biological replicates), with a fit from the Nucleosome Destabilization model (r² = 0.94 and 0.96, respectively). Gray line is rTetR-VP48 data for comparison. N-O) Observed average occupancy (N) and occupancy distribution (O) of rTetR-VP48 in the presence of the p300 inhibitor A485 (two biological replicates), with a fit from the Nucleosome Destabilization model (r² = 0.99 and 0.90, respectively). Gray line is rTetR-VP48 data without inhibitors for comparison. P) Parameter fits for models in L-O. Error bars represent the standard error of the mean from two biological replicates. Gray dashed lines show the rTetR-VP48 parameter fits for comparison.
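The two thermodynamic models in panels F and G can be sketched as a partition function over enhancer states. This uses a simplified parameterization of our own: a nucleosomal state that excludes all TFs, nucleosome-free states with k of N identical, independent sites bound, and a single hypothetical `boost` factor on every state with at least one TF bound, standing in for cofactor-driven nucleosome destabilization (boost = 1 recovers Simple Competition). The authors' fitted parameters (panel H) may be defined differently. The `r_squared` helper uses 1 − SS_res/SS_tot, one standard definition under which a fit worse than the data mean scores below zero, as in panel C.

```python
import math


def state_weights(n_sites, c, k_nuc, boost=1.0):
    """Relative statistical weights of enhancer states (illustrative model).

    "nuc" -> nucleosome-occupied state, which excludes all TFs
    k     -> nucleosome-free state with k of n_sites bound by TF
    c     : hypothetical per-site TF weight (concentration x affinity)
    k_nuc : hypothetical weight of the nucleosomal state
    boost : hypothetical factor applied to every nucleosome-free state with
            at least one TF bound; boost = 1 is Simple Competition, and
            boost > 1 mimics a bound TF destabilizing the nucleosome.
    """
    weights = {"nuc": k_nuc}
    for k in range(n_sites + 1):
        # Identical, independent sites: comb(n, k) configurations with k TFs.
        w = math.comb(n_sites, k) * c**k
        if k >= 1:
            w *= boost
        weights[k] = w
    return weights


def occupancy_distribution(n_sites, c, k_nuc, boost=1.0):
    """P(k TFs bound); nucleosomal molecules are counted as k = 0."""
    w = state_weights(n_sites, c, k_nuc, boost)
    z = sum(w.values())  # partition function
    dist = [w[k] / z for k in range(n_sites + 1)]
    dist[0] += w["nuc"] / z
    return dist


def mean_occupancy(n_sites, c, k_nuc, boost=1.0):
    """Average number of TFs bound per molecule."""
    dist = occupancy_distribution(n_sites, c, k_nuc, boost)
    return sum(k * p for k, p in enumerate(dist))


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; negative when the model fits worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For example, `[mean_occupancy(n, 0.3, 20.0, boost=10.0) for n in range(9)]` traces an occupancy-versus-site-number curve in the spirit of panel B, with made-up parameter values. With boost = 1, the closed form N·c·(1 + c)^(N−1) / (K_nuc + (1 + c)^N) gives the same average, which is a useful check.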
Figure 3: Average rTetR-VP48 occupancy predicts nucleosome-free promoters and gene expression
A) Summary plots of promoter methylation on single molecules for enhancers with 0–8 TetO binding sites after 24 hours of dox induction. B) Fraction of active (nucleosome-free) promoters as a function of the number of TetO sites across three DNA background sequences. C) Average rTetR-VP48 occupancy as a function of the number of TetO binding sites for three DNA backgrounds. Lines represent predictions from the Nucleosome Destabilization model fit to each background separately. D) Relationship between rTetR-VP48 occupancy and the fraction of active promoters across three backgrounds. Black line is an additive activation model fit from E (k_on/k_off = 0.15 ± 0.002 TF⁻¹). E) Relationship between the fraction of active promoters and gene expression by flow cytometry for three DNA backgrounds. Black line is a thresholded linear fit. F) Schematic of an additive kinetic model in which TFs contribute independently to promoter activation (k_on) by modulating the rate of transition from the promoter off state to the on state (Additive Activation model). G) Relationship between TetO copy number and gene expression by flow cytometry for three DNA backgrounds. Lines are combined model fits (Nucleosome Destabilization model, Additive Activation model, and promoter-to-gene-expression model) taking TetO copy number as input. H) Observed fraction of active promoters for molecules with N (0–5) rTetR-VP48 TFs bound, for different numbers of TetO sites. Error bars are bootstrapped standard deviations. I) Relationship between the number of TetO binding sites and the average rTetR-VP48 occupancy for different concentrations of dox (after 24 hours of exposure). J) Relationship between average rTetR-VP48 occupancy and the fraction of active promoters for different concentrations of dox (after 24 hours of exposure). Lines are Additive Activation model fits; all r² values are between 0.84 and 0.97.
K) Parameter values of effective rTetR-VP48 concentration (Nucleosome Destabilization model) and potency (Additive Activation model) across dox concentrations, relative to maximum dox. L) Relationship between the number of TetO binding sites and the average rTetR-VP48 occupancy under BAF inhibition (purple) or p300 inhibition (cyan). Dotted line is data without inhibitors. M) Relationship between average rTetR-VP48 occupancy and the fraction of active promoters under BAF inhibition (purple) or p300 inhibition (cyan). Lines are Additive Activation model fits. Dotted line is the best fit to data without inhibitors. N) Quantification of average TF occupancy from the Nucleosome Destabilization model, potency from the Additive Activation model, and gene expression predicted by coupling these models under BAF inhibition (purple) or p300 inhibition (cyan), relative to fits without inhibitors, for molecules with 5–8 TetO sites in background 0. Black dots for gene expression are measured MFI from flow cytometry on individual cell lines. Significance is determined by a paired t-test with 4 degrees of freedom (t_occupancy = −15.8, t_MFI = −7.4). O) Schematic showing that parameters of TF occupancy (including #TFBS, TF concentration, and chromatin remodelers (CRs)) and TF potency (including activation domains (ADs) and cofactors (CoFs)) separably tune gene expression.
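The Additive Activation model in panel F can be written as a two-state promoter whose off-to-on rate is k_on times the number of bound TFs and whose on-to-off rate is k_off. A minimal sketch, assuming the promoter responds to the time-averaged occupancy, gives the steady-state fraction of active promoters from the potency k_on/k_off. The `thresholded_linear` map from promoter activity to expression (panel E) is a hypothetical stand-in with made-up slope and threshold arguments, not the authors' fitted function.

```python
def fraction_active(mean_occupancy, potency):
    """Steady-state fraction of promoters ON in a two-state promoter model.

    OFF -> ON at rate k_on * n (n = TFs bound), ON -> OFF at rate k_off, so
    P_on = k_on*n / (k_on*n + k_off) = x / (1 + x) with x = potency * n and
    potency = k_on / k_off in TF^-1. Assumes the promoter responds to the
    time-averaged occupancy n.
    """
    x = potency * mean_occupancy
    return x / (1.0 + x)


def thresholded_linear(frac_active, slope, threshold):
    """Hypothetical promoter-to-expression map: zero below threshold, linear above."""
    return max(0.0, slope * (frac_active - threshold))
```

With the potency from panel D (0.15 TF⁻¹), an average of 5 bound TFs gives 0.75/1.75 ≈ 0.43 of promoters active in this sketch. Because occupancy and potency enter only through their product, halving one has the same effect as halving the other, which is one way to read the separable binding and activation terms; when potency × occupancy is small, the curve is close to linear.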
Figure 4: Single-molecule footprinting of a type I interferon response reporter reveals decoupling of accessibility and activation
A) Variable numbers (0–6) of interferon-stimulated response elements (ISREs, pink) upstream of a minCMV (gray) promoter and Citrine gene (green) were integrated at the AAVS1 locus in K562 cells. In the unstimulated state, we expect an IRF9 monomer or IRF9–STAT2 heterodimer to bind the ISRE. Upon IFN-β stimulation, we anticipate binding of the IRF9–phosphoSTAT1–phosphoSTAT2 trimer and activation of expression. B) Gene expression (measured by RT-qPCR of Citrine mRNA) and protein levels (measured by flow cytometry of Citrine) as a function of the number of ISREs before and after IFN-β stimulation (6 hours of stimulation for RNA and 12 hours for protein) for two technical replicates. C) Example SMF data displaying six narrow footprints in the absence of stimulation (top) and six wide footprints in the presence of IFN-β (bottom). D) Summary plots of single molecules observed by SMF for enhancers (left) and promoters (right) of constructs with six ISREs without (top) and with (bottom) 6 hours of IFN-β induction. Each GpC spans an equal width in this representation. The fraction of enhancers with a majority of narrow TF footprints (gray), wide TF footprints (pink) or fully nucleosomal (black) is shown, as is the fraction of promoters that are not bound by a nucleosome (active; green) versus bound by a nucleosome (black). E-F) Observed occupancies of nucleosomes (E) and of wide and narrow TF footprints (F) as a function of the number of ISREs before and after six hours of stimulation for two biological replicates. G) Relationship between wide-footprint occupancy and the fraction of active promoters for two biological replicates. Black line is an Additive Activation model fit (see 3G, k_on/k_off = 0.44 ± 0.02 TF⁻¹). Gray dotted line represents the model fit for rTetR-VP48.
H) Average ATAC-seq accessibility over two biological replicates before (pink) and after six hours (red) of stimulation, relative to the TSS of genes upregulated with stimulation (as identified by bulk RNA-seq) that contain at least three ISREs within the 500 bp window. Gray line is average ATAC-seq accessibility for non-ISG promoters matched to the pre-stimulation expression level (TPM) of the plotted ISG promoters. Shaded error regions are 95% confidence intervals from bootstrapping. I) Average ATAC-seq accessibility over two biological replicates relative to ISREs genome-wide, Tn5-bias corrected and RPM normalized. J) RNA expression of endogenous ISGs before (pink) and after (red) stimulation over two biological replicates of RNA-seq. K) Promoters and promoter-proximal enhancers of the ISGs IFI6, ISG15 and USP18 containing ISREs (pink), as well as versions with only the ISREs scrambled (gray), were placed upstream of the Citrine gene (green) and engineered into the AAVS1 locus in K562 cells. L) Average SMF accessibility of ISG promoter reporters with intact ISREs before (pink) and after six hours of stimulation (red), and with scrambled ISREs before (light gray) and after stimulation (dark gray), for two biological replicates. Two versions of scrambled ISREs were measured for each construct. Black dashed line is the average SMF accessibility of a sequence with no intentional binding sites. M) Gene expression (measured by flow cytometry) of ISG promoter reporters with intact ISREs before (pink) and after 24 hours of stimulation (red), and with scrambled ISREs before (light gray) and after stimulation (dark gray), for two technical replicates. Two versions of scrambled ISREs were measured for each construct. Black dashed line is the MFI of wild-type K562 cells.
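Panel H reports 95% confidence bands from bootstrapping. The legend does not say what was resampled or how many times, so the sketch below assumes a percentile bootstrap over promoters: promoters (rows) are drawn with replacement, the mean profile is recomputed for each resample, and the 2.5th and 97.5th percentiles at each position form the band. The function name and defaults are hypothetical.

```python
import random


def bootstrap_profile_ci(profiles, n_boot=1000, alpha=0.05, seed=0):
    """Mean accessibility profile with percentile-bootstrap confidence bands.

    profiles: list of equal-length lists, one per promoter (row), one value
    per position relative to the TSS (column). Promoters are resampled with
    replacement and the mean profile is recomputed for each resample.
    Returns a list of (mean, lower, upper) tuples, one per position.
    """
    rng = random.Random(seed)  # fixed seed so the bands are reproducible
    n_rows = len(profiles)
    n_pos = len(profiles[0])
    boot = [[] for _ in range(n_pos)]
    for _ in range(n_boot):
        sample = rng.choices(profiles, k=n_rows)
        for j in range(n_pos):
            boot[j].append(sum(row[j] for row in sample) / n_rows)
    # Percentile indices into the sorted bootstrap means.
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = int(round((1 - alpha / 2) * n_boot)) - 1
    result = []
    for j in range(n_pos):
        ranked = sorted(boot[j])
        mean = sum(row[j] for row in profiles) / n_rows
        result.append((mean, ranked[lo_idx], ranked[hi_idx]))
    return result
```

Resampling whole promoters, rather than individual positions, keeps the correlation between neighbouring positions within each profile intact.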
Figure 5: Kinetic modeling reveals timescales for chromatin and gene regulatory changes, as well as relative effects of cofactor inhibition and TF potency on activation
A) Distribution of molecules with different numbers of rTetR-VP48s bound for a construct with 7 TetO binding sites at different times after dox induction, for two biological replicates. The 0 minute sample (gray) is processed without dox in the SMF assay buffers (see Methods), whereas the 0 minute + dox sample (light blue) and all later timepoints are processed with dox in the buffers. B) Relationship between the average number of TetO sites bound by rTetR-VP48 and the fraction of active promoters at different times after dox induction, for two biological replicates. Lines are Additive Activation model fits. C) Potency (k_on/k_off) fit from the Additive Activation model across the dox time course. Error bars are standard deviations. D) A kinetic model for enhancer equilibration, promoter activation, transcription and translation. E) Fits (lines, using the kinetic scheme in D) to data (points) for average occupancy of rTetR-VP48, fraction of active promoters, RNA expression, and MFI for a construct with 7 TetO binding sites over time. TF occupancy parameters: t_1/2 = 2.1 ± 0.3 hr and a = 0.52 ± 0.02. Promoter activation parameters: k_on = 0.11 ± 0.02 TF⁻¹ hr⁻¹ and k_off = 0.16 ± 0.04 hr⁻¹. RNA parameters: k_trs = 2.0 ± 0.4 RNA (A.U.) prom_on⁻¹ hr⁻¹ and k_decay,rna = 0.20 ± 0.05 hr⁻¹. Protein parameters: k_trl = 1,200 ± 200 protein (A.U.) RNA (A.U.)⁻¹ hr⁻¹ and k_decay,prot = 0.16 ± 0.04 hr⁻¹. F) Measured data (top) and fit kinetic models (bottom, using the kinetic scheme in D) for TF occupancy (r² = 0.99), promoter activation (r² = 0.90), RNA (r² = 0.95), and protein levels (r² = 0.92) over time and across TetO copy number. G-H) Data and fits (using the kinetic scheme in D) for TF occupancy (G) and fraction of active promoters (H) for the 7xTetO constructs over time under p300 inhibition (cyan) and BAF inhibition (purple) for two biological replicates. I) Potency (k_on/k_off) fit from the Additive Activation model across the dox time course with p300 (cyan) and BAF inhibition (purple).
Error bars are standard deviations. J) Kinetic parameters (using the kinetic scheme in D) for data in G and H, displayed as fold change relative to no-drug conditions. Error bars are standard error. K) Fold change of wide footprints (red) and narrow footprints (gray) for a 5xISRE construct as a function of time after stimulation, with model fit up to six hours, for two biological replicates. L) Fraction of active promoters for a 5xISRE enhancer (pink) as a function of time after stimulation, with model fit up to six hours, for two biological replicates. For comparison with a TetO promoter with the same maximum fraction active, the fraction of active promoters for a 6xTetO enhancer (black) is plotted as a function of time after maximum dox stimulation, with model fit, for two biological replicates. M) Relationship between the average number of ISREs exhibiting a wide footprint and the fraction of active promoters at different times after stimulation (up to 6 hours) for two biological replicates. N) Kinetic parameters (using the kinetic scheme in D) for data in K and L, displayed as fold change relative to parameters obtained for rTetR-VP48. Error bars are standard error.
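The kinetic scheme in Figure 5D can be approximated as a linear cascade of ODEs and integrated with the panel E parameter values for the 7xTetO construct. The functional forms are assumptions: TF occupancy is taken to relax exponentially with half-time t_1/2 toward N × a bound sites (the legend does not define a, so reading it as the equilibrium fraction of sites bound is hypothetical), the promoter switches on at k_on per bound TF and off at k_off, and RNA and protein are produced and degraded at first-order rates. Forward Euler is used for simplicity; this is not necessarily how the authors integrated or fit the model.

```python
import math


def simulate(t_end_hr=48.0, dt=0.01, n_sites=7,
             t_half=2.1, a=0.52,
             k_on=0.11, k_off=0.16,
             k_trs=2.0, k_decay_rna=0.20,
             k_trl=1200.0, k_decay_prot=0.16):
    """Forward-Euler integration of a TF -> promoter -> RNA -> protein cascade.

    Returns a list of (t, n_bound, frac_active, rna, protein) tuples.
    Assumption (hypothetical): TF occupancy relaxes exponentially with
    half-time t_half toward n_sites * a.
    """
    k_tf = math.log(2) / t_half
    n_eq = n_sites * a
    n = p = rna = prot = 0.0  # dox added at t = 0; everything starts off
    traj = []
    steps = int(round(t_end_hr / dt))
    for i in range(steps + 1):
        traj.append((i * dt, n, p, rna, prot))
        dn = k_tf * (n_eq - n)                    # enhancer equilibration
        dp = k_on * n * (1.0 - p) - k_off * p     # additive promoter activation
        drna = k_trs * p - k_decay_rna * rna      # transcription and decay
        dprot = k_trl * rna - k_decay_prot * prot  # translation and decay
        n += dn * dt
        p += dp * dt
        rna += drna * dt
        prot += dprot * dt
    return traj
```

At long times the cascade settles where every derivative is zero, for example the promoter at k_on·n/(k_on·n + k_off) ≈ 0.71 with these values, which is a quick check on any implementation.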

References

1. Vierstra J et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).
2. Giniger E & Ptashne M. Cooperative DNA binding of the yeast transcriptional activator GAL4. Proc. Natl. Acad. Sci. U. S. A. 85, 382–386 (1988).
3. Pettersson M & Schaffner W. Synergistic activation of transcription by multiple binding sites for NF-kappa B even in absence of co-operative factor binding to DNA. J. Mol. Biol. 214, 373–380 (1990).
4. Spitz F & Furlong EEM. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).
5. Thanos D & Maniatis T. Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100 (1995).