Nucleic Acids Res. 2023 Nov 27;51(21):e106.
doi: 10.1093/nar/gkad843.

Model-based characterization of the equilibrium dynamics of transcription initiation and promoter-proximal pausing in human cells

Yixin Zhao et al. Nucleic Acids Res.

Abstract

In metazoans, both transcription initiation and the escape of RNA polymerase (RNAP) from promoter-proximal pausing are key rate-limiting steps in gene expression. These processes play out at physically proximal sites on the DNA template and appear to influence one another through steric interactions. Here, we examine the dynamics of these processes using a combination of statistical modeling, simulation, and analysis of real nascent RNA sequencing data. We develop a simple probabilistic model that jointly describes the kinetics of transcription initiation, pause-escape, and elongation, and the generation of nascent RNA sequencing read counts under steady-state conditions. We then extend this initial model to allow for variability across cells in promoter-proximal pause site locations and steric hindrance of transcription initiation from paused RNAPs. In an extensive series of simulations, we show that this model enables accurate estimation of initiation and pause-escape rates. Furthermore, we show by simulation and analysis of real data that pause-escape is often strongly rate-limiting and that steric hindrance can dramatically reduce initiation rates. Our modeling framework is applicable to a variety of inference problems, and our software for estimation and simulation is freely available.

Figures

Graphical Abstract
Figure 1.
(A) Conceptual illustration of model, focusing on the kinetic model for RNAP movement on the DNA template. Gray arrow indicates that a second layer of the model describes generation of nascent RNA sequencing (NRS) read counts based on the distribution of RNAP positions across cells. (B) Graphical model representation with unobserved continuous-time Markov chain (Zi) and observed read counts (Xi). Read counts at each site Xi are conditionally independent and Poisson-distributed given mean μi, which reflects both the density P(Zi) and the sequencing depth λ. (C) Design of SimPol (‘Simulator of Polymerases’). Based on user-defined initiation, pause-escape, and elongation rates, SimPol tracks the movement in silico of RNAPs across N-bp DNA templates in C cells, then samples synthetic read counts based on RNAP positions. SimPol identifies collisions and prohibits RNAPs from passing one another. It also models variable pause sites and elongation rates. (D) Example of synthetic nascent RNA sequencing data from SimPol, shown in IGV (74) alongside matched real PRO-seq data from (19) for the DNAJA1 gene on chromosome 9 of the human genome.
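The read-count layer in panel B can be illustrated with a minimal sketch. It assumes that the Poisson mean at each site is the product μi = λ·P(Zi); the caption says only that μi reflects both quantities, so the product form, the function names, and the toy density values are assumptions made for this example. The sampler is Knuth's multiplication method from the standard library, not SimPol's implementation.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Suitable for the small means used here; exp(-mean) underflows
    for very large means.
    """
    if mean < 0:
        raise ValueError("mean must be non-negative")
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_read_counts(density, depth, seed=0):
    """Sample X_i ~ Poisson(depth * density[i]) independently per site.

    density: hypothetical per-site RNAP densities P(Z_i)
    depth:   sequencing depth (lambda in the caption)
    """
    rng = random.Random(seed)
    return [sample_poisson(depth * p, rng) for p in density]


if __name__ == "__main__":
    # Toy density with a pause peak near the start (illustrative values).
    toy_density = [0.0, 0.5, 0.1, 0.02, 0.02]
    print(simulate_read_counts(toy_density, depth=10.0, seed=1))
```

Because the counts are conditionally independent given the means, each site can be sampled separately, which is what the list comprehension does.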
Figure 2.
(A) Two-state continuous-time Markov model for steric hindrance of transcriptional initiation, assuming at most one RNAP at a time in the pause region. The pause region must be either unoccupied (state 0) or already occupied by another RNAP (state 1). Transitions from state 0 to state 1 occur at the (unimpeded) initiation rate, αζ, and transitions from state 1 to state 0 occur at the pause-escape rate, βζ. The stationary frequency of state 1 defines the landing-pad occupancy ϕ and is given by ϕ = αζ/(αζ + βζ) = α/(α + β). (B) Illustration showing a hypothetical distribution of pause sites k and its implications for the number of RNAPs that can simultaneously occupy the pause region. When k ≤ sp, where sp is the minimum center-to-center spacing between adjacent RNAPs, only one RNAP is possible (Case 1 in the text); when sp < k ≤ 2sp, up to two are possible (Case 2); and when 2sp < k ≤ 3sp, up to three are possible (Case 3). Notice that the portion of the density corresponding to each Case r is given by qr. (C) Generalization of Markov model to accommodate up to two RNAPs in the pause region (Case 2). (D) Further generalization to accommodate up to three RNAPs (Case 3). The equation for ϕ can be generalized to account for these cases (see text).
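For the two-state chain in panel A, balancing the flow between states gives a stationary frequency for state 1 equal to the 0→1 rate divided by the sum of both rates. The sketch below computes that value and checks it against a simple event-driven simulation of the chain. It also maps a pause site k to the Case number from panel B; treating that mapping as ceil(k/sp) beyond the three cases shown is an assumption, and all function names are hypothetical.

```python
import random


def landing_pad_occupancy(init_rate, escape_rate):
    """Stationary frequency of state 1 (pause region occupied).

    init_rate:   rate of 0 -> 1 transitions (alpha * zeta)
    escape_rate: rate of 1 -> 0 transitions (beta * zeta)
    """
    return init_rate / (init_rate + escape_rate)


def max_rnaps_in_pause_region(k, spacing):
    """Case number for pause site k: 1 if k <= sp, 2 if sp < k <= 2sp, ...

    Uses integer ceiling division; k and spacing are positive integers (nt).
    """
    return -(-k // spacing)


def simulate_occupancy(init_rate, escape_rate, total_time, seed=0):
    """Fraction of time spent in state 1 in a simulated two-state chain."""
    rng = random.Random(seed)
    state, clock, occupied_time = 0, 0.0, 0.0
    while clock < total_time:
        rate = init_rate if state == 0 else escape_rate
        # Exponential dwell time, truncated at the end of the run.
        dwell = min(rng.expovariate(rate), total_time - clock)
        if state == 1:
            occupied_time += dwell
        clock += dwell
        state = 1 - state
    return occupied_time / total_time


if __name__ == "__main__":
    print(landing_pad_occupancy(1.0, 1.0))
    print(simulate_occupancy(1.0, 1.0, total_time=20000.0, seed=1))
    print([max_rnaps_in_pause_region(k, 50) for k in (50, 51, 100, 101)])
```

The closed form applies only to Case 1; the generalized expressions for Cases 2 and 3 are given in the paper's text and are not reproduced here.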
Figure 3.
Accuracy of estimated values of the transcription initiation rate α and pause-escape rate β under the initial version of the model. Estimates are expressed as products with the elongation rate ζ (αζ and βζ). (A) Simulated true vs. estimated values of αζ, for αζ ∈ {0.1, 1, 10} (left to right) and βζ ∈ {0.1, 1, 10} (see key). Dashed lines indicate the ground truth. (B–D) Estimated values of βζ for simulated true values of βζ = 1 and αζ ∈ {0.1, 1, 10} (see key), when the pause site k is fixed (B) or variable across cells (C, D) in simulation. In panel C, β is estimated using the average read depth in the pause peak, and in panel D it is estimated using the sum of read counts across the region. Dashed lines indicate the ground truth. Results for other values of βζ are shown in Supplementary Figure S1. All boxplots summarize 50 replicates of the simulation; box boundaries indicate 1st and 3rd quartiles, and horizontal line indicates median. A value of ζ = 2 kb/min is assumed; αζ and βζ can be assumed to have units of events per minute. Pause sites occur at a mean position of k = 50 nt. In the variable case, we assume a Gaussian distribution with a standard deviation of 25 nt.
Figure 4.
(A) Accuracy of estimated values of the pause-escape rate β under the version of the model that allows for a distribution of pause-sites k across cells. Shown are estimated values of βζ for simulated true values of βζ = 1 and αζ ∈ {0.1, 1, 10} (see key), when the pause-site k is fixed (left) or variable across cells (right) in simulation. Dashed lines indicate the ground truth. Results for other values of βζ are shown in Supplementary Figure S2. All boxplots summarize 50 replicates of the simulation; box boundaries indicate 1st and 3rd quartiles, and horizontal line indicates median. Simulated pause sites occurred at a mean position of k = 50 nt. In the variable case, we assumed a Gaussian distribution with a standard deviation of 25 nt. (B) Examples of pause peaks in simulated data, showing assumed distribution of pause sites (blue dashed line) and distribution inferred by expectation maximization (red solid line). (C) Similar examples from real data from (20). (D) Estimates of βζ under the original averaging approach (horizontal axis) vs. estimates of βζ under the model that allows for variable k across cells (vertical axis). (E) Contour plot showing the distribution of estimated means (horizontal axis) and standard deviations (vertical axis) of the pause peak position k, under the ‘no heat shock’ (NHS) and ‘heat shock’ (HS) conditions. Data from (20). In panels A and D, a value of ζ = 2 kb/min is assumed; thus, αζ and βζ can be assumed to have units of events per minute.
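The simulations in this caption draw pause sites from a Gaussian with mean 50 nt and standard deviation 25 nt. As a rough illustration of how such a distribution divides into the Case 1 to 3 fractions qr of Figure 2B, the sketch below discretizes the Gaussian onto integer positions 1 to 150 and bins them with a spacing of sp = 50 nt. The truncation range, the discretization, and the function names are assumptions made for this example; this is not the authors' estimation procedure, which infers the distribution by expectation maximization.

```python
import math


def discretized_gaussian(mean, sd, max_k):
    """Probability mass over integer pause sites k = 1..max_k.

    Gaussian weights are evaluated at each integer and renormalized,
    which truncates the tails outside 1..max_k (an assumption).
    Element 0 of the returned list corresponds to k = 1.
    """
    weights = [math.exp(-0.5 * ((k - mean) / sd) ** 2)
               for k in range(1, max_k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def case_fractions(pmf, spacing, n_cases=3):
    """Sum the mass falling in each Case r: (r-1)*sp < k <= r*sp."""
    fractions = [0.0] * n_cases
    for index, mass in enumerate(pmf):
        k = index + 1
        case = -(-k // spacing)  # ceiling division
        if case <= n_cases:
            fractions[case - 1] += mass
    return fractions


if __name__ == "__main__":
    pmf = discretized_gaussian(mean=50.0, sd=25.0, max_k=150)
    q1, q2, q3 = case_fractions(pmf, spacing=50)
    print(q1, q2, q3)
```

With these settings most of the mass falls in Cases 1 and 2, because positions beyond 100 nt lie more than two standard deviations above the mean.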
Figure 5.
(A) Accuracy of estimated landing-pad occupancy ϕ under the version of the model that allows for steric hindrance in initiation and multiple RNAPs per pause region. Scatter plots show the fraction of simulated cells for which the first 50 nt (the ‘landing-pad’) are occupied by an RNAP at steady state (‘Empirical ϕ’) vs. the fraction predicted to be occupied under the model (‘Estimated ϕ’) based on the simulated NRS data, assuming a minimum spacing of sp = 50 nt. Results are shown for simulated true values of αζ ∈ {0.1, 1, 10} (left to right) and βζ ∈ {0.1, 1, 10} (see key), with 50 simulations per parameter combination. Dashed line indicates y = x, and colored crosses represent the means of the corresponding points. A value of ζ = 2 kb/min is assumed, so that αζ and βζ are in events per minute. (B) Distribution of estimated ϕ for 6182 robustly expressed genes in K562 cells before (NHS) and after (HS) heat shock under the low (L) calibration (20) (see Materials & Methods for details). (C) Percentages of genes having fully occupied landing-pads (ϕ > 0.95) before (NHS) and after (HS) heat shock, under the low (L) and high (H) calibrations. (D, E) Distributions of scaled estimates of the ‘effective’ (ωζ) and ‘potential’ (αζ) rates of transcription initiation, in events per minute per cell, for the same genes. Panel D represents the NHS case and panel E represents the HS case. The x-axes are truncated to highlight the bulk of the distributions. Gray arrows indicate effects of steric hindrance.
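Two summary quantities in this caption are simple enough to sketch: the 'Empirical ϕ' of panel A (the fraction of simulated cells with an RNAP in the first 50 nt) and the percentage of genes with ϕ > 0.95 in panel C. The input layout (one list of RNAP positions per cell, 1-based coordinates) and the function names are assumptions made for this example and do not reflect SimPol's actual output format.

```python
def empirical_occupancy(rnap_positions_per_cell, landing_pad_length=50):
    """Fraction of cells with at least one RNAP in positions 1..length."""
    if not rnap_positions_per_cell:
        raise ValueError("need at least one cell")
    occupied = sum(
        1
        for positions in rnap_positions_per_cell
        if any(1 <= p <= landing_pad_length for p in positions)
    )
    return occupied / len(rnap_positions_per_cell)


def percent_fully_occupied(phi_estimates, threshold=0.95):
    """Percentage of genes whose estimated phi strictly exceeds threshold."""
    if not phi_estimates:
        raise ValueError("need at least one estimate")
    above = sum(1 for phi in phi_estimates if phi > threshold)
    return 100.0 * above / len(phi_estimates)


if __name__ == "__main__":
    # Four hypothetical cells; two have an RNAP within the first 50 nt.
    cells = [[10, 300], [], [75], [50]]
    print(empirical_occupancy(cells))
    print(percent_fully_occupied([0.99, 0.5, 0.96, 0.95]))
```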

References

1. Ptashne M., Gann A. Transcriptional activation by recruitment. Nature. 1997; 386:569–577.
2. Rougvie A.E., Lis J.T. Postinitiation transcriptional control in Drosophila melanogaster. Mol. Cell Biol. 1990; 10:6041–6045.
3. Strobl L.J., Eick D. Hold back of RNA polymerase II at the transcription start site mediates down-regulation of c-myc in vivo. EMBO J. 1992; 11:3307–3314.
4. Krumm A., Meulia T., Brunvand M., Groudine M. The block to transcriptional elongation within the human c-myc gene is determined in the promoter-proximal region. Genes Dev. 1992; 6:2201–2213.
5. Rasmussen E.B., Lis J.T. In vivo transcriptional pausing and cap formation on three Drosophila heat shock genes. Proc. Natl. Acad. Sci. USA. 1993; 90:7923–7927.