Nucleic Acids Res. 2023 Nov 27;51(21):e106.

doi: 10.1093/nar/gkad843.

Model-based characterization of the equilibrium dynamics of transcription initiation and promoter-proximal pausing in human cells

Yixin Zhao¹, Lingjie Liu¹,², Rebecca Hassett¹, Adam Siepel¹,²

Affiliations

¹ Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
² Graduate Program in Genetics, Stony Brook University, Stony Brook, NY, USA.

PMID: 37889042
PMCID: PMC10681744
DOI: 10.1093/nar/gkad843

Yixin Zhao et al. Nucleic Acids Res. 2023.

Abstract

In metazoans, both transcription initiation and the escape of RNA polymerase (RNAP) from promoter-proximal pausing are key rate-limiting steps in gene expression. These processes play out at physically proximal sites on the DNA template and appear to influence one another through steric interactions. Here, we examine the dynamics of these processes using a combination of statistical modeling, simulation, and analysis of real nascent RNA sequencing data. We develop a simple probabilistic model that jointly describes the kinetics of transcription initiation, pause-escape, and elongation, and the generation of nascent RNA sequencing read counts under steady-state conditions. We then extend this initial model to allow for variability across cells in the locations of promoter-proximal pause sites and for steric hindrance of transcription initiation by paused RNAPs. In an extensive series of simulations, we show that this model enables accurate estimation of initiation and pause-escape rates. Furthermore, we show by simulation and analysis of real data that pause-escape is often strongly rate-limiting and that steric hindrance can dramatically reduce initiation rates. Our modeling framework is applicable to a variety of inference problems, and our software for estimation and simulation is freely available.
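To make the steady-state link between rates and read counts concrete, here is a minimal flux-balance sketch. It assumes no steric hindrance, a single pause site per cell, an elongation rate ζ in sites per minute, and a known depth factor λ that converts expected RNAPs per cell at a site into expected reads; α and β are treated directly as rates in events per minute. Under those assumptions, equilibrium requires α = β·E[paused RNAPs] = ζ·E[RNAPs per gene-body site]. The helper `flux_balance_rates` is hypothetical and is not presented as the authors' estimator.

```python
from statistics import fmean


def flux_balance_rates(pause_reads, body_reads, zeta, lam):
    """Steady-state flux-balance estimates of (alpha, beta).

    Hypothetical helper; assumptions, not the paper's estimator:
      pause_reads: total reads over the pause region (one pause site per cell)
      body_reads:  per-site read counts across the gene body
      zeta:        elongation rate in sites per minute
      lam:         depth factor, expected reads per expected RNAP at a site
    At equilibrium, alpha = beta * E[paused RNAPs] = zeta * E[RNAPs per body site].
    """
    body_density = fmean(body_reads) / lam  # expected RNAPs per gene-body site
    pause_density = pause_reads / lam       # expected paused RNAPs
    alpha = zeta * body_density             # flux of RNAPs through the gene body
    beta = alpha / pause_density            # the same flux must leave the pause site
    return alpha, beta


# Usage: build noiseless expected counts from known rates, then recover them.
alpha_true, beta_true, zeta, lam = 1.0, 0.5, 2000.0, 1e4
pause_reads = lam * alpha_true / beta_true     # depth times expected paused RNAPs
body_reads = [lam * alpha_true / zeta] * 1000  # expected reads at each body site
alpha_hat, beta_hat = flux_balance_rates(pause_reads, body_reads, zeta, lam)
print(alpha_hat, beta_hat)
```

Because λ cancels in the ratio, β reduces to ζ times (mean gene-body reads per site) / (pause-region reads), so it can be estimated without knowing the depth; α, by contrast, needs λ.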


Figures

**Figure 1.**
(A) Conceptual illustration of model, focusing on the kinetic model for RNAP movement on the DNA template. Gray arrow indicates that a second layer of the model describes generation of nascent RNA sequencing (NRS) read counts based on the distribution of RNAP positions across cells. (B) Graphical model representation with unobserved continuous-time Markov chain (Z_i) and observed read counts (X_i). Read counts at each site X_i are conditionally independent and Poisson-distributed given mean μ_i, which reflects both the density P(Z_i) and the sequencing depth λ. (C) Design of SimPol (‘Simulator of Polymerases’). Based on user-defined initiation, pause-escape, and elongation rates, SimPol tracks the movement *in silico* of RNAPs across N-bp DNA templates in C cells, then samples synthetic read counts based on RNAP positions. SimPol identifies collisions and prohibits RNAPs from passing one another. It also models variable pause sites and elongation rates. (D) Example of synthetic nascent RNA sequencing data from SimPol, shown in IGV (74) alongside matched real PRO-seq data from (19) for the *DNAJA1* gene on chromosome 9 of the human genome.
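The observation layer in panel B can be sketched as follows, assuming the per-site RNAP densities P(Z_i) are already available as a list. Each count is treated as X_i ~ Poisson(λ·P(Z_i)), independent across sites given the densities, and for a fixed density profile the depth λ has the closed-form maximum-likelihood estimate ΣX_i / ΣP(Z_i). The function names and the example density are hypothetical illustrations, not SimPol's interface.

```python
import math

import numpy as np


def poisson_loglik(counts, density, lam):
    """Log-likelihood of counts X_i ~ Poisson(lam * P(Z_i)), independent across sites."""
    total = 0.0
    for x, p in zip(counts, density):
        mu = lam * p
        if mu == 0.0:
            if x != 0:
                return float("-inf")  # reads observed where the model expects none
            continue
        total += x * math.log(mu) - mu - math.lgamma(x + 1)
    return total


def mle_depth(counts, density):
    """Closed-form maximum-likelihood estimate of the depth lam: sum(X) / sum(P)."""
    return float(sum(counts)) / float(sum(density))


def sample_counts(density, lam, seed=0):
    """Draw synthetic read counts from the Poisson observation layer."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam * np.asarray(density, dtype=float))


# Usage with a hypothetical density: one strong pause site, then a flat gene body.
density = [0.5] + [0.01] * 100
lam = 1e5
counts = sample_counts(density, lam, seed=1)
lam_hat = mle_depth(counts, density)
print(lam_hat, poisson_loglik(counts, density, lam_hat))
```

The closed form follows from setting the derivative of Σ(X_i log(λp_i) − λp_i) with respect to λ to zero; since that sum is concave in λ, this stationary point is the maximum.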

**Figure 2.**
(A) Two-state continuous-time Markov model for steric hindrance of transcriptional initiation, assuming at most one RNAP at a time in the pause region. The pause region is either unoccupied (state 0) or already occupied by another RNAP (state 1). Transitions from state 0 to state 1 occur at the (unimpeded) initiation rate, αζ, and transitions from state 1 to state 0 occur at the pause-escape rate, βζ. The stationary frequency of state 1 defines the landing-pad occupancy ϕ and is given by ϕ = αζ/(αζ + βζ) = α/(α + β). (B) Illustration showing a hypothetical distribution of pause sites k and its implications for the number of RNAPs that can simultaneously occupy the pause region. When k ≤ s_p, where s_p is the minimum center-to-center spacing between adjacent RNAPs, only one RNAP is possible (Case 1 in the text); when s_p < k ≤ 2s_p, up to two are possible (Case 2); and when 2s_p < k ≤ 3s_p, up to three are possible (Case 3). The portion of the density corresponding to each Case r is given by q_r. (C) Generalization of the Markov model to accommodate up to two RNAPs in the pause region (Case 2). (D) Further generalization to accommodate up to three RNAPs (Case 3). The equation for ϕ can be generalized to account for these cases (see text).
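A small numerical check of panel A: the sketch below solves πQ = 0 with Σπ = 1 for any finite, irreducible generator Q and reads off ϕ as the stationary probability of state 1 in the two-state chain. The transition structures of the Case 2 and Case 3 chains are defined in the paper's text and are not reproduced here; the generic `stationary` solver would apply to them once their generators are written down. Both function names are hypothetical.

```python
import numpy as np


def stationary(Q):
    """Stationary distribution pi of a finite, irreducible CTMC: pi Q = 0, sum(pi) = 1."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0  # swap one redundant balance equation for the normalization constraint
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def landing_pad_occupancy(alpha, beta):
    """phi for the two-state chain of panel A: 0 -> 1 at rate alpha, 1 -> 0 at rate beta."""
    Q = [[-alpha, alpha],
         [beta, -beta]]
    return stationary(Q)[1]


# Scaling both rates by the same zeta leaves phi unchanged: alpha / (alpha + beta).
print(landing_pad_occupancy(1.0, 0.1))       # pause escape 10x slower than initiation
print(landing_pad_occupancy(2000.0, 200.0))  # same ratio, rates rescaled
```

Because ϕ depends only on the ratio of the two rates, the common factor ζ cancels, which is why the caption's αζ/(αζ + βζ) reduces to α/(α + β).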


**Figure 3.**
Accuracy of estimated values of the transcription initiation rate α and pause-escape rate β under the initial version of the model. Estimates are expressed as products with the elongation rate ζ (αζ and βζ). (A) Simulated true vs. estimated values of αζ, for αζ ∈ {0.1, 1, 10} (left to right) and βζ ∈ {0.1, 1, 10} (see key). Dashed lines indicate the ground truth. (B–D) Estimated values of βζ for simulated true values of βζ = 1 and αζ ∈ {0.1, 1, 10} (see key), when the pause site k is fixed (B) or variable across cells (C, D) in simulation. In panel C, β is estimated using the average read depth in the pause peak, and in panel D it is estimated using the sum of read counts across the pause region. Dashed lines indicate the ground truth. Results for other values of βζ are shown in Supplementary Figure S1. All boxplots summarize 50 replicates of the simulation; box boundaries indicate the 1st and 3rd quartiles, and the horizontal line indicates the median. A value of ζ = 2 kb/min is assumed, so αζ and βζ can be taken to have units of events per minute. Pause sites occur at a mean position of k = 50 nt; in the variable case, we assume a Gaussian distribution with a standard deviation of 25 nt.
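Why the choice between panels C and D matters can be seen in a noiseless toy calculation. Assuming, as in the caption, that the pause position k is Gaussian with mean 50 nt and SD 25 nt, and additionally assuming a hypothetical 200-site pause region and flux balance (α/β expected paused RNAPs per cell in total), the paused RNAPs are spread over many sites. Dividing α by the single highest site (a crude stand-in for the peak read depth, not the paper's exact averaging) then inflates β, while pooling across the region recovers it. All names here are illustrative.

```python
import math


def pause_profile(alpha, beta, mean=50.0, sd=25.0, region=200):
    """Expected paused RNAPs per site when the pause position k varies across cells.

    Assumes k ~ Normal(mean, sd), discretized to sites 1..region and renormalized,
    with alpha / beta expected paused RNAPs per cell in total (flux balance).
    """
    w = [math.exp(-0.5 * ((j - mean) / sd) ** 2) for j in range(1, region + 1)]
    z = sum(w)
    return [(alpha / beta) * wj / z for wj in w]


def beta_from_peak(alpha, profile):
    """Treats the single highest site as if it held every paused RNAP."""
    return alpha / max(profile)


def beta_from_sum(alpha, profile):
    """Pools paused RNAPs across the whole pause region before dividing."""
    return alpha / sum(profile)


profile = pause_profile(alpha=1.0, beta=1.0)
print(beta_from_sum(1.0, profile))   # recovers beta = 1 up to rounding
print(beta_from_peak(1.0, profile))  # inflated roughly by the effective peak width
```

With a fixed pause site the two estimators coincide, which is consistent with the averaging approach working well in panel B but not in panel C.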

**Figure 4.**
(A) Accuracy of estimated values of the pause-escape rate β under the version of the model that allows for a distribution of pause sites k across cells. Shown are estimated values of βζ for simulated true values of βζ = 1 and αζ ∈ {0.1, 1, 10} (see key), when the pause site k is fixed (left) or variable across cells (right) in simulation. Dashed lines indicate the ground truth. Results for other values of βζ are shown in Supplementary Figure S2. All boxplots summarize 50 replicates of the simulation; box boundaries indicate the 1st and 3rd quartiles, and the horizontal line indicates the median. Simulated pause sites occurred at a mean position of k = 50 nt; in the variable case, we assumed a Gaussian distribution with a standard deviation of 25 nt. (B) Examples of pause peaks in simulated data, showing the assumed distribution of pause sites (blue dashed line) and the distribution inferred by expectation maximization (red solid line). (C) Similar examples from real data from (20). (D) Estimates of βζ under the original averaging approach (horizontal axis) vs. estimates of βζ under the model that allows for variable k across cells (vertical axis). (E) Contour plot showing the distribution of estimated means (horizontal axis) and standard deviations (vertical axis) of the pause peak position k, under the ‘no heat shock’ (NHS) and ‘heat shock’ (HS) conditions. Data from (20). In panels A and D, a value of ζ = 2 kb/min is assumed; thus, αζ and βζ can be taken to have units of events per minute.
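Panel E summarizes each gene's pause peak by a mean and standard deviation of k. As a simpler stand-in for the expectation-maximization procedure the caption describes (this is a swapped-in technique, not the paper's method), the sketch below computes count-weighted moments of read positions within a pause region. The function name and the example counts are hypothetical.

```python
import math


def pause_peak_moments(positions, counts):
    """Count-weighted mean and standard deviation of pause-peak position.

    A plain moment summary of reads in the pause region; it does not separate
    pause reads from background as a model-based EM fit would.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("pause region has no reads")
    mean = sum(p * c for p, c in zip(positions, counts)) / total
    var = sum(c * (p - mean) ** 2 for p, c in zip(positions, counts)) / total
    return mean, math.sqrt(var)


# Hypothetical reads at three positions (nt downstream of the TSS).
print(pause_peak_moments([40, 50, 60], [1, 2, 1]))
```

Moments like these are sensitive to reads from elongating RNAPs that fall inside the window, which is one reason a model-based EM fit can be preferable.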

**Figure 5.**
(A) Accuracy of estimated landing-pad occupancy ϕ under the version of the model that allows for steric hindrance in initiation and multiple RNAPs per pause region. Scatter plots show the fraction of simulated cells for which the first 50 nt (the ‘landing-pad’) are occupied by an RNAP at steady state (‘Empirical ϕ’) vs. the fraction predicted to be occupied under the model (‘Estimated ϕ’) based on the simulated NRS data, assuming a minimum spacing of s_p = 50 nt. Results are shown for simulated true values of αζ ∈ {0.1, 1, 10} (left to right) and βζ ∈ {0.1, 1, 10} (see key), with 50 simulations per parameter combination. Dashed line indicates y = x, and colored crosses represent the means of the corresponding points. A value of ζ = 2 kb/min is assumed, so that αζ and βζ are in events per minute. (B) Distribution of estimated ϕ for 6182 robustly expressed genes in K562 cells before (NHS) and after (HS) heat shock under the low (L) calibration (20) (see Materials & Methods for details). (C) Percentages of genes having fully occupied landing-pads (ϕ > 0.95) before (NHS) and after (HS) heat shock, under the low (L) and high (H) calibrations. (D, E) Distributions of scaled estimates of the ‘effective’ (ωζ) and ‘potential’ (αζ) rates of transcription initiation, in events per minute per cell, for the same genes. Panel D represents the NHS case and panel E represents the HS case. The x-axes are truncated to highlight the bulk of the distributions. Gray arrows indicate effects of steric hindrance.
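Panels D and E contrast an effective initiation rate ω with a potential rate α. Under the Case 1 two-state chain of Figure 2A (an assumption here, since the full model also covers Cases 2 and 3), the steady-state flux satisfies ω = β·ϕ = α·(1 − ϕ): RNAPs leave the occupied state at rate β and enter the empty state at rate α. The hypothetical helpers below invert those relations and compute the panel C statistic (percentage of genes with ϕ > 0.95).

```python
def hindrance_adjusted_rates(omega, beta):
    """Recover phi and the potential rate alpha under the two-state (Case 1) chain.

    omega: effective initiation rate, i.e. the steady-state flux
    beta:  pause-escape rate
    Flux balance: omega = beta * phi = alpha * (1 - phi).
    """
    phi = omega / beta
    if not 0.0 <= phi < 1.0:
        raise ValueError("two-state chain requires 0 <= omega < beta")
    alpha = omega / (1.0 - phi)
    return phi, alpha


def percent_fully_occupied(phis, threshold=0.95):
    """Percentage of genes whose landing pad is nearly always occupied (phi > threshold)."""
    return 100.0 * sum(p > threshold for p in phis) / len(phis)


print(hindrance_adjusted_rates(0.5, 1.0))               # half-occupied pad
print(percent_fully_occupied([0.2, 0.96, 0.99, 0.5]))  # hypothetical phi values
```

As ϕ approaches 1, α = ω/(1 − ϕ) grows without bound, so under this chain steric hindrance can hide a potential initiation rate far larger than the observed one.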


Cited by

Evolution of promoter-proximal pausing enabled a new layer of transcription control.
Chivu AG, Abuhashem A, Barshad G, Rice EJ, Leger MM, Vill AC, Wong W, Brady R, Smith JJ, Wikramanayake AH, Arenas-Mena C, Brito IL, Ruiz-Trillo I, Hadjantonakis AK, Lis JT, Lewis JJ, Danko CG. Res Sq [Preprint]. 2023 Mar 24:rs.3.rs-2679520. doi: 10.21203/rs.3.rs-2679520/v1. PMID: 36993251.
Genome-wide dynamic nascent transcript profiles reveal that most paused RNA polymerases terminate.
Mukherjee R, Guertin MJ. bioRxiv [Preprint]. 2025 Mar 28:2025.03.27.645809. doi: 10.1101/2025.03.27.645809. PMID: 40196675.
Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data.
Liu L, Zhao Y, Hassett R, Toneyan S, Koo PK, Siepel A. Nucleic Acids Res. 2025 Feb 8;53(4):gkaf092. doi: 10.1093/nar/gkaf092. PMID: 39964478.
Evolution of promoter-proximal pausing enabled a new layer of transcription control.
Chivu AG, Basso BA, Abuhashem A, Leger MM, Barshad G, Rice EJ, Vill AC, Wong W, Chou SP, Chovatiya G, Brady R, Smith JJ, Wikramanayake AH, Arenas-Mena C, Brito IL, Ruiz-Trillo I, Hadjantonakis AK, Lis JT, Lewis JJ, Danko CG. bioRxiv [Preprint]. 2024 Oct 12:2023.02.19.529146. doi: 10.1101/2023.02.19.529146. PMID: 39416036.
DNA-sequence and epigenomic determinants of local rates of transcription elongation.
Liu L, Zhao Y, Siepel A. bioRxiv [Preprint]. 2023 Dec 23:2023.12.21.572932. doi: 10.1101/2023.12.21.572932. PMID: 38187771.


References

1. Ptashne M., Gann A. Transcriptional activation by recruitment. Nature. 1997; 386:569–577.
2. Rougvie A.E., Lis J.T. Postinitiation transcriptional control in Drosophila melanogaster. Mol. Cell Biol. 1990; 10:6041–6045.
3. Strobl L.J., Eick D. Hold back of RNA polymerase II at the transcription start site mediates down-regulation of c-myc in vivo. EMBO J. 1992; 11:3307–3314.
4. Krumm A., Meulia T., Brunvand M., Groudine M. The block to transcriptional elongation within the human c-myc gene is determined in the promoter-proximal region. Genes Dev. 1992; 6:2201–2213.
5. Rasmussen E.B., Lis J.T. In vivo transcriptional pausing and cap formation on three Drosophila heat shock genes. Proc. Natl. Acad. Sci. USA. 1993; 90:7923–7927.
