Bioinformatics. 2019 Jun 1;35(11):1877-1884.

doi: 10.1093/bioinformatics/bty886.

Statistical inference of the rate of RNA polymerase II elongation by total RNA sequencing

Yumi Kawamura¹, Shinsuke Koyama¹ ², Ryo Yoshida¹ ³

Affiliations

¹ Department of Statistical Science, The Graduate University for Advanced Studies (SOKENDAI), Tachikawa, Japan.
² Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Japan.
³ Department of Statistical Data Science, The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Japan.

PMID: 30376061
PMCID: PMC6546130
DOI: 10.1093/bioinformatics/bty886

Yumi Kawamura et al. Bioinformatics. 2019.

Abstract

Motivation: Sequencing total RNA without poly-A selection enables us to obtain a transcriptomic profile of nascent RNAs undergoing transcription with co-transcriptional splicing. In general, the RNA-seq reads exhibit a sawtooth pattern in a gene, which is characterized by a monotonically decreasing gradient across introns in the 5'-3' direction, and by substantially higher levels of RNA-seq reads present in exonic regions. Such patterns result from the process of underlying transcription elongation by RNA polymerase II, which traverses the DNA strand in a 5'-3' direction as it performs a complex series of mRNA synthesis and processing. Therefore, data of sequenced total RNAs could be utilized to infer the rate of transcription elongation by solving the inverse problem.
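The decreasing intronic gradient described above can be illustrated with a toy forward model. This is a minimal sketch under simplifying assumptions that are not taken from the paper: Pol II initiates at a constant rate, the system is at steady state (so local Pol II density is proportional to the inverse of the local elongation rate), and an intron is removed as soon as Pol II reaches its 3' end. The function name `expected_intron_coverage` and its parameters are hypothetical.

```python
def expected_intron_coverage(rates, initiation_rate=1.0):
    """Toy expected nascent-RNA coverage at each base of a single intron.

    rates[i] is an assumed elongation rate (bases per unit time) at base i,
    listed in the 5'-to-3' direction.
    """
    # Steady-state assumption: Pol II density at a base is flux / local speed.
    density = [initiation_rate / r for r in rates]
    coverage = []
    running = 0.0
    # A base is covered by every transcript whose Pol II sits at or downstream
    # of it, so accumulate the density starting from the 3' end of the intron.
    for d in reversed(density):
        running += d
        coverage.append(running)
    coverage.reverse()
    return coverage


# Constant elongation rate gives a linear 5'-to-3' decline (one sawtooth "tooth").
profile = expected_intron_coverage([2.0] * 5)
print(profile)
```

Under these assumptions a slower region contributes a larger per-base drop in coverage, which is why the shape of the gradient carries information about the elongation rate.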

Results: Though solving the inverse problem in total RNA-seq has great potential, statistical methods for doing so have not yet been fully developed. We demonstrate to what extent the newly developed method can be useful. The objective is to reconstruct the spatial distribution of transcription elongation rates in a gene from a given noisy, sawtooth-like profile. It is necessary to recover the signal source of the elongation rates separately from several types of nuisance factors, such as unobserved modes of co-transcriptionally occurring mRNA splicing, which exert significant influences on the sawtooth shape. The present method was tested using published total RNA-seq data derived from mouse embryonic stem cells. We investigated the spatial characteristics of the estimated elongation rates, focusing especially on the relation to promoter-proximal pausing of RNA polymerase II, nucleosome occupancy and histone modification patterns.
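To see why a noisy profile calls for a statistical treatment, consider a deliberately naive inversion. This sketch is not the method of the paper; it assumes a toy steady-state model in which the per-base drop in intronic coverage equals the local Pol II density, and the elongation rate is the initiation rate divided by that density. The function name `naive_rates_from_coverage` is hypothetical.

```python
import math


def naive_rates_from_coverage(coverage, initiation_rate=1.0):
    """Naively invert a single-intron coverage profile into elongation rates.

    coverage is listed 5'-to-3'; coverage just past the 3' end is taken as 0.
    """
    rates = []
    for i, c in enumerate(coverage):
        nxt = coverage[i + 1] if i + 1 < len(coverage) else 0.0
        # Drop between neighbouring bases = assumed Pol II density at base i.
        density = c - nxt
        # A non-positive drop has no valid rate under this toy model.
        rates.append(initiation_rate / density if density > 0 else math.nan)
    return rates


clean = naive_rates_from_coverage([2.5, 2.0, 1.5, 1.0, 0.5])
noisy = naive_rates_from_coverage([2.5, 2.6, 1.5])
print(clean)
print(noisy)
```

On the noiseless profile the constant rate is recovered, but a single upward fluctuation in the noisy profile already yields an undefined estimate. Differencing amplifies noise, which is one reason the problem is treated as statistical inference rather than direct inversion.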

Availability and implementation: A C implementation of PolSter and sample data are available at https://github.com/yoshida-lab/PolSter.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Inverse problem of the transcription elongation rate. (A) Total RNA-seq captures a mixture of matured and nascent transcripts in a pool of cells. During the displacement of Pol II from 5’ to 3’, elongating and co-transcriptionally spliced RNAs can take various states as shown in the middle. The sawtooth pattern of sequenced RNA-seq reads shown in the bottom results from the expected frequency of nucleotides included in those transcripts at various stages. This figure was created by referring to Figure 2 of Ameur *et al.* (2011). (B) Total RNA-seq reads of a gene (GRM7) in human fetal brain (Ameur *et al.*, 2011). Splice variants reported in hg19, GRCh37 (Genome Reference Consortium Human Reference 37) are shown in the upper side

**Fig. 2.**
(A) Four splicing modes to be modeled in the system with illustrative examples: (i) conventional mode, (ii) intron retention, (iii) RS of introns and (iv) exon skipping. (B) Infeasible and feasible modes of exon skipping are exemplified in (i) and (ii), respectively

**Fig. 3.**
Read density of the OPCML gene in human fetal brain (Ameur *et al.*, 2011). The observed valley in the intron implies the occurrence of RS

**Fig. 4.**
Estimated Pol II density, expected read density and splicing patterns are shown on the DNA coordinates of the Cdk19 gene in the 5’–3’ direction. The observed read counts are shown in the top panel

**Fig. 5.**
The estimated elongation rates of the 653 genes are arranged on the vertical axis. The horizontal axis denotes the relative position from TSS. The color scale chart shown on the side denotes the estimated values normalized to $[0, 1]$

**Fig. 6.**
Correlation coefficients between the estimated Pol II densities and (A) ChIP-seq profiles of histone modifiers, (B) nucleosome occupancies observed by MNase-seq from mouse ES cells. (C) Differences between the averages of the estimated Pol II densities in regions with and without a chromatin state annotation. The 15 annotations shown in the right panel were obtained by performing ChromHMM on the ChIP-seq profiles of the histone modifiers. (D) Correlation coefficients between the estimated Pol II densities and two ChIP-seq profiles of Pol II. The color scale charts shown on the sides denote the given values in which the mean differences shown in (C) are scaled to $[-1, 1]$

Cited by

RNA polymerase II speed: a key player in controlling and adapting transcriptome composition.
Muniz L, Nicolas E, Trouche D. EMBO J. 2021 Aug 2;40(15):e105740. doi: 10.15252/embj.2020105740. Epub 2021 Jul 13. PMID: 34254686. Free PMC article. Review.

Global impact of aberrant splicing on human gene expression levels.
Fair B, Najar CBA, Zhao J, Lozano S, Reilly A, Mossian G, Staley JP, Wang J, Li YI. bioRxiv [Preprint]. 2023 Oct 16:2023.09.13.557588. doi: 10.1101/2023.09.13.557588. Update in: Nat Genet. 2024 Sep;56(9):1851-1861. doi: 10.1038/s41588-024-01872-x. PMID: 37745605. Free PMC article. Preprint.

Geometrically encoded positioning of introns, intergenic segments, and exons in the human genome.
Almassalha LM, MacQuarrie KL, Carignano M, Dunton C, Gong R, Ibarra J, Carter LM, Li WS, Nap R, Dulai PS, Szleifer I, Backman V. bioRxiv [Preprint]. 2025 May 29:2025.05.29.656862. doi: 10.1101/2025.05.29.656862. PMID: 40501616. Free PMC article. Preprint.

RNA Polymerase II Activity Control of Gene Expression and Involvement in Disease.
Kuldell JC, Kaplan CD. J Mol Biol. 2025 Jan 1;437(1):168770. doi: 10.1016/j.jmb.2024.168770. Epub 2024 Aug 28. PMID: 39214283. Free PMC article. Review.

References

1. Ameur A., et al. (2011) Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat. Struct. Mol. Biol., 18, 1435–1440.
2. Bentley D.L. (2014) Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet., 15, 163–175.
3. Bolić M., et al. (2004) Resampling algorithms for particle filters: a computational complexity perspective. EURASIP J. Appl. Signal Process., 15, 2267–2277.
4. Brown S.J., et al. (2012) Chromatin and epigenetic regulation of pre-mRNA processing. Hum. Mol. Genet., 21, R90–R96.
5. Chae M., et al. (2015) groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics, 16, 222.
