Bioinformatics. 2019 Jun 1;35(11):1877-1884.
doi: 10.1093/bioinformatics/bty886.

Statistical inference of the rate of RNA polymerase II elongation by total RNA sequencing

Yumi Kawamura et al. Bioinformatics.

Abstract

Motivation: Sequencing total RNA without poly-A selection yields a transcriptomic profile of nascent RNAs that are still being transcribed and co-transcriptionally spliced. In general, the RNA-seq reads exhibit a sawtooth pattern within a gene, characterized by a monotonically decreasing gradient across each intron in the 5'-3' direction and by substantially higher read levels in exonic regions. Such patterns result from transcription elongation by RNA polymerase II, which traverses the DNA strand in the 5'-3' direction as it carries out a complex series of mRNA synthesis and processing steps. Therefore, total RNA-seq data could be used to infer the rate of transcription elongation by solving the inverse problem.
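To make the origin of the sawtooth concrete, the following is a minimal forward-model sketch in Python. It is not the PolSter model. It assumes a steady state in which Pol II initiates at a constant rate, so that the Pol II density at each position is proportional to the inverse of the local elongation rate, and it assumes that each intron is removed the moment Pol II passes its 3' end. The function name `expected_coverage` and the parameters `init_rate` and `mature` are hypothetical names introduced here for illustration.

```python
def expected_coverage(length, introns, rate, init_rate=1.0, mature=0.0):
    """Expected read density per position under a toy steady-state model.

    length  : gene length in positions
    introns : list of (start, end) half-open intervals, 5'->3' coordinates
    rate    : elongation rate at each position (list of positive floats)
    mature  : extra exonic coverage contributed by mature mRNA (assumption)
    """
    # Flux conservation at steady state: density(x) = init_rate / rate(x).
    density = [init_rate / r for r in rate]

    # suffix[p] = total Pol II density at positions p, p+1, ..., length-1.
    suffix = [0.0] * (length + 1)
    for x in range(length - 1, -1, -1):
        suffix[x] = suffix[x + 1] + density[x]

    # Map every intronic position to the 3' end of its intron.
    intron_end = {}
    for start, end in introns:
        for p in range(start, end):
            intron_end[p] = end

    coverage = []
    for p in range(length):
        if p in intron_end:
            # Only polymerases between p and the intron's 3' end still
            # carry this intronic position (assumed instant splicing).
            coverage.append(suffix[p] - suffix[intron_end[p]])
        else:
            # Every polymerase downstream of p carries this exonic position.
            coverage.append(suffix[p] + mature)
    return coverage


cov = expected_coverage(length=10, introns=[(3, 7)], rate=[1.0] * 10)
print(cov)
```

With a uniform rate, the intronic coverage falls linearly toward the 3' end of the intron and jumps back up at the next exon, which is the sawtooth shape described above. Lowering the rate at some positions raises the local Pol II density and steepens the gradient there.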

Results: Although solving the inverse problem for total RNA-seq has great potential, statistical methods for doing so have not yet been fully developed. We demonstrate to what extent the newly developed method is useful. The objective is to reconstruct the spatial distribution of transcription elongation rates in a gene from a given noisy, sawtooth-like profile. The signal arising from the elongation rates must be recovered separately from several types of nuisance factors, such as unobserved modes of co-transcriptional mRNA splicing, which exert a significant influence on the sawtooth shape. The present method was tested using published total RNA-seq data derived from mouse embryonic stem cells. We investigated the spatial characteristics of the estimated elongation rates, focusing especially on their relation to promoter-proximal pausing of RNA polymerase II, nucleosome occupancy and histone modification patterns.
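As a rough illustration of what the inverse problem means, the sketch below inverts a noise-free intronic gradient under the simplest possible assumptions: steady-state initiation at a known rate and instant splicing at the 3' end of the intron. Under those assumptions the drop in coverage between adjacent intronic positions equals the local Pol II density, and the elongation rate is the initiation rate divided by that density. This is a naive difference-based inversion, not the statistical method of the paper, and it ignores the nuisance factors named above. The function name `naive_rates_from_intron` and the parameter `init_rate` are hypothetical.

```python
def naive_rates_from_intron(coverage, init_rate=1.0):
    """Naive elongation-rate estimates from intronic coverage (5'->3').

    coverage : expected read density at consecutive intronic positions
    Returns one rate per position, or None where the gradient is not
    strictly decreasing and the toy model gives no valid estimate.
    """
    # Assumption: unspliced coverage is zero just past the intron's 3' end.
    padded = list(coverage) + [0.0]
    rates = []
    for p in range(len(coverage)):
        drop = padded[p] - padded[p + 1]  # local Pol II density in the toy model
        if drop <= 0:
            rates.append(None)  # noise or an unmodeled splicing mode
        else:
            rates.append(init_rate / drop)
    return rates


print(naive_rates_from_intron([6.0, 4.0, 2.0, 1.0]))
print(naive_rates_from_intron([3.0, 3.5, 1.0]))
```

The second call shows why this direct inversion breaks down on real data: any local increase in coverage, whether from noise or from a splicing mode the toy model omits, leaves the rate undefined, which is the kind of difficulty a statistical treatment has to address.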

Availability and implementation: A C implementation of PolSter and sample data are available at https://github.com/yoshida-lab/PolSter.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Inverse problem of the transcription elongation rate. (A) Total RNA-seq captures a mixture of mature and nascent transcripts in a pool of cells. As Pol II moves from 5’ to 3’, elongating and co-transcriptionally spliced RNAs can take various states, as shown in the middle. The sawtooth pattern of RNA-seq reads shown at the bottom results from the expected frequency of nucleotides included in those transcripts at various stages. This figure was drawn with reference to Figure 2 of Ameur et al. (2011). (B) Total RNA-seq reads of a gene (GRM7) in human fetal brain (Ameur et al., 2011). Splice variants reported in hg19, GRCh37 (Genome Reference Consortium Human Reference 37) are shown at the top.
Fig. 2.
(A) Four splicing modes to be modeled in the system, with illustrative examples: (i) conventional mode, (ii) intron retention, (iii) recursive splicing (RS) of introns and (iv) exon skipping. (B) Infeasible and feasible modes of exon skipping are exemplified in (i) and (ii), respectively.
Fig. 3.
Read density of the OPCML gene in human fetal brain (Ameur et al., 2011). The observed valley in the intron implies the occurrence of RS
Fig. 4.
Estimated Pol II density, expected read density and splicing patterns are shown on the DNA coordinates of the Cdk19 gene in the 5’–3’ direction. The observed read counts are shown in the top panel
Fig. 5.
The estimated elongation rates of the 653 genes are arranged on the vertical axis. The horizontal axis denotes the relative position from TSS. The color scale chart shown on the side denotes the estimated values normalized to [0,1]
Fig. 6.
Correlation coefficients between the estimated Pol II densities and (A) ChIP-seq profiles of histone modifiers, (B) nucleosome occupancies observed by MNase-seq from mouse ES cells. (C) Differences between the averages of the estimated Pol II densities in regions with and without a chromatin state annotation. The 15 annotations shown in the right panel were obtained by performing ChromHMM on the ChIP-seq profiles of the histone modifiers. (D) Correlation coefficients between the estimated Pol II densities and two ChIP-seq profiles of Pol II. The color scales shown on the sides denote the plotted values; the mean differences shown in (C) are scaled to [-1,1].

References

    1. Ameur A., et al. (2011) Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat. Struct. Mol. Biol., 18, 1435–1440.
    2. Bentley D.L. (2014) Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet., 15, 163–175.
    3. Bolić M., et al. (2004) Resampling algorithms for particle filters: a computational complexity perspective. EURASIP J. Appl. Signal Process., 15, 2267–2277.
    4. Brown S.J., et al. (2012) Chromatin and epigenetic regulation of pre-mRNA processing. Hum. Mol. Genet., 21, R90–R96.
    5. Chae M., et al. (2015) groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics, 16, 222.
