This is a preprint.
DNA-sequence and epigenomic determinants of local rates of transcription elongation
- PMID: 38187771
- PMCID: PMC10769381
- DOI: 10.1101/2023.12.21.572932
Abstract
Across all branches of life, transcription elongation is a crucial, regulated phase in gene expression. Many recent studies in eukaryotes have focused on the regulation of promoter-proximal pausing of RNA Polymerase II (Pol II), but rates of productive elongation also vary substantially throughout the gene body, both within and across genes. Here, we introduce a probabilistic model for systematically evaluating potential determinants of the local elongation rate based on nascent RNA sequencing (NRS) data. Our model is derived from a unified model for both the kinetics of Pol II movement along the DNA template and the generation of NRS read counts at steady state. It allows for a continuously variable elongation rate along the gene body, with the rate at each nucleotide defined by a generalized linear relationship with nearby genomic and epigenomic features. High-dimensional feature vectors are accommodated through a sparse-regression extension. We show with simulations that the model allows accurate detection of associated features and accurate prediction of local elongation rates. In an analysis of public PRO-seq and epigenomic data, we identify several features that are strongly associated with reductions in the local elongation rate, including DNA methylation, splice sites, RNA stem-loops, CTCF binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, low-complexity sequences and H3K79me2 marks are associated with increases in elongation rate. In an analysis of DNA k-mers, we find that cytosine nucleotides are strongly associated with reductions in local elongation rate, particularly when preceded by guanines and followed by adenines or thymines. Increases in elongation rate are associated with thymines and A+T-rich k-mers. These associations are generally shared across cell types, and by considering them our model is effective at predicting features of held-out PRO-seq data.
Overall, our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates based on complex sets of genomic and epigenomic covariates. We have made predictions available for the K562, CD14+, MCF-7, and HeLa-S3 cell types in a UCSC Genome Browser track.
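The generalized linear relationship described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that the steady-state read count at each nucleotide is Poisson with mean chi / rate, that the relative rate is exp(features · kappa), and that the scale chi is known. The function names, the symbols chi and kappa, and the proximal-gradient optimizer standing in for the sparse-regression extension are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_counts(features, kappa, chi, rng):
    """Simulate read counts (assumed model: Poisson with mean chi / rate)."""
    # Relative local elongation rate from a log-linear link (assumption).
    rate = np.exp(features @ kappa)
    # At steady state, slower polymerase means higher read density.
    counts = rng.poisson(chi / rate)
    return counts, rate


def fit_kappa(features, counts, chi, l1=0.0, step=0.02, n_iter=2000):
    """Estimate kappa by proximal gradient descent with an L1 penalty.

    Minimizes (1/n) * Poisson negative log-likelihood + l1 * ||kappa||_1,
    where log(mean) = log(chi) - features @ kappa.
    """
    n, p = features.shape
    kappa = np.zeros(p)
    for _ in range(n_iter):
        mean = chi * np.exp(-(features @ kappa))
        # Gradient of the averaged negative log-likelihood w.r.t. kappa.
        grad = features.T @ (counts - mean) / n
        kappa = kappa - step * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        kappa = np.sign(kappa) * np.maximum(np.abs(kappa) - step * l1, 0.0)
    return kappa


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical standardized features at 20,000 nucleotide positions.
    X = rng.standard_normal((20000, 5))
    # Negative coefficients slow elongation; positive ones speed it up.
    true_kappa = np.array([-0.5, 0.3, 0.0, 0.0, 0.0])
    counts, _ = simulate_counts(X, true_kappa, chi=5.0, rng=rng)
    print(fit_kappa(X, counts, chi=5.0, l1=0.05))
```

In this sketch a negative coefficient corresponds to a feature associated with slower elongation and therefore higher read density, while the L1 penalty shrinks coefficients of unassociated features toward zero.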
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data. Nucleic Acids Res. 2025 Feb 8;53(4):gkaf092. doi: 10.1093/nar/gkaf092. PMID: 39964478. Free PMC article.
- A machine learning-based framework for modeling transcription elongation. Proc Natl Acad Sci U S A. 2021 Feb 9;118(6):e2007450118. doi: 10.1073/pnas.2007450118. PMID: 33526657. Free PMC article.
- A dual role for the histone methyltransferase PR-SET7/SETD8 and histone H4 lysine 20 monomethylation in the local regulation of RNA polymerase II pausing. J Biol Chem. 2014 Mar 14;289(11):7425-37. doi: 10.1074/jbc.M113.520783. Epub 2014 Jan 23. PMID: 24459145. Free PMC article.
- Promoter-proximal regulation of gene transcription: Key factors involved and emerging role of general transcription factors in assisting productive elongation. Gene. 2023 Aug 20;878:147571. doi: 10.1016/j.gene.2023.147571. Epub 2023 Jun 16. PMID: 37331491. Review.
- Pause & go: from the discovery of RNA polymerase pausing to its functional implications. Curr Opin Cell Biol. 2017 Jun;46:72-80. doi: 10.1016/j.ceb.2017.03.002. Epub 2017 Mar 28. PMID: 28363125. Free PMC article. Review.