Vespucci: a system for building annotated databases of nascent transcripts
- PMID: 24304890
- PMCID: PMC3936758
- DOI: 10.1093/nar/gkt1237
Abstract
Global run-on sequencing (GRO-seq) is a recent addition to the series of high-throughput sequencing methods that enables new insights into transcriptional dynamics within a cell. However, GRO-sequencing presents new algorithmic challenges, as existing analysis platforms for ChIP-seq and RNA-seq do not address the unique problem of identifying transcriptional units de novo from short reads located all across the genome. Here, we present a novel algorithm for de novo transcript identification from GRO-sequencing data, along with a system that determines transcript regions, stores them in a relational database and associates them with known reference annotations. We use this method to analyze GRO-sequencing data from primary mouse macrophages and derive novel quantitative insights into the extent and characteristics of non-coding transcription in mammalian cells. In doing so, we demonstrate that Vespucci expands existing annotations for mRNAs and lincRNAs by defining the primary transcript beyond the polyadenylation site. In addition, Vespucci generates assemblies for un-annotated non-coding RNAs such as those transcribed from enhancer-like elements. Vespucci thereby provides a robust system for defining, storing and analyzing diverse classes of primary RNA transcripts that are of increasing biological interest.
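The abstract names three steps but does not say how transcript regions are called. Those steps are: calling transcript regions from reads, storing them in a relational database, and associating them with reference annotations. The Python sketch below is a rough, hypothetical illustration of them. It stitches same-strand reads into regions with a simple gap threshold, loads regions and annotations into an in-memory SQLite database, and links them by interval overlap. This is a generic heuristic, not Vespucci's published algorithm. The names `stitch_reads`, `build_db` and `annotate`, the table layout, and the `read_len` and `max_gap` values are all assumptions made for illustration.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical data model (not Vespucci's schema): a read is
# (chromosome, strand, start), a region is (chromosome, strand, start, end)
# with a half-open end coordinate.
Read = Tuple[str, str, int]
Region = Tuple[str, str, int, int]
Annotation = Tuple[str, str, str, int, int]  # (name, chrom, strand, start, end)


def stitch_reads(reads: Iterable[Read], read_len: int = 50,
                 max_gap: int = 100) -> List[Region]:
    """Merge same-strand reads into contiguous candidate transcript regions.

    A read starting within max_gap bases of the current region's end is
    absorbed into that region. read_len and max_gap are illustrative only.
    """
    regions: List[Region] = []
    # Sorting by (chrom, strand, start) keeps each strand's reads together,
    # so the two strands are stitched independently.
    for chrom, strand, pos in sorted(reads):
        end = pos + read_len
        if regions:
            r_chrom, r_strand, r_start, r_end = regions[-1]
            if r_chrom == chrom and r_strand == strand and pos - r_end <= max_gap:
                regions[-1] = (r_chrom, r_strand, r_start, max(r_end, end))
                continue
        regions.append((chrom, strand, pos, end))
    return regions


def build_db(regions: List[Region],
             annotations: List[Annotation]) -> sqlite3.Connection:
    """Store called regions and reference annotations in SQLite tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transcript (id INTEGER PRIMARY KEY, chrom TEXT,"
                 " strand TEXT, start_pos INTEGER, end_pos INTEGER)")
    conn.execute("CREATE TABLE annotation (name TEXT, chrom TEXT, strand TEXT,"
                 " start_pos INTEGER, end_pos INTEGER)")
    conn.executemany("INSERT INTO transcript (chrom, strand, start_pos, end_pos)"
                     " VALUES (?, ?, ?, ?)", regions)
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", annotations)
    return conn


def annotate(conn: sqlite3.Connection) -> List[Tuple[int, Optional[str]]]:
    """Pair each transcript with every overlapping same-strand annotation.

    Transcripts with no overlapping annotation come back paired with None;
    these are the un-annotated candidates.
    """
    return conn.execute("""
        SELECT t.id, a.name
        FROM transcript AS t
        LEFT JOIN annotation AS a
          ON a.chrom = t.chrom AND a.strand = t.strand
         AND a.start_pos < t.end_pos AND t.start_pos < a.end_pos
        ORDER BY t.id, a.name
    """).fetchall()


if __name__ == "__main__":
    reads = [("chr1", "+", 100), ("chr1", "+", 160),
             ("chr1", "+", 400), ("chr1", "-", 120)]
    regions = stitch_reads(reads)
    conn = build_db(regions, [("GeneA", "chr1", "+", 50, 300)])
    print(regions)
    print(annotate(conn))
```

With these toy reads, the first two `+` strand reads merge into one region that overlaps the hypothetical annotation `GeneA`. The isolated `+` read and the `-` strand read each form their own region with no annotation.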
Similar articles
- groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics. 2015 Jul 16;16:222. doi: 10.1186/s12859-015-0656-3. PMID: 26173492. Free PMC article.
- An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq. IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1070-1081. doi: 10.1109/TCBB.2016.2520919. Epub 2016 Jan 26. PMID: 26829802. Free PMC article.
- Global Run-On Sequencing (GRO-Seq). Methods Mol Biol. 2017;1468:111-20. doi: 10.1007/978-1-4939-4035-6_9. PMID: 27662873. Free PMC article.
- Measuring RNA polymerase activity genome-wide with high-resolution run-on-based methods. Methods. 2019 Apr 15;159-160:177-182. doi: 10.1016/j.ymeth.2019.01.017. Epub 2019 Feb 2. PMID: 30716396. Review.
- Characterizing and annotating the genome using RNA-seq data. Sci China Life Sci. 2017 Feb;60(2):116-125. doi: 10.1007/s11427-015-0349-4. Epub 2016 Jun 13. PMID: 27294835. Review.
Cited by
- A generative model for the behavior of RNA polymerase. Bioinformatics. 2017 Jan 15;33(2):227-234. doi: 10.1093/bioinformatics/btw599. Epub 2016 Sep 23. PMID: 27663494. Free PMC article.
- Atlas of nascent RNA transcripts reveals enhancer to gene linkages. bioRxiv [Preprint]. 2023 Dec 8:2023.12.07.570626. doi: 10.1101/2023.12.07.570626. Update in: BMC Genomics. 2025 Apr 25;26(1):406. doi: 10.1186/s12864-025-11568-z. PMID: 38105978. Free PMC article. Preprint.
- groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics. 2015 Jul 16;16:222. doi: 10.1186/s12859-015-0656-3. PMID: 26173492. Free PMC article.
- An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq. IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1070-1081. doi: 10.1109/TCBB.2016.2520919. Epub 2016 Jan 26. PMID: 26829802. Free PMC article.
- Global Analyses to Identify Direct Transcriptional Targets of p53. Methods Mol Biol. 2021;2267:19-56. doi: 10.1007/978-1-0716-1217-0_3. PMID: 33786783.