Genome Res. 2020 Nov;30(11):1655-1666.
doi: 10.1101/gr.252445.119. Epub 2020 Sep 21.

PRAM: a novel pooling approach for discovering intergenic transcripts from large-scale RNA sequencing experiments


Peng Liu et al. Genome Res. 2020 Nov.

Abstract

Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
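The contrast between the 1-Step and 2-Step strategies can be illustrated with a toy example. The sketch below is not PRAM's implementation (PRAM is an R/Bioconductor package that builds full transcript models). It only assumes that a feature, here a splice junction, is called when it is supported by at least a hypothetical minimum number of reads (`MIN_READS`). Calling junctions per data set and then merging (2-Step) can miss a junction that is weakly supported in every data set, whereas pooling the reads first (1-Step) can recover it. All junction coordinates and read counts are made up.

```python
from collections import Counter

# Hypothetical minimum number of spliced reads required to call a junction.
MIN_READS = 3


def call_junctions(counts, min_reads=MIN_READS):
    """Return the set of junctions supported by at least min_reads reads."""
    return {j for j, n in counts.items() if n >= min_reads}


def two_step(datasets, min_reads=MIN_READS):
    """2-Step: call junctions in each data set separately, then merge the calls."""
    merged = set()
    for counts in datasets:
        merged |= call_junctions(counts, min_reads)
    return merged


def one_step(datasets, min_reads=MIN_READS):
    """1-Step: pool read counts from all data sets first, then call junctions once."""
    pooled = Counter()
    for counts in datasets:
        pooled.update(counts)  # adds counts for junctions seen in several data sets
    return call_junctions(pooled, min_reads)


# Toy junctions as (chromosome, donor, acceptor) with made-up per-data-set read counts.
datasets = [
    {("chr5", 1000, 2000): 5, ("chr5", 2100, 3000): 1},
    {("chr5", 1000, 2000): 4, ("chr5", 2100, 3000): 1},
    {("chr5", 2100, 3000): 2},
]

print(sorted(two_step(datasets)))  # [('chr5', 1000, 2000)]
print(sorted(one_step(datasets)))  # [('chr5', 1000, 2000), ('chr5', 2100, 3000)]
```

In this toy case the second junction has only 1 to 2 reads per data set but 4 reads after pooling, so only the pooled approach calls it. Pooling can also accumulate noise, so the threshold choice matters in practice.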


Figures

Figure 1.
1-Step outperforms 2-Step reconstruction methods. (A) Precision and recall of five meta-assembly methods in a benchmark test on target transcripts stratified by their maximum TPMs in the 30 ENCODE RNA-seq data sets: (1) TPM < 1 (413 transcripts); (2) 1 ≤ TPM < 10 (515 transcripts); and (3) TPM ≥ 10 (328 transcripts). (B) Comparison of target transcript GCM1 and predicted models by five meta-assembly methods.
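Figure 1A reports precision and recall with target transcripts stratified by their maximum TPM across the 30 data sets. A minimal sketch of those two steps follows, assuming that predicted and target features (for example, splice junctions) are compared by exact set membership. The benchmark's actual matching criteria are not described in this caption, and the helper names and feature labels are hypothetical.

```python
def tpm_stratum(tpms):
    """Assign a target transcript to a stratum by its maximum TPM across data sets."""
    m = max(tpms)
    if m < 1:
        return "TPM < 1"
    if m < 10:
        return "1 <= TPM < 10"
    return "TPM >= 10"


def precision_recall(predicted, target):
    """Precision and recall of predicted features against target features (exact matches)."""
    tp = len(predicted & target)  # true positives: predicted features that are in the target set
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(target) if target else 0.0
    return precision, recall


# Hypothetical example: 3 of 4 predicted junctions match 3 of 5 target junctions.
predicted = {"j1", "j2", "j3", "j9"}
target = {"j1", "j2", "j3", "j4", "j5"}
print(precision_recall(predicted, target))  # (0.75, 0.6)
print(tpm_stratum([0.2, 3.5, 0.0]))  # 1 <= TPM < 10
```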
Figure 2.
PRAM as a new computational framework predicts a valid master set of transcript models in human intergenic regions. (A) PRAM's workflow of input (cyan), intermediate (yellow), and output (green) files, with format labeled in brackets. PRAM's R functions and example parameters for each step are displayed next to arrows. (B) Distribution of GENCODE and PRAM transcripts in terms of expression levels across seven ENCODE cell lines. (C) The PRAM transcript with the highest TPM had multiple complementary genomic features supporting its existence. The model ‘plcf_chr5_minus.2607.1’ had an average TPM of 245 in HeLa-S3 cells. It had high DNase-seq signals around its 5′ exon, suggesting high chromatin accessibility, and multiple H3K4me3 ChIP-seq peaks, suggesting active transcription. Moreover, it had two RNA Pol II ChIP-seq peaks in close proximity to its transcription start site. All of these external genomic data supported the existence of this highly expressed PRAM transcript. (D,E) RAMPAGE (D) and histone modification ChIP-seq (E) signals of GENCODE and PRAM transcripts stratified by their expression levels, together with ‘silent genomic regions’ defined by H3K27me3 peaks as negative controls, across all GM12878 or K562 data sets. RAMPAGE and ChIP-seq values were derived from replicate 1 in their corresponding data sets (Supplemental Tables 13, 14). Transcripts with promoter or genomic span mappability <0.8 were excluded from D or E, respectively, due to uncertainty in their RAMPAGE or epigenetic signals. RAMPAGE and ChIP-seq signals were calculated as reads per million (RPM) and reads per kilobase per million (RPKM), respectively.
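The RPM and RPKM normalizations and the mappability filter named in this caption can be written out directly. The sketch below uses the standard definitions (RPM scales a read count by total mapped reads; RPKM additionally divides by region length in kilobases). The function names and the example read counts are hypothetical, and how the paper defined regions and total reads is not specified here.

```python
def rpm(region_reads, total_reads):
    """Reads per million mapped reads: region_reads / (total_reads / 1e6)."""
    return region_reads * 1e6 / total_reads


def rpkm(region_reads, region_length_bp, total_reads):
    """Reads per kilobase of region per million mapped reads."""
    return region_reads * 1e9 / (region_length_bp * total_reads)


def passes_mappability(mappability, cutoff=0.8):
    """Keep a transcript only if its promoter or span mappability is at least the cutoff."""
    return mappability >= cutoff


# Hypothetical region: 50 reads over 2,000 bp in a library of 20 million mapped reads.
print(rpm(50, 20_000_000))  # 2.5
print(rpkm(50, 2_000, 20_000_000))  # 1.25
print(passes_mappability(0.75))  # False
```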
Figure 3.
Genomic features and experimental validations of PRAM mouse transcripts. (A) Workflow of applying PRAM to discover transcripts from mouse hematopoiesis-related RNA-seq data sets: input (cyan), intermediate results (yellow), and output (green). (B) PRAM transcripts CUFFp.chr12.15498 and CUFFm.chr12.33668 had multiple supporting genomic features from external data sets. (C) Semi-qRT-PCR measurements of the six PRAM models in untreated (Unt) and 48-h β-estradiol (β-est)-treated G1E-ER-GATA1 cells. Red dots demarcate anticipated transcript sizes. Isoforms with splice junctions distant from each other were measured separately. Gene model name prefixes were removed for brevity.
Figure 4.
Expression of PRAM transcripts correlates with that of the neighboring gene Pik3cg in mouse and human. (A) Expression levels of PRAM transcripts and their neighboring genes in untreated (Unt) and 48-h β-estradiol (β-est)-treated G1E-ER-GATA1 cells. Isoform 1 of CUFFp.chr12.15498 was not detected by semi-qRT-PCR and thus was not measured here. Two-tailed Student's t-test; (**) P-value < 0.01, (***) P-value < 0.001. (B) Fold changes of PRAM mouse transcripts and their neighboring genes in the RNA-seq data sets of untreated versus 48-h β-estradiol-treated G1E-ER-GATA1 cells (G1E) and of wild-type versus Gata2 +9.5 enhancer-deleted aorta-gonad-mesonephros (AGM). (C) Counterparts of PRAM mouse transcripts in human with their supporting genomic features. (D) Semi-qRT-PCR measurement of PRAM human transcripts and their neighboring genes. Gene model name prefixes were removed for brevity. (E,F) Correlation of the expression level of CUFFm.chr7.6148 with those of PIK3CG and PRKAR2B in K562 cells (E) and TCGA-LAML patients (F).
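The three quantities in this figure (fold change, a two-sample Student's t statistic, and an expression correlation) can be sketched as follows. This is an illustration under stated assumptions, not the paper's analysis code: the fold change is taken on mean expression with a hypothetical pseudocount of 1, the t statistic assumes equal variances as in Student's test, and the correlation is Pearson's r (the caption does not name the correlation measure). A two-tailed P-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`. All input values are made up.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, untreated, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(untreated) + pseudocount))


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up triplicate measurements for a treated and an untreated condition.
print(log2_fold_change([6, 7, 8], [0, 1, 2]))  # 2.0
print(student_t([6, 7, 8], [0, 1, 2]))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```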
