Nat Biotechnol. 2015 Jul;33(7):736-42.
doi: 10.1038/nbt.3242. Epub 2015 May 18.

Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events

Hagen Tilgner et al. Nat Biotechnol. 2015 Jul.

Abstract

Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.

Figures

Figure 1
Illustration of purpose and strategy of this work. (a) Multiple and distant alternative exons (red and blue) can be combined in different ways to form RNA isoforms. On a molecular level they can be included in RNA molecules in an opposed (or mutually exclusive) manner (top middle, dMEPs), in a phased manner (center middle, dMAP) or in a randomly paired manner (bottom middle). Using traditional short-read sequencing (bottom left) or microarrays (top left), these three fundamentally different situations lead to the same observation, and thus cannot be distinguished. With long-read technologies (right) it is trivial to assign each group. (b) Outline of experimental procedure for SLR-RNA-seq.
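As a minimal sketch of panel (a), the Python below builds three hypothetical sets of long reads in which each alternative exon is included in half of the molecules. Per-exon inclusion, which is what short reads or microarrays measure, is therefore identical in all three sets, yet the joint pattern differs. Fisher's exact test on the 2×2 table is used here only as an illustrative assumption; this page does not state which test the authors applied, and all names and counts are made up.

```python
from math import comb

def contingency(molecules):
    """Count molecules by (exon A included, exon B included)."""
    both = sum(1 for a, b in molecules if a and b)
    only_a = sum(1 for a, b in molecules if a and not b)
    only_b = sum(1 for a, b in molecules if not a and b)
    neither = sum(1 for a, b in molecules if not a and not b)
    return both, only_a, only_b, neither

def marginal_inclusion(molecules):
    """Per-exon inclusion rates, the only signal short reads provide."""
    n = len(molecules)
    return (sum(a for a, _ in molecules) / n, sum(b for _, b in molecules) / n)

def fisher_two_sided(both, only_a, only_b, neither):
    """Two-sided Fisher exact p-value for a 2x2 table (illustrative choice)."""
    row1, row2 = both + only_a, only_b + neither   # A included / A skipped
    col1 = both + only_b                           # B included
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x molecules containing both exons.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(both)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical molecules: (exon A included, exon B included)
phased = [(True, True)] * 10 + [(False, False)] * 10       # dMAP-like
exclusive = [(True, False)] * 10 + [(False, True)] * 10    # dMEP-like
random_pairing = ([(True, True)] * 5 + [(True, False)] * 5
                  + [(False, True)] * 5 + [(False, False)] * 5)

for name, mols in [("phased", phased), ("exclusive", exclusive),
                   ("random", random_pairing)]:
    print(name, marginal_inclusion(mols), fisher_two_sided(*contingency(mols)))
```

All three sets report the same marginal inclusion, while only the phased and mutually exclusive sets give a small p-value. That is the distinction the caption says requires reads spanning both exons.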
Figure 2
Comparison of SLRs and PacBio-CCS on the ERCC sequences. (a) Distribution of indels (per 100 nt of mapping) in PB-CCS (blue) and SLRs (red) mapped to the ERCC-control RNAs. PacBio-CCS, PB-CCS. (b) Median and mean number of 5′ missing nucleotides for PB-CCS (blue) and SLRs (red) mapped to the ERCC-control RNAs. (c) Median and mean number of 3′ missing nucleotides for PB-CCS (blue) and SLRs (red) mapped to the ERCC-control RNAs. (d) Correlation of log-transformed given concentration for the ERCC sequences and the log-transformed number of wells, in which each ERCC sequence is observed.
Figure 3
Comparison of SLRs and PacBio-CCS on human and mouse transcriptomes. (a) Read length obtained for a human organ panel (Hop) and in the GM12878 cell-line using single-molecule PacBio-CCS, and for a human brain sample using SLR-RNA-seq. (b) Mapping length for the same data sets as in a. (c) Percentage of reads that could be classified as full-length in the same data sets as in a. (d) Distributions of mature gene length to which spliced reads were assigned for the same data sets as in a.
Figure 4
Analysis of novel isoforms revealed by SLR-RNA-seq. (a) Fraction of mapped reads that show a novel splice pattern with respect to the GENCODE annotation, broken up by the number of introns in the mapping. (b) Heatmap of novel introns (with respect to GENCODE) determined by SLR-RNA-seq showing the number of times each intron was observed in the combined set of short-read ENCODE-RNA-seq data sets (ENC), in a human brain sample (HB), and in our previous data using the Roche-454 platform (454) and the PacBio platform (PB). (c) Fraction of mapped reads that show a novel splice pattern with respect to the GENCODE annotation, broken up by gene expression of the gene to which the read was mapped. Gene expression is here given as the fraction of wells in which the gene was detected. (d) Fraction of mapped reads that show a novel splice pattern with respect to the GENCODE annotation, for mappings assigned to coding genes, lncRNA genes or to pseudogenes. (e) Illustration of novel isoforms revealed by SLR-RNA-seq for Npr2. Note that we show isoforms from only four lanes of sequencing (our first round of sequencing). Some isoforms are novel because of intron retention events, which can be easily observed. Others are novel because they skip an exon in long transcripts (see red box), a skipping event that occurs only in short transcripts according to the annotation. CSMM, consensus split mapped molecule, a read mapping for which all splits respect both the donor consensus and the acceptor consensus.
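The CSMM definition above can be sketched as a simple check on the splits (introns) of a read mapping. This is a hedged illustration only: it assumes plus-strand coordinates, 0-based half-open intron intervals, and the canonical GT donor and AG acceptor dinucleotides as the "consensus". The page does not say which consensus motifs the authors accepted, and the function name and toy sequence are hypothetical.

```python
def is_csmm(genome, introns, donor="GT", acceptor="AG"):
    """Return True if the mapping is split and every split is flanked by
    the assumed donor and acceptor dinucleotides.

    genome  : plus-strand reference sequence (str)
    introns : list of (start, end) pairs, 0-based, end exclusive
    """
    if not introns:
        # An unspliced mapping has no splits, so it is not a split mapping.
        return False
    return all(
        genome[start:start + 2] == donor and genome[end - 2:end] == acceptor
        for start, end in introns
    )

# Toy reference: exon "AAAC", intron "GTAAGTTTTCAG", exon "CCCC"
toy_genome = "AAAC" + "GTAAGTTTTCAG" + "CCCC"

print(is_csmm(toy_genome, [(4, 16)]))  # split matches GT...AG
print(is_csmm(toy_genome, [(5, 16)]))  # donor shifted by one base
```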
Figure 5
Analysis of distant molecularly associated exon pairs in the human brain transcriptome. (a) Number of distinct distant alternative exon pairs (that is, separated by at least one constitutive exon) at different FDR values. (b) All spliced reads (from the first four lanes) overlapping the two alternative exons (gray boxes) in EXOC7. CSMM, consensus split mapped molecule. (c) Pie chart of distant alternative exon pairs at FDR = 0.05, broken up by exon kind (CDS: the RNA-deduced exon is annotated as an entirely coding exon; NEC (not entirely coding): the RNA-deduced exon is annotated as an exon, but not as an entirely coding exon; Nov: the RNA-deduced exon has at least one novel splice site). (d) Density of score of intragenic molecular association (Σ) for distant alternative exon pairs at FDR = 0.05.
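Counting exon pairs "at different FDR values", as in panel (a), amounts to adjusting per-pair p-values for multiple testing and thresholding them. The sketch below uses the Benjamini-Hochberg procedure as an assumption; this page does not name the correction the authors used, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def pairs_at_fdr(p_values, fdr):
    """Number of tested exon pairs whose adjusted p-value is <= fdr."""
    return sum(1 for q in benjamini_hochberg(p_values) if q <= fdr)

# Hypothetical p-values, one per tested distant exon pair.
example_p = [0.01, 0.04, 0.03, 0.5]
for threshold in (0.05, 0.1, 0.3):
    print(threshold, pairs_at_fdr(example_p, threshold))
```

Loosening the threshold can only add pairs, which is why a curve such as panel (a) is non-decreasing in FDR.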
Figure 6
Conservation of distant molecularly associated exon pairs between human and mouse. (a) Number of distant alternative exon pairs (that is, separated by at least one constitutive exon) that show nonrandom co-inclusion patterns, at different FDR values for the mouse brain. (b) Overlap of affected genes between human (at FDR = 0.3) and mouse (at FDR = 0.3).
