Genome Res. 2018 Feb;28(2):231-242.
doi: 10.1101/gr.230516.117. Epub 2017 Dec 1.

Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome

Hagen Tilgner et al. Genome Res. 2018 Feb.

Abstract

Understanding transcriptome complexity is crucial for understanding human biology and disease. Technologies such as synthetic long-read RNA sequencing (SLR-RNA-seq) delivered 5 million isoforms and allowed assessing splicing coordination. Pacific Biosciences and Oxford Nanopore also increase throughput but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10-100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces sequencing lanes and cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., BIN1), the greater depth reveals many new cases, such as MAPT. Coordination of internal exons is found to be extensive among protein-coding genes: 23.5%-59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role of splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein-protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new splicing coordination types, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.

Figures

Figure 1.
Comparative outline of the spISO-seq and the previously published SLR-RNA-seq approach. (1) Both approaches rely on the principle of compartmentalization. The fewer cDNA molecules are separated into one compartment, the lower the probability of having two nonidentical molecules from the same gene. SLR-RNA-seq employs 1000–2000 molecules per well on 384-well plates, while spISO-seq employs 50–200 molecules per droplet for a total of ∼200,000 droplets. (2) SLR-RNA-seq performs a full-length PCR that is exponentially amplifying all molecules in a well, while spISO-seq performs a linear randomly primed amplification. (3) In both approaches, the amplified product is short-read-sequenced using barcodes that identify the compartment (well or droplet of origin) and (based on that, most of the time only one molecule per gene is observed per compartment) the molecule of origin. (4) All short reads originating from the same molecule of origin are then collectively analyzed to retrieve long-range information within molecules.
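The compartmentalization argument in step (1) can be illustrated with a rough calculation. The sketch below is not the authors' model: it assumes molecules are assigned to compartments independently and that a gene contributes a fixed fraction of all cDNA molecules (the hypothetical parameter `gene_fraction`). It also counts any second molecule from the same gene as a collision, so it gives an upper bound on the chance of seeing two nonidentical molecules.

```python
def collision_probability(molecules_per_compartment: int, gene_fraction: float) -> float:
    """Probability that a molecule shares its compartment with at least one
    other molecule from the same gene.

    Assumption (not from the paper): each of the other molecules in the
    compartment comes from the gene independently with probability
    `gene_fraction`.
    """
    if molecules_per_compartment < 1:
        raise ValueError("a compartment must hold at least one molecule")
    if not 0.0 <= gene_fraction <= 1.0:
        raise ValueError("gene_fraction must lie in [0, 1]")
    others = molecules_per_compartment - 1
    # Complement of "none of the other molecules comes from this gene".
    return 1.0 - (1.0 - gene_fraction) ** others


if __name__ == "__main__":
    # Compartment sizes taken from the caption; the gene fraction is illustrative.
    for label, size in [("SLR-RNA-seq well", 2000), ("spISO-seq droplet", 200)]:
        print(label, size, collision_probability(size, 1e-4))
```

Under these assumptions, shrinking the compartment from a few thousand molecules to at most a few hundred lowers the collision probability for every gene, which is the effect the caption describes.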
Figure 2.
Exploration of low input capacities using shallow sequencing of a MiSeq. (A) Number of molecules and bases for different input amounts. (B) Percentage of reads that were uniquely mappable for all input amounts. (C) Percentage of mapped bases that fall onto annotated GENCODE exons for all input amounts. (D) Percentage of annotated introns among all spliced mappings for all input amounts. (E) Heat map of pairwise Spearman correlations of FPKMs for all input amounts. (F) Heat map of pairwise Pearson correlations of FPKMs for all input amounts.
Figure 3.
Molecule and gene identification using deep sequencing. (A) Histogram of read counts for all barcodes (top left), histogram of spliced gene counts for all barcodes (bottom left), and dotplot of spliced genes and short reads per barcode. (B) Histogram of barcodes per gene. (C) Percentage of barcodes with a collision for each gene; genes are ordered by collision fraction (top). Gene expression as measured by barcode number for many genes without collisions (gray), many genes with few collisions (yellow), and for very few genes with many collisions (green; not observable in top plot because of very low gene number). (D) Number of spliced molecules identified depending on how many spliced short reads identify an intron of the molecule's gene. (E) Numbers of genes identified in four gene classes (protein-coding genes, lincRNA genes, antisense genes, and pseudogenes).
Figure 4.
Gene quantification. (A) Dotplot of gene expression from published short-read data (Li et al. 2014) and molecules per million of spISO-seq. (B) Overlap of genes identified by spISO-seq's linked reads and SLR-RNA-seq's SLRs. (C) Dotplot of gene expression from published synthetic long-read data (Tilgner et al. 2015) and molecules per million of spISO-seq. (D) Gene length and gene type enrichments for genes found only with spISO-seq and those found with spISO-seq and SLR-RNA-seq (Tilgner et al. 2015). (E) Length for mature RNAs for four different gene classes. (F) Dotplot of Ψ-values of short-read RNA sequencing (x-axis) and of spISO-seq (y-axis).
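The two quantities plotted in panels A, C, and F can be sketched as follows. The caption does not spell out how they were computed, so this sketch assumes the conventional definitions: molecules per million scales a gene's molecule count to a total of one million molecules, and Ψ (percent spliced in) is the fraction of informative molecules that include the exon. The function and parameter names are illustrative.

```python
def molecules_per_million(gene_molecules: int, total_molecules: int) -> float:
    """Gene expression as molecules per million identified molecules
    (assumed definition: count scaled to a total of 1,000,000)."""
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    return gene_molecules * 1_000_000 / total_molecules


def percent_spliced_in(inclusion: int, exclusion: int) -> float:
    """Psi value of an exon: included / (included + excluded), counted over
    molecules that are informative for the exon (assumed definition)."""
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no informative molecules for this exon")
    return inclusion / total


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(molecules_per_million(120, 8_000_000))
    print(percent_spliced_in(30, 10))
```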
Figure 5.
Coordinated exon pairs and influences on protein–protein interactions. (A) Percentage of genes with coordination events found by SLR-RNA-seq at three different FDRs that are also found with spISO-seq at FDR of 0.05. Blue bars: all SLR-RNA-seq coordination genes; orange bars: only SLR-RNA-seq genes, in which most molecules (Methods) show only exon inclusion and exon exclusion; brown bar: only SLR-RNA-seq genes, in which most molecules (Methods) show only exon inclusion and exon exclusion and where skipping events are mappable using short reads and STAR (Dobin et al. 2013). (B) Dotplot for extent of coordination according to SLR-RNA-seq and spISO-seq for cases in which both technologies indicate coordination. (C) Percentage of genes in which coordinated exons contain noncoding sequence for genes with coordination (FDR < 0.05) and without. (D) Single gene view for the MAPT gene, the center of all tauopathies. Bottom, black track: GENCODE annotation. Middle, colored track: spISO-seq data, with each line representing one molecule. Top, red-brown track: SLR-RNA-seq data with each line representing one molecule. Blue boxes highlight the inclusion of two alternative exons, whose inclusion is anticorrelated. (E) Protein–protein interaction network for genes with splicing coordination.
Figure 6.
Coordination between first alternative donors and last alternative acceptors. (A) Percent of exon pairs that are always separated by at least one intermediate exon for lincRNAs and for protein-coding genes. (B) Frequency among all coordinated pairs of pairs of internal splice sites (“Internal-internal”), pairs of an internal and a last splice site (“Internal-last”), pairs of a first splice site and an internal exon (“First-internal”), and pairs of a first and a last splice site (“First-last”). (C) Percentage of pairs of a first and a last splice site among coordinated (FDR < 0.05) and noncoordinated pairs. (D) Bottom, black track: GENCODE annotation. Middle, colored track: spISO-seq data, with each line representing one molecule. Top, red-brown track: SLR-RNA-seq data with each line representing one molecule. Blue boxes highlight first exon and TSS choice (left blue boxes) and internal exon inclusion (right blue boxes). Inclusion of the alternative internal exon occurs only when the downstream first exon/TSS is chosen.
Figure 7.
Estimation of genes with coordination genome-wide. (A) Percent of genes (among genes with one tested exon pair at a given cutoff) that show at least one coordinated exon pair with P < 5 × 10⁻⁷ and absolute value log-odds-ratio of 0.5 or above. Vertical bars indicate 95% confidence intervals. (B) Same figure as A, considering only one exon pair per gene: the one with the highest number of informative reads. Vertical bars indicate 95% confidence intervals. (C) Orange arrow indicates percentage of genes with ≥25 informative reads that have a coordination event. Red arrow indicates the same percentage for genes with ≥500 informative reads. Blue distribution shows 50 lists of exon pairs, down-sampled from the “≥500 informative read” data to the “≥25 informative read” data.
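The two thresholds in panel A (a P-value cutoff and a minimum absolute log-odds ratio) can be applied to a 2×2 table of molecule counts for an exon pair. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the caption does not name the statistical test, so a two-sided Fisher's exact test is assumed here; the log-odds ratio uses the natural logarithm with a 0.5 pseudocount per cell, which is also an assumption. All function names are hypothetical.

```python
from math import comb, log


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test on the table [[a, b], [c, d]].

    a: both exons included, b: only the first, c: only the second, d: neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x: int) -> float:
        # Hypergeometric probability of x molecules in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def log_odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Natural-log odds ratio; the pseudocount (an assumption) avoids division by zero."""
    return log(((a + pseudocount) * (d + pseudocount))
               / ((b + pseudocount) * (c + pseudocount)))


def is_coordinated(a: int, b: int, c: int, d: int,
                   p_cutoff: float = 5e-7, min_abs_lor: float = 0.5) -> bool:
    """Apply both thresholds quoted in the caption to one exon pair."""
    p_value = fisher_exact_two_sided(a, b, c, d)
    return p_value < p_cutoff and abs(log_odds_ratio(a, b, c, d)) >= min_abs_lor


if __name__ == "__main__":
    # Illustrative counts only: perfectly co-included exons vs. independent exons.
    print(is_coordinated(50, 0, 0, 50))
    print(is_coordinated(10, 10, 10, 10))
```

With this sign convention, a positive log-odds ratio means the two exons tend to be included together, and a negative one means their inclusion is anticorrelated, as in the MAPT example of Figure 5D.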
