Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome

Genome Res. 2018 Feb;28(2):231-242. doi: 10.1101/gr.230516.117. Epub 2017 Dec 1.

Hagen Tilgner^#¹, Fereshteh Jahanbani^#², Ishaan Gupta¹, Paul Collier¹, Eric Wei², Morten Rasmussen³, Michael Snyder²

Affiliations

¹ Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10021, USA.
² Department of Genetics, Stanford University, Stanford, California 94304, USA.
³ Arc Bio LLC, Menlo Park, California 94025, USA.

^# Contributed equally.

PMID: 29196558
PMCID: PMC5793787
DOI: 10.1101/gr.230516.117

Hagen Tilgner et al. Genome Res. 2018 Feb.

Abstract

Understanding transcriptome complexity is crucial for understanding human biology and disease. Technologies such as synthetic long-read RNA sequencing (SLR-RNA-seq) delivered 5 million isoforms and allowed the assessment of splicing coordination. Pacific Biosciences and Oxford Nanopore also increase throughput but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10-100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces sequencing lanes and cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., *BIN1*), the greater depth reveals many new cases, such as *MAPT*. Coordination of internal exons is found to be extensive among protein-coding genes: 23.5%-59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role of splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein-protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new splicing coordination types, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.

Figures

**Figure 1.**
Comparative outline of the spISO-seq and the previously published SLR-RNA-seq approach. (1) Both approaches rely on the principle of compartmentalization. The fewer cDNA molecules are separated into one compartment, the lower the probability of having two nonidentical molecules from the same gene. SLR-RNA-seq employs 1000–2000 molecules per well on 384-well plates, while spISO-seq employs 50–200 molecules per droplet for a total of ∼200,000 droplets. (2) SLR-RNA-seq performs a full-length PCR that exponentially amplifies all molecules in a well, while spISO-seq performs a linear, randomly primed amplification. (3) In both approaches, the amplified product is short-read-sequenced using barcodes that identify the compartment (well or droplet of origin). Because most of the time only one molecule per gene is observed per compartment, the barcode also identifies the molecule of origin. (4) All short reads originating from the same molecule of origin are then collectively analyzed to retrieve long-range information within molecules.
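The compartmentalization argument in step (1) can be illustrated with a small calculation. The sketch below assumes a simple binomial model in which every molecule in a compartment independently comes from a given gene with probability `gene_fraction`; the function name and the example fraction of 1e-4 are hypothetical and are not taken from the paper. It also counts any two molecules of the same gene as a collision, so it is an upper bound on the "two nonidentical molecules" case described in the caption.

```python
def collision_probability(n_molecules: int, gene_fraction: float) -> float:
    """P(>= 2 molecules of a gene in a compartment | >= 1 molecule of that gene).

    Assumes a binomial model: each of the n_molecules independently belongs
    to the gene with probability gene_fraction (an illustrative assumption).
    """
    p_none = (1.0 - gene_fraction) ** n_molecules
    p_one = n_molecules * gene_fraction * (1.0 - gene_fraction) ** (n_molecules - 1)
    p_at_least_one = 1.0 - p_none
    if p_at_least_one == 0.0:
        return 0.0
    return (1.0 - p_none - p_one) / p_at_least_one


# Compare droplet-sized and well-sized compartments for a hypothetical gene
# that accounts for 1 in 10,000 cDNA molecules.
for n in (50, 200, 1000, 2000):
    print(n, collision_probability(n, 1e-4))
```

Under this model the collision probability grows with the number of molecules per compartment, which is the reason both methods favor sparse loading.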

**Figure 2.**
Exploration of low input capacities using shallow sequencing on a MiSeq. (A) Number of molecules and bases for different input amounts. (B) Percentage of reads that were uniquely mappable for all input amounts. (C) Percentage of mapped bases that fall onto annotated GENCODE exons for all input amounts. (D) Percentage of annotated introns among all spliced mappings for all input amounts. (E) Heat map of pairwise Spearman correlations of FPKMs for all input amounts. (F) Heat map of pairwise Pearson correlations of FPKMs for all input amounts.

**Figure 3.**
Molecule and gene identification using deep sequencing. (A) Histogram of read counts for all barcodes (*top left*), histogram of spliced gene count for all barcodes (*bottom left*), and dotplot of spliced genes and short reads per barcode. (B) Histogram of barcodes per gene. (C) Percentage of barcodes with a collision for each gene; genes are ordered by collision fraction (*top*). Gene expression as measured by barcode number for many genes without collisions (gray), many genes with few collisions (yellow), and very few genes with many collisions (green; not observable in the *top* plot because of the very low gene number). (D) Number of spliced molecules identified depending on how many spliced short reads identify an intron of the molecule's gene. (E) Numbers of genes identified in four gene classes (protein-coding genes, lincRNA genes, antisense genes, and pseudogenes).

**Figure 4.**
Gene quantification. (A) Dotplot of gene expression from published short-read data (Li et al. 2014) and molecules per million of spISO-seq. (B) Overlap of genes identified by spISO-seq's linked reads and SLR-RNA-seq's SLRs. (C) Dotplot of gene expression from published synthetic long-read data (Tilgner et al. 2015) and molecules per million of spISO-seq. (D) Gene length and gene type enrichments for genes found only with spISO-seq and those found with spISO-seq and SLR-RNA-seq (Tilgner et al. 2015). (E) Length for mature RNAs for four different gene classes. (F) Dotplot of Ψ-values of short-read RNA sequencing (x-axis) and of spISO-seq (y-axis).
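The Ψ-values compared in panel F are, in the usual definition, the fraction of informative observations that include an exon. The following is a minimal sketch under that standard definition; the function name and the example counts are hypothetical, and the exact rules the authors use to decide which molecules or reads are informative are not stated in this record.

```python
from typing import Optional


def psi(inclusion: int, exclusion: int) -> Optional[float]:
    """Percent spliced in: inclusion / (inclusion + exclusion).

    For linked-read data the counts would be molecules; for short-read data
    they would be reads. Returns None when there are no informative
    observations, since the ratio is undefined in that case.
    """
    total = inclusion + exclusion
    if total == 0:
        return None
    return inclusion / total


# Hypothetical exon: 30 molecules include it, 10 skip it.
print(psi(30, 10))
```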

**Figure 5.**
Coordinated exon pairs and influences on protein–protein interactions. (A) Percentage of genes with coordination events found by SLR-RNA-seq at three different FDRs that are also found with spISO-seq at FDR of 0.05. Blue bars: all SLR-RNA-seq coordination genes; orange bars: only SLR-RNA-seq genes, in which most molecules (Methods) show only exon inclusion and exon exclusion; brown bar: only SLR-RNA-seq genes, in which most molecules (Methods) show only exon inclusion and exon exclusion and where skipping events are mappable using short reads and STAR (Dobin et al. 2013). (B) Dotplot for extent of coordination according to SLR-RNA-seq and spISO-seq for cases in which both technologies indicate coordination. (C) Percentage of genes in which coordinated exons contain noncoding sequence for genes with coordination (FDR < 0.05) and without. (D) Single gene view for the *MAPT* gene, the center of all tauopathies. *Bottom*, black track: GENCODE annotation. *Middle*, colored track: spISO-seq data, with each line representing one molecule. *Top*, red-brown track: SLR-RNA-seq data with each line representing one molecule. Blue boxes highlight the inclusion of two alternative exons, whose inclusion is anticorrelated. (E) Protein–protein interaction network for genes with splicing coordination.
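Coordination between two alternative exons can be summarized from a 2×2 table of molecule counts (both exons included, only the first, only the second, neither). The sketch below computes a log-odds ratio and a two-sided Fisher's exact test using only the standard library. Several choices here are assumptions made for illustration and are not confirmed by this record: the use of Fisher's exact test, the natural logarithm, the 0.5 pseudocount, and all example counts.

```python
from math import comb, log


def log_odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Natural-log odds ratio of the table [[a, b], [c, d]].

    a: both exons included, b: only exon 1, c: only exon 2, d: neither.
    A pseudocount is added to every cell to avoid division by zero.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return log((a * d) / (b * c))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x molecules in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical anticorrelated exon pair: the exons are rarely seen together.
table = (2, 40, 35, 3)
print(log_odds_ratio(*table), fisher_exact_two_sided(*table))
```

A negative log-odds ratio corresponds to anticorrelated inclusion, as highlighted for the two *MAPT* exons in panel D; a positive value corresponds to exons that tend to be included together.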

**Figure 6.**
Coordination between first alternative donors and last alternative acceptors. (A) Percent of exon pairs that are always separated by at least one intermediate exon for lincRNAs and for protein-coding genes. (B) Frequency among all coordinated pairs of pairs of internal splice sites (“Internal-internal”), pairs of an internal and a last splice site (“Internal-last”), pairs of a first splice site and an internal exon (“First-internal”), and pairs of a first and a last splice site (“First-last”). (C) Percentage of pairs of a first and a last splice site among coordinated (FDR < 0.05) and noncoordinated pairs. (D) *Bottom*, black track: GENCODE annotation. *Middle*, colored track: spISO-seq data, with each line representing one molecule. *Top*, red-brown track: SLR-RNA-seq data with each line representing one molecule. Blue boxes highlight first exon and TSS choice (*left* blue boxes) and internal exon inclusion (*right* blue boxes). Inclusion of the alternative internal exon occurs only when the downstream first exon/TSS is chosen.
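The FDR thresholds used to call coordinated pairs imply a multiple-testing correction across all tested exon or splice-site pairs. The sketch below implements the Benjamini-Hochberg procedure, which appears in the reference list; that the authors applied it in exactly this form is an assumption, and the P-values are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for four tested pairs.
p_vals = [0.01, 0.04, 0.03, 0.5]
q_vals = benjamini_hochberg(p_vals)
coordinated = [q < 0.05 for q in q_vals]
print(q_vals, coordinated)
```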

**Figure 7.**
Estimation of genes with coordination genome-wide. (A) Percent of genes (among genes with one tested exon pair at a given cutoff) that show at least one coordinated exon pair with P < 5 × 10⁻⁷ and an absolute log-odds ratio of 0.5 or above. Vertical bars indicate 95% confidence intervals. (B) Same as A, considering only one exon pair per gene: the one with the highest number of informative reads. Vertical bars indicate 95% confidence intervals. (C) Orange arrow indicates the percentage of genes with ≥25 informative reads that have a coordination event. Red arrow indicates the same percentage for genes with ≥500 informative reads. Blue distribution shows 50 lists of exon pairs, down-sampled from the “≥500 informative read” data to the “≥25 informative read” data.
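The 95% confidence intervals on the percentage of genes with coordination are intervals on a binomial proportion. The record does not say which interval the authors used, so the sketch below uses the Wilson score interval as an assumption; the function name and the gene counts are hypothetical.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 10 of 25 highly expressed genes show coordination.
print(wilson_interval(10, 25))
```

With few genes passing a high informative-read cutoff, the interval is wide, which is consistent with the broad range quoted in the abstract.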

Cited by

Getting the Entire Message: Progress in Isoform Sequencing.
Hardwick SA, Joglekar A, Flicek P, Frankish A, Tilgner HU. Front Genet. 2019 Aug 16;10:709. doi: 10.3389/fgene.2019.00709. PMID: 31475029. Review.
acorde unravels functionally interpretable networks of isoform co-usage from single cell data.
Arzalluz-Luque A, Salguero P, Tarazona S, Conesa A. Nat Commun. 2022 Apr 5;13(1):1828. doi: 10.1038/s41467-022-29497-w. PMID: 35383181.
Repeat-associated RNA structure and aberrant splicing.
Hale MA, Johnson NE, Berglund JA. Biochim Biophys Acta Gene Regul Mech. 2019 Nov-Dec;1862(11-12):194405. doi: 10.1016/j.bbagrm.2019.07.006. Epub 2019 Jul 16. PMID: 31323433. Review.
Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing.
Gong B, Li D, Łabaj PP, Pan B, Novoradovskaya N, Thierry-Mieg D, Thierry-Mieg J, Chen G, Bergstrom Lucas A, LoCoco JS, Richmond TA, Tseng E, Kusko R, Happe S, Mercer TR, Pabón-Peña C, Salmans M, Tilgner HU, Xiao W, Johann DJ Jr, Jones W, Tong W, Mason CE, Kreil DP, Xu J. Sci Data. 2024 Aug 16;11(1):892. doi: 10.1038/s41597-024-03741-y. PMID: 39152166.
Targeted transcriptome analysis using synthetic long read sequencing uncovers isoform reprograming in the progression of colon cancer.
Liu S, Wu I, Yu YP, Balamotis M, Ren B, Ben Yehezkel T, Luo JH. Commun Biol. 2021 Apr 27;4(1):506. doi: 10.1038/s42003-021-02024-1. PMID: 33907296.

References

1. Au KF, Underwood JG, Lee L, Wong WH. 2012. Improving PacBio long read accuracy by short read alignment. PLoS One 7: e46679.
2. Au KF, Sebastiano V, Afshar PT, Durruthy JD, Lee L, Williams BA, van Bakel H, Schadt EE, Reijo-Pera RA, Underwood JG, et al. 2013. Characterization of the human ESC transcriptome by hybrid sequencing. Proc Natl Acad Sci 110: E4821–E4830.
3. Batut P, Dobin A, Plessy C, Carninci P, Gingeras TR. 2013. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res 23: 169–180.
4. Behr J, Kahles A, Zhong Y, Sreedharan VT, Drewe P, Rätsch G. 2013. MITIE: simultaneous RNA-Seq-based transcript identification and quantification in multiple samples. Bioinformatics 29: 2529–2538.
5. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 57: 289–300.
