Review
Genome Biol. 2018 Aug 10;19(1):110.
doi: 10.1186/s13059-018-1496-z.

Single-cell RNAseq for the study of isoforms-how is that possible?

Ángeles Arzalluz-Luque et al.

Abstract

Single-cell RNAseq and alternative splicing studies have recently become two of the most prominent applications of RNAseq. However, combining the two is still challenging, and few research efforts have been dedicated to the intersection between them. Cell-level insight into isoform expression is required to fully understand the biology of alternative splicing, but it is still an open question to what extent isoform expression analysis at the single-cell level is actually feasible. Here, we establish a set of four conditions that are required for a successful single-cell-level isoform study and evaluate how well current single-cell RNAseq technologies meet these conditions in published research.

Conflict of interest statement

The authors declare that they have no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Single-cell mRNA sequencing methods and sources of mRNA variation. a Methodological approaches to single-cell isoform studies. The combination of library preparation and sequencing technologies yields three distinct methods to capture isoform diversity. UMI-based methods are limited to sequencing the 3′ (or 5′) end of transcripts, which enables the use of UMIs to efficiently correct for PCR amplification bias and allows early cell barcoding, although these methods are mainly suited to quantifying expression at the gene level. Smart-based methods produce short reads across the entire transcript length, but they require late cell barcoding (barcodes are inserted during tagmentation), cannot accommodate UMIs, and their reads can be difficult to assign unambiguously to an isoform. Single-molecule sequencing captures each transcript molecule in a single read and provides full isoform connectivity, although it suffers from a high prevalence of sequencing errors. b Sources of transcript variation that yield alternative isoforms, and their position along the transcript. Compared with a reference isoform (for convenience, one that includes all exons, no introns and the complete UTRs), alternative TSSs (transcription start sites) and TTSs (transcription termination sites) arise during transcription and shorten the UTRs. Processing of the pre-mRNA removes or retains introns and exons, adding further variability to the isoforms that can be generated from a gene. In addition, more than one event can be present in the same isoform, so isoform diversity increases with the number of possible combinations of AS (alternative splicing) events. Alt. alternative, RT reverse transcription, UMI unique molecular identifier
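The PCR-bias correction that UMIs make possible in panel a can be illustrated with a short sketch: reads that share a cell barcode, a UMI and a gene are treated as copies of one original molecule and counted once. This is a minimal illustration, assuming reads have already been assigned a cell barcode, UMI and gene; the function name `count_molecules` and the toy reads are hypothetical, and real pipelines also correct sequencing errors within UMIs, which this exact-match version does not.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell, gene) by collapsing reads that share a UMI.

    Hypothetical helper. `reads` is an iterable of (cell_barcode, umi, gene)
    tuples; PCR duplicates of one original molecule share all three fields,
    so each distinct UMI is counted once per (cell, gene). Exact matching only:
    no UMI error correction is attempted.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy reads (made up for illustration).
reads = [
    ("AAAC", "GGTT", "Gapdh"),
    ("AAAC", "GGTT", "Gapdh"),  # PCR duplicate of the read above
    ("AAAC", "CCTA", "Gapdh"),
    ("TTTG", "GGTT", "Gapdh"),  # same UMI, different cell: a separate molecule
    ("AAAC", "GGTT", "Actb"),
]
counts = count_molecules(reads)
print(counts[("AAAC", "Gapdh")])  # 2 molecules from 3 reads
```

Collapsing on the gene rather than the isoform mirrors why these methods are best suited to gene-level counts: a read confined to the 3′ end usually cannot distinguish isoforms that differ further upstream.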
Fig. 2
Summary of the limitations of the four ideal conditions for successful single-cell RNAseq isoform studies. From left to right, the diagram presents the importance and current limitations of full-length transcript sequencing, capture efficiency and sequencing depth, the number of cells sequenced, and sequencing errors and artefacts for isoform detection. Each is discussed in the main text. Alt. alternative, RT reverse transcription, UMI unique molecular identifier
Fig. 3
Qualitative performance comparison of the three main single-cell RNAseq methods for isoform detection. From the inside to the outside of the graph, the three dotted lines represent ‘low’, ‘medium’ and ‘high’ levels of each characteristic. The most prominent features of long reads (red) are a high isoform resolution potential but a high occurrence of errors. Smart-based methods (yellow) provide high sequencing depth and a medium level of both isoform resolution power and number of cells. UMI-based methods (blue) can process high numbers of cells with medium to low sequencing depth and accurately quantify isoform expression, although their isoform resolution potential is strongly limited. UMI unique molecular identifier
Fig. 4
Simulation of short- and long-read workflows and modelling of a UMI-based library preparation strategy. a Short-read simulation workflow. Transcript sequences from the Tardaguila et al. 2018 neural transcriptome [66] were trimmed, and reads were simulated from the resulting fragments to recreate the limited transcript coverage of UMI-based library preparation. Full-length reads were also simulated. Reads were aligned to the mouse genome using STAR, and isoform expression was quantified using RSEM. For the UMI simulations, the number of isoforms resolved using Smart-seq reads was used as the 100% reference to calculate the percentage of resolution of each MIG. For the Smart-seq simulation, the annotated number of isoforms per gene (in Tardaguila et al. [66]) was used as the 100% reference. b Long-read simulation workflow. The Illumina quantification of isoform expression available in Tardaguila et al. [66] was scaled to one million reads (TPM) to recreate a Sequel run of one million long reads in which a single cell is sequenced. Values were downsampled to simulate scenarios in which an increasing number of cells (2, 6, 10, 16, 20) are sequenced together in a similar run, so the number of reads per cell gradually decreases. The number of MIGs in the Tardaguila et al. annotation was compared with the number of MIGs detected in the simulated scenarios. The number of isoform switches detected in the Tardaguila et al. data was then compared in the same way. c Simulated short-read length for each simulation scenario (shown for 3′ UMIs only). PacBio transcript sequences in the Tardaguila et al. dataset [66] were trimmed as described. To keep coverage even as increasing lengths of the transcripts were captured in the simulated UMI-based protocols, the length of the simulated reads was increased for longer fragments (100 and 200 bp: 25 bp reads; 300 and 500 bp: 50 bp reads; 1000 bp: 100 bp reads; full length: 250 bp reads, paired-end).
MIG multi-isoform gene, NSC neural stem cell, RSEM RNA-seq by expectation maximization, TPM transcripts per million, UMI unique molecular identifier
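The long-read downsampling in panel b can be approximated in a few lines. This sketch assumes that each cell in a run receives an equal share of the reads and that a cell's reads are drawn with replacement in proportion to isoform TPM (multinomial sampling). The caption does not state the exact downsampling procedure used, and the function names, the `min_reads` detection threshold and the toy TPM table are all hypothetical.

```python
import random
from collections import Counter


def simulate_cell(tpm, reads_per_run=1_000_000, cells_per_run=1, seed=0):
    """Draw one cell's long reads from isoform TPM values (hypothetical helper).

    tpm maps (gene, isoform) -> TPM. The run's reads are assumed to be split
    evenly across cells_per_run cells, and each read is sampled with
    probability proportional to TPM (multinomial sampling).
    """
    rng = random.Random(seed)
    isoforms = list(tpm)
    weights = [tpm[iso] for iso in isoforms]
    depth = reads_per_run // cells_per_run  # reads left for this cell
    return Counter(rng.choices(isoforms, weights=weights, k=depth))


def count_migs(isoform_counts, min_reads=1):
    """Count multi-isoform genes: genes with more than one isoform observed."""
    isoforms_per_gene = Counter(
        gene for (gene, _isoform), n in isoform_counts.items() if n >= min_reads
    )
    return sum(1 for n_isoforms in isoforms_per_gene.values() if n_isoforms > 1)


# Toy transcriptome (made up; sums to one million): gene B's second isoform is rare.
toy_tpm = {("A", "A-1"): 600_000, ("A", "A-2"): 300_000,
           ("B", "B-1"): 99_990, ("B", "B-2"): 10}
for n_cells in (1, 2, 6, 10, 16, 20):
    cell = simulate_cell(toy_tpm, reads_per_run=10_000, cells_per_run=n_cells)
    print(n_cells, count_migs(cell))
```

As more cells share the run, rare isoforms such as `B-2` are increasingly likely to receive no reads, so their gene stops being counted as multi-isoform; this is the depth effect the simulation is designed to expose.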
Fig. 5
Simulation results. a Short-read simulations: proportion of transcript length left uncovered as longer fragments are simulated in a UMI-based library preparation scenario. Short fragments (100–200 bp) leave most of the transcript uncovered by the reads (> 0.75 proportion), whereas longer (> 300 bp) fragments affect transcripts differently depending on their length, hence the increasingly wide distributions in the boxplot. b Short-read simulations: multi-isoform genes (MIGs) detected in the 3′ end, 5′ end and Smart-seq simulations, classified into four intervals according to their individual percentage of resolution. Results are shown for neural stem cells (NSCs) only. Intervals group MIGs for which 0–25, 25–50, 50–75 and 75–100% of their isoforms are resolved. The 3′ end and 5′ end labels refer only to the unique molecular identifier (UMI) simulations. Note that the Smart-seq data are plotted twice, in both the 3′ end and 5′ end bar-graph rows, for completeness and to ease visual comparison. c Long-read simulations: the number of multi-isoform genes detected as sequencing depth per cell is progressively reduced. The dashed line indicates the number of multi-isoform genes present in the original neural cell transcriptome. A decrease in depth per cell reduces the number of genes for which more than one isoform can be observed. d Long-read simulations: the number of isoform switches detected between neural stem cells and oligodendrocytes in the same scenario, assuming half of the cells belong to each cell type (i.e. two cells equate to one oligodendrocyte and one NSC). A decrease in sequencing depth per cell not only prevents detection of changes in isoform expression ratios (which constitute the majority of differences in isoform expression) but also reduces the number of isoform switches that can be observed. The dashed line indicates the number of NSC versus oligodendrocyte isoform switches detected in the original transcript expression data.
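The two short-read summaries in panels a and b reduce to simple arithmetic, sketched below. Assumptions: a 3′-end fragment of length f covers min(f, L) bases of a transcript of length L (i.e. reads cover the retained fragment evenly), and the interval boundaries are taken as [0, 25), [25, 50), [50, 75) and [75, 100], which the caption does not specify. Function names and gene data are hypothetical.

```python
from collections import Counter


def uncovered_fraction(transcript_length, fragment_length):
    """Fraction of a transcript left uncovered when only an end fragment is kept.

    Assumes the fragment is fully and evenly covered by reads.
    """
    covered = min(fragment_length, transcript_length)
    return (transcript_length - covered) / transcript_length


def resolution_interval(resolved, reference):
    """Bin a MIG by the percentage of its reference isoforms that were resolved.

    Assumed boundaries: [0, 25), [25, 50), [50, 75), [75, 100].
    """
    if reference <= 0 or not 0 <= resolved <= reference:
        raise ValueError("need 0 <= resolved <= reference and reference > 0")
    pct = 100.0 * resolved / reference
    if pct < 25:
        return "0-25%"
    if pct < 50:
        return "25-50%"
    if pct < 75:
        return "50-75%"
    return "75-100%"


print(uncovered_fraction(2000, 200))  # 200 bp fragment on a 2 kb transcript
print(uncovered_fraction(400, 500))   # fragment longer than the transcript

# Hypothetical MIGs: (isoforms resolved, reference number of isoforms).
migs = {"GeneA": (1, 5), "GeneB": (2, 4), "GeneC": (3, 3), "GeneD": (1, 2)}
tally = Counter(resolution_interval(r, t) for r, t in migs.values())
print(tally)
```

The first function explains why short fragments leave most of a long transcript uncovered while longer fragments spread the distribution according to transcript length; the second reproduces the four-interval classification used for the bar graphs.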

References

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008;322:1845–1848. doi: 10.1126/science.1162228.
Batut P, Gingeras TR. RAMPAGE: Promoter activity profiling by paired-end sequencing of 5′-complete cDNAs. Curr Protoc Mol Biol. 2013;104:25B.11.1–25B.11.16. doi: 10.1002/0471142727.mb25b11s104.
Pelechano V, Wei W, Jakob P, Steinmetz LM. Genome-wide identification of transcript start and end sites by transcript isoform sequencing. Nat Protoc. 2014;9:1740–1759. doi: 10.1038/nprot.2014.121.
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324:218–223. doi: 10.1126/science.1168978.