Genome Biol. 2020 Mar 23;21(1):74.
doi: 10.1186/s13059-020-01981-w.

Obstacles to detecting isoforms using full-length scRNA-seq data

Jennifer Westoby et al. Genome Biol.

Abstract

Background: Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk RNA-seq samples. However, these studies generally did not consider the impact of dropouts or isoform quantification errors, potentially confounding the results of these analyses.

Results: In this study, we take a simulation-based approach in which we explicitly account for dropouts and isoform quantification errors. We use our simulations to ask to what extent it is possible to study alternative splicing using scRNA-seq, and what limitations must be overcome to make splicing analysis feasible. We find that the high rate of dropouts associated with scRNA-seq is a major obstacle to studying alternative splicing. In mice and other well-established model organisms, the relatively low rate of isoform quantification errors poses a lesser obstacle. We also find that different models of isoform choice meaningfully change our simulation results.
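To make the kind of simulation described above concrete, here is a minimal Python sketch; it is not the authors' code. It assumes, hypothetically, that exactly one isoform of a gene is expressed in each cell, that the expressed isoform is lost either to a dropout (probability `p_dropout`) or to a quantification false negative (`p_fn`), and that each non-expressed isoform is reported as a false positive with probability `p_fp`, all independently. The function names and parameters are illustrative only.

```python
import random


def detected_isoforms(n_isoforms, expressed, p_dropout, p_fp, p_fn, rng):
    """Count the isoforms of one gene reported as detected in one cell.

    Hypothetical error model (an assumption, not the paper's exact model):
    an expressed isoform is detected only if it neither drops out nor is
    missed by quantification; a non-expressed isoform is reported as a
    false positive with probability p_fp.
    """
    count = 0
    for isoform in range(n_isoforms):
        if isoform in expressed:
            if rng.random() >= p_dropout and rng.random() >= p_fn:
                count += 1
        elif rng.random() < p_fp:
            count += 1
    return count


def mean_detected(n_cells, n_isoforms, p_dropout, p_fp, p_fn, seed=0):
    """Mean number of isoforms detected per cell for a single gene."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_cells):
        # Ground truth: exactly one isoform, chosen uniformly, is expressed
        expressed = {rng.randrange(n_isoforms)}
        total += detected_isoforms(n_isoforms, expressed, p_dropout, p_fp, p_fn, rng)
    return total / n_cells


print(mean_detected(10_000, 4, 0.0, 0.0, 0.0))  # no dropouts or errors
print(mean_detected(10_000, 4, 0.5, 0.0, 0.0))  # dropouts only
print(mean_detected(10_000, 4, 0.0, 0.1, 0.0))  # false positives only
```

With all three probabilities at zero the mean is exactly 1.0, the true number of expressed isoforms. Raising `p_dropout` pulls it below 1, while raising `p_fp` pushes it above 1, so an observed count alone cannot distinguish genuine multi-isoform expression from error.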

Conclusions: To accurately study alternative splicing with single-cell RNA-seq, a better understanding of isoform choice and the errors associated with scRNA-seq is required. An increase in the capture efficiency of scRNA-seq would also be beneficial. Until some or all of the above are achieved, we do not recommend attempting to resolve isoforms in individual cells using scRNA-seq.

Keywords: Alternative splicing; Dropouts; Gene; Isoform; Isoform choice; Single cell; scRNA-seq.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic of our simulation approach
Fig. 2
The effect of sequencing depth on isoform detection. a Distributions of the mean number of isoforms detected per gene per cell for H1 hESCs whose cDNA was split and sequenced at approximately 1 million reads per cell or 4 million reads per cell on average. b Distributions of the overlap fraction. Black vertical lines represent the mean value of the distributions
Fig. 3
The impact of dropouts on isoform detection. a The distribution of dropout probabilities (p(dropout)) in each group of H1 hESCs and an approximation of these distributions using a beta distribution. At 1 million reads per cell, α = 1.31 and β = 0.74 in the approximated beta distribution. At 4 million reads per cell, α = 0.72 and β = 1.03. b Five beta distributions from which dropout probabilities were sampled in the simulations used to generate c and d. c The distribution of the mean number of isoforms detected per gene per cell for simulations in which one isoform was produced per gene per cell. Each plot corresponds to a simulation in which dropout probabilities were sampled from one of the distributions shown in b. d The overlap fraction for each simulation. Plots shown in c and d are for H1 hESCs sequenced at 4 million reads per cell. Black vertical lines represent the mean value of the distributions
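As a quick check on what the fitted parameters in this caption imply: the mean of a Beta(α, β) distribution is α/(α + β), which gives an average dropout probability of 1.31/2.05 ≈ 0.639 at 1 million reads per cell and 0.72/1.75 ≈ 0.411 at 4 million reads per cell. The sketch below computes these means and draws dropout probabilities with Python's `random.betavariate`. Drawing one probability per isoform is our assumption about how the sampled values are used, and the helper names are hypothetical.

```python
import random

# Beta parameters quoted in the caption for H1 hESCs
FITS = {"1M reads/cell": (1.31, 0.74), "4M reads/cell": (0.72, 1.03)}


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def sample_dropout_probs(alpha, beta, n, seed=0):
    """Draw n dropout probabilities (assumed: one per isoform) from Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]


for label, (a, b) in FITS.items():
    probs = sample_dropout_probs(a, b, 50_000)
    print(f"{label}: theoretical mean {beta_mean(a, b):.3f}, "
          f"sample mean {sum(probs) / len(probs):.3f}")
```

The shift of the mean from roughly 0.64 to roughly 0.41 quantifies how much deeper sequencing reduces, but does not remove, dropouts in this dataset.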
Fig. 4
The impact of quantification errors on isoform detection. a Distributions of the mean number of isoforms detected per gene per cell when one isoform is expressed per gene per cell. The probability of false positives (pFP) increases from left to right, and the probability of false negatives (pFN) increases from top to bottom. The dataset shown is H1 hESCs whose cDNA was split and sequenced at approximately 4 million reads per cell on average. b Summary plots of the average of the mean number of isoforms detected per gene per cell when pFP, pFN, or pFP and pFN are increased
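The direction of these trends follows from a simple expectation. Assume, for illustration only, that a gene has n_g isoforms of which exactly one is expressed, that dropouts are ignored, and that false negatives and false positives occur independently with probabilities pFN and pFP. The expected number of isoforms detected for that gene, D_g, is then

```latex
\mathbb{E}[D_g] = (1 - p_{\mathrm{FN}}) + (n_g - 1)\, p_{\mathrm{FP}}
```

For example, with n_g = 4, pFP = 0.1 and pFN = 0.2, this gives 0.8 + 3 × 0.1 = 1.1. False positives inflate the count in proportion to the number of non-expressed isoforms, false negatives deflate it, and when both are raised the two effects can partly cancel.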
Fig. 5
Different models of isoform choice alter our ability to detect isoforms. a Distributions of the mean number of isoforms detected per gene per cell for H1 hESCs sequenced at approximately 4 million reads per cell using the Weibull model of isoform choice. b The same distributions when the random model is used. c The distributions when the inferred probabilities model is used. d The distributions when the cell variability model is used. See the main text for a detailed description of each model
Fig. 6
Some models of isoform choice are more plausible than others. a We model the probability of picking any given isoform as a normal distribution, a Bernoulli distribution or a constant probability, all with the same mean (0.25) (top row of graphs). The bottom row shows the distributions of the mean number of isoforms detected per gene per cell under each model of isoform choice. b Histograms of mean isoform expression, ordered by isoform rank. c Histograms of dropout probability, ordered by isoform rank. All plots shown are for H1 hESCs sequenced at 1 million reads per cell
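The point of panel a, that isoform choice models with the same mean can produce very different detection distributions, can be illustrated with a small Python sketch. Everything specific here is an assumption for illustration: a hypothetical gene with four isoforms (so a mean picking probability of 0.25 corresponds to one isoform per cell on average), a standard deviation of 0.1 for the normal model (clipped to [0, 1]), picking probabilities drawn once per isoform and shared by all cells, and no dropouts or quantification errors.

```python
import random
import statistics

N_ISOFORMS = 4    # hypothetical gene with four isoforms
N_CELLS = 100     # hypothetical number of cells
MEAN_P = 0.25     # shared mean picking probability, as in panel a


def draw_p(model, rng):
    """Draw one isoform's picking probability under a given model."""
    if model == "constant":
        return MEAN_P
    if model == "bernoulli":
        # The isoform is either always picked (p = 1) or never picked (p = 0)
        return 1.0 if rng.random() < MEAN_P else 0.0
    if model == "normal":
        # Assumed SD of 0.1, clipped so p remains a valid probability
        return min(1.0, max(0.0, rng.gauss(MEAN_P, 0.1)))
    raise ValueError(f"unknown model: {model}")


def per_gene_means(model, n_genes=1000, seed=0):
    """Mean number of isoforms picked per cell, one value per simulated gene."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_genes):
        # Assumption: probabilities are fixed per isoform and shared across cells
        ps = [draw_p(model, rng) for _ in range(N_ISOFORMS)]
        picked = sum(rng.random() < p for _ in range(N_CELLS) for p in ps)
        means.append(picked / N_CELLS)
    return means


for model in ("constant", "normal", "bernoulli"):
    m = per_gene_means(model)
    print(f"{model:>9}: mean {statistics.mean(m):.2f}, SD {statistics.stdev(m):.2f}")
```

Under these assumptions all three models give gene-level means close to 1, but the spread across genes differs sharply: the constant model stays tightly around 1, the Bernoulli model yields only whole numbers (0 to 4), and the normal model falls in between.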
Fig. 7
Mixture models. a, b Distributions of detected isoforms per gene per cell (blue) and fitted log-normal distributions (orange) for H1 cells sequenced at 1 million reads per cell (a) or 4 million reads per cell (b) under the Weibull model. c, d Mixing fractions vs iterations of expectation-maximisation for 1 million reads per cell (c) and 4 million reads per cell (d). Each coloured line represents the mixing fraction of the distribution for one, two, three or four isoforms simulated as expressed per gene per cell. Equivalent plots for other isoform choice models and H9 cells can be found in Additional file 1: Figs. S25–31
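Panels c and d track mixing fractions over iterations of expectation-maximisation. As a rough Python sketch of that idea, assume the component distributions are held fixed (for example, log-normal fits like those in panels a and b) and only the mixing fractions are updated: the E-step computes each component's responsibility for each observation, and the M-step sets every mixing fraction to that component's average responsibility. The component parameters and the synthetic data below are invented for illustration and are not the paper's fits.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Density at x > 0 of a log-normal with log-scale mean mu and SD sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def em_mixing_fractions(data, components, n_iter=60):
    """Fit mixing fractions of fixed log-normal components by EM.

    components: list of (mu, sigma) pairs, e.g. one per hypothetical
    scenario of 1, 2, 3 or 4 isoforms expressed per gene per cell.
    Returns the fractions after every iteration, starting from equal weights.
    """
    k = len(components)
    weights = [1.0 / k] * k
    history = [weights[:]]
    for _ in range(n_iter):
        resp_totals = [0.0] * k
        for x in data:
            # E-step: weighted density of x under each component, then normalise
            dens = [w * lognormal_pdf(x, mu, s) for w, (mu, s) in zip(weights, components)]
            norm = sum(dens)
            for j in range(k):
                resp_totals[j] += dens[j] / norm
        # M-step: each fraction becomes the mean responsibility of its component
        weights = [t / len(data) for t in resp_totals]
        history.append(weights[:])
    return history


# Usage with invented components centred on 1, 2, 3 and 4 detected isoforms
components = [(math.log(k), 0.2) for k in (1, 2, 3, 4)]
rng = random.Random(1)
data = [rng.lognormvariate(*components[0 if rng.random() < 0.7 else 1])
        for _ in range(2000)]
final = em_mixing_fractions(data, components)[-1]
print([round(w, 2) for w in final])
```

Because the synthetic data are a 70/30 mixture of the first two components, the fitted fractions should settle close to that split, with the third and fourth components driven towards zero.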
