. 2022 Dec;40(12):1780-1793.

doi: 10.1038/s41587-022-01361-8. Epub 2022 Jun 27.

High-throughput total RNA sequencing in single cells using VASA-seq

Fredrik Salmen^#^{1

2}, Joachim De Jonghe^#^{3

4}, Tomasz S Kaminski^{3

5}, Anna Alemany^{1

2}, Guillermo E Parada⁶, Joe Verity-Legg^{1

2}, Ayaka Yanagida⁷, Timo N Kohler^{3

8}, Nicholas Battich^{1

2}, Floris van den Brekel^{1

2}, Anna L Ellermann³, Alfonso Martinez Arias⁹, Jennifer Nichols^{8

10}, Martin Hemberg^{6

11}, Florian Hollfelder¹², Alexander van Oudenaarden^{13

14}

Affiliations

¹ Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences) and University Medical Center, Utrecht, Netherlands.
² Oncode Institute, Utrecht, Netherlands.
³ Department of Biochemistry, University of Cambridge, Cambridge, UK.
⁴ Francis Crick Institute, London, UK.
⁵ Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw, Poland.
⁶ Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
⁷ Division of Stem Cell Therapy, Center for Stem Cell Biology and Regenerative Medicine, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
⁸ Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Cambridge, UK.
⁹ Systems Bioengineering, DCEXS, Universidad Pompeu Fabra, Doctor Aiguader 88 ICREA (Institució Catalana de Recerca i Estudis Avançats), Barcelona, Spain.
¹⁰ Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
¹¹ Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA.
¹² Department of Biochemistry, University of Cambridge, Cambridge, UK. fh111@cam.ac.uk.
¹³ Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences) and University Medical Center, Utrecht, Netherlands. a.vanoudenaarden@hubrecht.eu.
¹⁴ Oncode Institute, Utrecht, Netherlands. a.vanoudenaarden@hubrecht.eu.

^# Contributed equally.

PMID: 35760914
PMCID: PMC9750877
DOI: 10.1038/s41587-022-01361-8

High-throughput total RNA sequencing in single cells using VASA-seq

Fredrik Salmen et al. Nat Biotechnol. 2022 Dec.

. 2022 Dec;40(12):1780-1793.

doi: 10.1038/s41587-022-01361-8. Epub 2022 Jun 27.

Authors

Affiliations

¹ Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences) and University Medical Center, Utrecht, Netherlands.
² Oncode Institute, Utrecht, Netherlands.
³ Department of Biochemistry, University of Cambridge, Cambridge, UK.
⁴ Francis Crick Institute, London, UK.
⁵ Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw, Poland.
⁶ Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
⁷ Division of Stem Cell Therapy, Center for Stem Cell Biology and Regenerative Medicine, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
⁸ Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Cambridge, UK.
⁹ Systems Bioengineering, DCEXS, Universidad Pompeu Fabra, Doctor Aiguader 88 ICREA (Institució Catalana de Recerca i Estudis Avançats), Barcelona, Spain.
¹⁰ Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
¹¹ Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA.
¹² Department of Biochemistry, University of Cambridge, Cambridge, UK. fh111@cam.ac.uk.
¹³ Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences) and University Medical Center, Utrecht, Netherlands. a.vanoudenaarden@hubrecht.eu.
¹⁴ Oncode Institute, Utrecht, Netherlands. a.vanoudenaarden@hubrecht.eu.

^# Contributed equally.

PMID: 35760914
PMCID: PMC9750877
DOI: 10.1038/s41587-022-01361-8

Abstract

Most methods for single-cell transcriptome sequencing amplify the termini of polyadenylated transcripts, capturing only a small fraction of the total cellular transcriptome. This precludes the detection of many long non-coding, short non-coding and non-polyadenylated protein-coding transcripts and hinders alternative splicing analysis. We, therefore, developed VASA-seq to detect the total transcriptome in single cells, which is enabled by fragmenting and tailing all RNA molecules subsequent to cell lysis. The method is compatible with both plate-based formats and droplet microfluidics. We applied VASA-seq to more than 30,000 single cells in the developing mouse embryo during gastrulation and early organogenesis. Analyzing the dynamics of the total single-cell transcriptome, we discovered cell type markers, many based on non-coding RNA, and performed in vivo cell cycle analysis via detection of non-polyadenylated histone genes. RNA velocity characterization was improved, accurately retracing blood maturation trajectories. Moreover, our VASA-seq data provide a comprehensive analysis of alternative splicing during mammalian development, which highlighted substantial rearrangements during blood development and heart morphogenesis.

PubMed Disclaimer

Conflict of interest statement

F.S., A.v.O., J.D.J., T.S.K. and F.H. are inventors on patent applications submitted by the Stichting Oncode Institute on behalf of Koninklijke Nederlandse Akademie Van Wetenschappen and the University of Cambridge (via its technology transfer office, Cambridge Enterprise). A.v.O. is a member of the advisory board of Single-Cell Discoveries.

Figures

**Fig. 1. Overview of the VASA-seq workflow and benchmarking against other state-of-the-art methodologies.**
a, Overview of the VASA-seq single-cell molecular workflow. Single cells are lysed, and RNA is fragmented. Fragments are repaired and polyadenylated, followed by reverse transcription (RT) using barcoded oligo-dT primers. The cDNA is made double stranded and amplified using IVT. aRNA is depleted of rRNA, and libraries are finalized by ligation, RT and PCR, which leave fragments ready for sequencing. b, Picture illustrating the single-cell encapsulation process using droplet microfluidics. The single cells (green) are co-encapsulated with a barcoded bead (purple), lysis and fragmentation mix (blue), and compartmentalization is achieved with the addition of fluorinated surfactant oil (red) at the flow-focusing junction. c, Picture illustrating the picoinjection of reagents (green) to single-cell lysates (light blue/purple). The droplet surface tension is perturbed using an electric field that allows for the subsequent additions of end repair/poly(A) and RT mix. d, Cross-contamination test for VASA-drop was carried out using HEK293T cells (human) and mouse embryonic stem cells (mouse). Barcodes with more than 25% of detected UFIs belonging to the other species were considered doublets/mixed (red). Detected barcodes with low UFIs (<7,500) were discarded (gray). The remainder were assigned to either human (magenta) or mouse (blue). e, Gene body coverage comparison along protein-coding genes. VASA-seq showed even coverage, whereas 10x, Smart-seq-total and Smart-seq3 had a bias toward transcript termini (3′ or 5′ and 3′, respectively). f, The number of detected annotated genes in HEK293T cells, for each method, is plotted against the number of reads (after quality filtering, adapter removal and homopolymer trimming) per cell across different downsampling thresholds. The saturation curves showed that VASA-seq was the most sensitive of the methods. Curvature of gene detection indicated that full complexity was not reached for the method when 75,000 reads were allocated to each cell. Only cells that were sequenced to at least 75,000 reads were used (VASA-plate: n = 174, VASA-drop: n = 376, Smart-seq3: n = 113, Smart-seq-total = 260, 10x Chromium: n = 288).

**Fig. 2. VASA-seq enables novel marker gene detection in the developing mouse embryo.**
a, Schematic figure of mouse embryo morphology at developmental stages E6.5, E7.5, E8.5 and E9.5 (left to right). b, Fraction of transcripts per biotype in VASA-seq compared to 10x Chromium for mouse embryos at each timepoint using the 20% terminal portion of genes. The comparison includes protein-coding genes (top-left panel), lncRNAs (top-right panel), TFs (bottom-left panel) and sncRNAs (bottom-right panel). c, Percentage of genes detected in VASA-seq compared to 10x for each timepoint using the 20% terminal portion of genes. 70.8–76.2% of the detected genes were shared between the methods; 18.7–25.3% were detected only in VASA-seq; and 2.4–5.1% were detected only in 10x. d, Strategy to transfer cluster identity from 10x Chromium or VASA-seq (reference technology) to VASA-seq or 10x Chromium (target technology) at the single-cell level. First, for a given cluster in the reference technology, a background histogram of the distances between cells in that cluster and their corresponding first nearest neighbor in the target technology is obtained (gray arrows and gray histogram). Next, each cell in the target technology is assigned to the cluster of its nearest neighbor cell in the reference technology (black and green arrows) with a score equal to the area under the left curve resulting from the intersection between the cell–cell distance and the corresponding background histogram (dashed area). This procedure is then repeated for all clusters in the reference technology. e, UMAP of E8.5 mouse embryo cells from 10x Chromium (n = 9,358) and VASA-seq (n = 7,899) that were part of equivalent clusters using the 20% terminal portion of genes. Clusters that are detected in both technologies are marked with numbers 1–43, and each cluster is colored according to the cell type category: green, blood; blue, ectoderm; purple, endoderm; orange, mesoderm; gray, epiblast. Gray fill in cluster label indicates extra-embryonic contribution; black fill indicates embryonic contribution. f, Scatter plot showing the number of differentially expressed genes per cluster in VASA-seq (x axis) versus 10x Chromium (y axis) for spliced protein-coding genes (left panel), unspliced protein-coding genes (middle panel) and lncRNAs (right panel) using the 20% terminal portion of genes. Numbers indicate clusters where a higher number of marker genes were detected in 10x Chromium. Clusters are colored according to the cell type category: green, blood; blue, ectoderm; purple, endoderm; orange, mesoderm, gray, epiblast. g, Heat maps showing the ratio of differential upregulated genes (log₂ fold change >2 and P < 0.01), per cluster, between VASA-seq and 10x Chromium using the 20% terminal portion of genes. Columns display spliced protein-coding genes (left panel), unspliced protein-coding genes (middle panel) and lncRNAs (right panel), and rows are clusters. Red color indicates when marker genes are more predominantly detected in VASA-seq; blue color indicates when higher numbers of marker genes are detected in 10x Chromium. The statistical test used was a two-sided t-test, and P values were uncorrected for multiple comparisons. h, Examples of newly detected unspliced lncRNA marker genes in VASA-seq for E8.5: *Foxl2os* in paraxial mesoderm progenitors (left panel), *AI115009* in mesenchyme (middle panel) and *C130021I20Rik* in forebrain/midbrain/hindbrain and surface ectoderm (right panel).

**Fig. 3. Histone gene expression robustly identifies cycling cells.**
a, Volcano plot showing differentially expressed genes between VASA-seq (right, positive values) and 10x Chromium (left, negative values). Genes that are always highly differentially expressed across timepoints and have a log₂ fold change >4 and P < 0.001 are colored; purple color indicates non-polyadenylated, and orange color indicates polyadenylated genes. Many of the differentially expressed genes enriched in the VASA-seq dataset are histone genes. The statistical test used was a two-sided t-test, using uncorrected P values for multiple comparisons. b, Histogram showing the distribution of histone gene expression in VASA-seq compared to 10x Chromium. The overlayed dashed black line shows a bimodal Weibull distribution, and the dashed red line shows a single Weibull distribution. c, Histogram showing the distribution of histone gene expression in VASA-seq labeled with the estimated cell cycle phase using the expression of S and G2M genes for scoring. Detected histone expression in S-phase does not correlate with predictive cell cycle estimation. d, Cells are identified as cycling/S-phase (blue) and non-cycling (yellow) based on the total histone gene expression shown in Fig. 3b. UMAP of the VASA-seq embryonic atlas before (left panel) and after (right panel) removal of cell cycle genes. e, Cell type annotated UMAP of the aggregated VASA-seq dataset after removal of cell cycle genes. Each color and number represent a cell type, called manually based on marker gene expression for each Leiden cluster. Smaller panels (right) highlight cells sampled at each timepoint (E6.5, E7.5, E8.5 and E9.5) in black. In total, 40 different cell types were identified: 1-erythropoiesis (expansive, S-phase), 2-somites, 3-paraxial mesoderm, 4-intermediate mesoderm I, 5-caudal epiblast, 6-lateral plate mesoderm/intermediate mesoderm primordium, 7-spinal cord (differentiated neurons), 8-endothelium, 9-preplacodal/placodal region, 10-rhombomeres (hindbrain), 11-forebrain/hindbrain (isthmus), 12-epiblast (E7.5), 13-forebrain, 14-spinal cord (differentiated neurons), 15-neural crest, 16-allantois, 17-cranial mesoderm, 18-lateral plate mesoderm, 19-early caudal epiblast, 20-trophectoderm, 21-dorsal surface ectoderm, 22-anterior neural crest, 23-pharyngeal arches, 24-primitive erythroid progenitors, 25-caudal epiblast (E7.5), 26-endoderm, 27-visceral endoderm, 28-first heart field, 29-myofibroblasts, 30-epiblast (E6.5), 31-spinal cord (cycling progenitors), 32-pharyngeal arches, 33-primitive heart tube, 34-outflow tract, 35-secondary heart field, 36-intermediate mesoderm I, 37-parietal endoderm, 38-pro-nephros, 39-mesodermal unknown and 40-node. f, Percentage of cycling/S-phase cells per cell type. Average number of cycling cells is 65% (black line) ± 11% (red dashed lines) across all cell types. Late primitive erythrocytes (green) diverge from the average by having 84% of the cells in S-phase. Node cells (brown) and primitive heart tube (pink) have much fewer cells in S-phase—20% and 30%, respectively. g, Plots showing the percentage of cells in S-phase per cell type that spans over three timepoints (E6.5–E8.5, left panel; E7.5–E9.5, right panel). Trophectoderm (light brown) had an unchanged pattern, whereas endothelium (green), allantois (pink), lateral plate mesoderm (blue), endoderm (light green), visceral endoderm (light blue) and outflow tract (dark pink) all had a decreasing fraction of cycling cells as time passes. Allantois has the biggest difference, with 38% cycling in E9.5 compared to 79% in E7.5. The points are the mean and standard error of the mean obtained by bootstrapping the percentage of cells in S-phase for each equivalent cluster and biotype 1,000 times. The number of cells were: n = 140, 32 for cluster 20 and 27 at timepoint E6.5, respectively; n = 105, 340, 314, 392, 156, 171 and 69 for clusters 8, 16, 18, 20, 26, 27 and 34 at E7.5, respectively; n = 810, 552, 331, 117, 284, 339 and 121 for clusters 8, 16, 18, 20, 26, 27 and 34 at E8.5, respectively; and n = 345, 78, 30, 117 and 71 for clusters 8, 16, 18, 26 and 34 at E9.5, respectively. h, Heatmap showing differentially expressed single annotated histone genes. Rows display genes, and columns display cell types. Cell type categories/germ layers can be identified by color above the heat map. i, Example of marker histone gene expression plotted on the UMAP; red represents high expression, and blue represents low expression. *H2bc15* is highly expressed in most cell types but absent in certain cell types. *H2bc1* is solely expressed in the early epiblast (E6.5, cell type 30), whereas *H2bu2* is specific to the ectoderm germ layer and epiblasts (cell types 12 and 30).

**Fig. 4. Increased intronic capture with VASA-seq improves RNA velocity measurements.**
a, UMAP of all four timepoints for VASA-seq (E6.5–E9.5). Velocity is shown as arrows and each timepoint as a separate color. The black arrow indicates blood maturation trajectory. b, Violin plot of confidence values for VASA-seq in green (left panel) and 10x Chromium in dark purple (right panel). Only the equivalent E6.5, E7.5 and E8.5 timepoints are included in the comparison. Average RNA velocity confidence was 0.84 ± 0.12 (s.d.) for VASA-seq and 0.65 ± 0.12 (s.d.) for 10x. n number of cells were 21,497 (VASA-seq) and 16,945 (10x Genomics Chromium). Data in the box plot represent the 25%, median (center) and 75% percentiles with minimum and maximum values. c, Venn diagram showing the significant genes, according to the scVelo package, for VASA-seq and 10x Chromium. In all, we found 1,492 genes that were significant in both datasets, 1,069 that were significant only in VASA-seq and 26 that were significant only in 10x. d, Histograms showing goodness of fit (r²) for the 1,492 genes that were significant in both VASA-seq and 10x Chromium. Average values were 0.74 ± 0.18 (s.d.) for VASA-seq and 0.38 ± 0.25 (s.d.) for 10x Chromium. e, UMAP of the equivalent 10x Chromium dataset (E6.5, E7.5 and E8.5) after filtering. Velocity is shown as arrows and each timepoint as a separate color. The black arrow indicates blood maturation trajectory. f, Predicted latent time projected on the blood and erythroid progenitor subsets, showing incorrect temporal prediction of blood maturation in the 10x dataset but not in the VASA-seq dataset. g, Histogram across latent time labeled with developmental timing of the embryos using the 10x Chromium dataset, showing incorrect temporal prediction of blood development using RNA velocity computation in dynamic mode. h, Histogram across latent time labeled with developmental time of the sampled embryos in the VASA-seq dataset, showing accurate prediction of blood development progression via RNA velocity computation in dynamic mode.

**Fig. 5. Landscape of AS events across cell types involved in gastrulation and early organogenesis.**
a, Schematic representation of the different types of AS nodes categorized using Whippet. Colors highlight the exonic parts that correspond to the indicated splicing nodes. Arrows indicate different possible splice junctions that can be connected to the nodes. Red arrows represent inclusion events; black arrows represent exclusion events. b, PAGA projecting cell types onto a force field revealing connectivity between cell clusters. Edge width represents the connectivity values between cell clusters. Selected cell types are highlighted by color. c, Pairwise comparisons that exhibited the highest number of DISNs are represented by a dot plot, where the position of each point on the x axis indicates the cell clusters that were involved in each comparison (P1–P15) across the y axis. Color codes correspond to clusters in b. d, Number of DISNs found for each pairwise comparison (P1–P15). Color codes of the stacked bar plot indicate the abundance of different classes of splicing nodes that were found to be differentially included across the comparisons. e, Upset plot showing set interactions across the group of DISNs found for each pairwise comparison. Bar plot indicates the size of each intersection, and their composition is described by the bottom panel. f, Splice node makers identified for each cell cluster. The left panel shows the number of SNMs found for each cell cluster. The direction of the stacked bars (left or right) indicates if these markers were found with positive or negative Z-score values, respectively. The right panel corresponds to heat maps displaying ψ values per cell type (blue to red scale) for different SNMs. Heat map on the left represents inclusion SNMs, and the heat map on the right represents exclusion SNMs.

**Fig. 6. AS patterns across heart morphogenesis and blood formation.**
a, Schematic of mouse heart development. b, Volcano plot illustrating the DISNs detected between the PHT (positive ΔΨ values) and the FHF (negative ΔΨ values). c, Sashimi plot of *Tpm1* showing a coordinated AS switch from smooth muscle to striated muscle conformation after heart development from ECE to PHT. Box annotation on top illustrates exon order for *Tpm1*. Color-coding indicates the splicing node. Line in the bottom single-cell Ψ plots across cell types delineates the mean Ψ value, and the shading indicates the 95% confidence interval (CI). d, Single-cell gene expression UMAP plot for *Tpm1* (top left) and single-cell Ψ projection for *Tpm1_14, Tpm1_22* and *Tpm1_25* across the global splicing analysis. Each node is color-coded and highlighted in a single-cell line plot representing single-cell ψ values across all three clusters. e, Schematic representation of murine blood development throughout the profiled timepoints. f, Volcano plot illustrating the DISNs detected in the pairwise comparison between erythrocytes at E7.5 and E9.5. Color-coding indicates proteins with calmodulin-binding and/or spectrin-binding domains or RNA splicing proteins as determined by GO analysis. NS annotation stands for non-significant (gray color). g, Sashimi, domain annotation and line plots representing the skipping of exon 16 (*Epb41_30)* between E7.5 and E9.5. Line in the bottom single-cell Ψ plots across timepoints delineates the mean Ψ value at each timepoint, and the shading indicates the 95% CI. h, Sashimi, domain annotation and line plots representing the inclusion of *Add1_37* leading to a premature stop codon inclusion at E9.5 removing the C-terminus calmodulin-binding domain. Line in the bottom single-cell Ψ plots across timepoints delineates the mean Ψ value at each timepoint, and the shading indicates the 95% CI. i, Sashimi, domain annotation and line plots representing the gradual exclusion of the *Ank1_43* microexon in a disordered domain. Line in the bottom single-cell Ψ plots across timepoints delineates the mean Ψ value at each timepoint, and the shading indicates the 95% CI. j, Sashimi, domain annotation and line plots representing the gradual exclusion of the *Mbnl1_37* nuclear localization signal mediating the protein’s intracellular localization across timepoints. Line in the bottom single-cell Ψ plots across timepoints delineates the mean Ψ value at each timepoint, and the shading indicates the 95% CI.

**Extended Data Fig. 1. Overview of the sequencing and droplet microfluidic process and benchmarking analysis.**
a, The two platforms for VASA-seq, using a microfluidic device (left, VASA-drop) and a plate dispenser (right, VASA-plate). The microfluidic device allows the generation of single-cell libraries from thousands of cells, while the plate-based approach is better for rare cell types where a prior sorting is required. Each library contains the transcriptome from a large mix of cells, which are demultiplexed based on their barcode and index sequences. b, Library barcoding design for the VASA-drop workflow. Mouse embryo libraries were sequenced with the Illumina NovaSeq platform. To avoid index hopping, a custom dual indexing strategy was used. For the index i7 read, which usually only contains barcode 1 (inDrop v3), we inserted a 8-bp second index directly after a 15 bp common sequence. Only reads that had the correct combination of i5 and i7 index were further used for downstream processing. c, Design of the device used for barcoded bead and single-cell co-encapsulation. 1) input channel for the lysis and fragmentation mix, 2) input channel for the cell loading, 3) input channel for the barcoded compressible bead loading, 4) input channel for the fluorinated oil with admixed surfactant, 5) droplet exit channel. d, Design of the first picoinjector device, to inject the repair and poly(A) polymerase. 1) input channel for droplet reinjection, 2) emulsion diluting oil inlet, 3) droplet spacing oil inlet, 4) inlet for the repair and poly(A) polymerase to be picoinjected, 5) droplet exit channel, 6) positive electrode (red), 7) negative (moat) electrode (black). e, Design of the second picoinjector device, to inject the RT enzyme mix. 1) input channel for droplet re-injection, 2) emulsion diluting oil inlet, 3) droplet spacing oil inlet, 4) inlet for the RT mix to be picoinjected, 5) droplet exit channel, 6) positive electrode (red), 7) negative (moat) electrode (black). f, Photography of the container tip used for droplet collection and reinjection in the picoinjector devices. The tip is connected to a glass syringe which enables aspiration and delivery of emulsions. The tip can be connected to a PDMS plug to close the system for incubation in the water bath. g, Photography of the encapsulation process. The container tip collecting the emulsions is plugged into the outlet of the encapsulation device, while a tip containing cells is plugged in one of the inputs to deliver single-cells in the droplets. h, Photography of the bead barcode photocleavage set-up. The container tip is surrounded by aluminium foil to reflect the UV light and is kept on a container filled with ice.

**Extended Data Fig. 2. Comparison to other methods.**
a, Species mixing histogram plotted as a percentage of UFIs quantified from mapping events to a human reference genome, divided by the sum of UFIs quantified from mapping events to mouse and human reference genomes. b, Proportion of mapped reads to all annotated genes for each biotype using HEK293T cells across all methods. VASA-seq detected proportionally larger amounts of lncRNA genes (light blue) compared to the other technologies. The proportion of detected sncRNAs in VASA-seq methods was higher than 10x Chromium and Smart-seq3, but lower than with Smart-seq-total (grey). c, Proportion of sncRNA biotypes captured for HEK293T cells across methods for reads mapped to all annotated genes. Only VASA-seq and Smart-seq-total detected a significant proportion of sncRNAs biotypes, with Smart-seq-total providing the best performance in terms of relative distribution of biotypes, followed by VASA-drop and VASA-plate. MiscRNA (brown), snoRNA (pink), Ribozyme(grey-green) and snRNA (red) took up the largest proportion of measured biotypes. d, The number of detected protein coding genes in HEK293T, for each method, is plotted against the number of reads (after quality filtering, adapter removal and homopolymer trimming), per cell across different downsampling thresholds. The saturation curves showed that VASA-seq was the most sensitive of the methods. Curvature of gene detection indicated that full complexity was not reached for the method when 75,000 reads were allocated to each cell. Only cells that were sequenced to at least 75,000 reads were used. e, Number of detected genes per cell for Smart-seq3 (red), Smart-seq-total (black) and VASA-plate (blue) when sequenced at a depth of approximately 750,000 reads per cell. Data in boxplot represent the 25%, median (centre) and 75% percentiles with minimum and maximum values. The number of cells sampled were n = 113 (Smart-seq3), 260 (Smart-seq-total) and 68 (VASA-plate). f, Percentage of sequenced reads with proper barcodes that survived trimming, rRNA filtering and mapping for each method using HEK2993T cells (VASA-plate, VASA-drop, 10x Chromium, Smart-seq3 and Smart-seq-total). g, Percentage of unspliced reads for each method for HEK293T cells. VASA-seq detected more unspliced reads (44.1–56.5%) than the alternative technologies (12.8–38.1%). Data in boxplot represent the 25%, median (centre) and 75% percentiles with minimum and maximum values. The number of cells sampled were n = 976 (10x Chromium), 117 (Smart-seq3), 260 (Smart-seq-total), 726 (VASA-drop) and 192 (VASA-plate).

**Extended Data Fig. 3. Mouse gastrulation and organogenesis atlas.**
a, Brightfield microscope images of the embryos collected before dissociation. Two collections were performed for E6.5 (39 embryos total), whereas single collections were performed for E7.5 (8 embryos total), E8.5 (7 embryos total) and E9.5 (6 embryos total). Scale indicates a 1 mm scale for the background gridlines. b, Average gene expression correlation values (r²) per biotype across equivalent clusters between 10x Chromiumand VASA-seq at stage E8.5. n number of cells were 8,365 (VASA-seq) and 9,939 (10x). The equivalent clusters are annotated by using the percentage of cells assigned to a cell type in 10x Chromium. Only cell types present in more than 30% of the equivalent cluster are indicated. The points are the mean and standard error of the mean obtained by bootstrapping genes for each equivalent cluster and biotype 1000 times are represented. c, UMAP of E6.5 mouse embryo cells from 10x (n = 640) and VASA-seq (n = 298) that were part of equivalent clusters. Clusters that are detected in both technologies are marked with numbers 1–16 and each cluster is colored according to the cell type category: blue = ectoderm and grey = epiblast. Grey fill in cluster label indicates extra-embryonic contribution, black fill indicates embryonic contribution. d, UMAP of E7.5 mouse embryo cells from 10x (n = 3,319) and VASA-seq (n = 1,892) that were part of equivalent clusters. Clusters that are detected in both technologies are marked with numbers 1–38 and each cluster is colored according to the cell type category: green = blood, blue = ectoderm, purple = endoderm, orange = mesoderm and grey = epiblast. Grey fill in cluster label indicates extra-embryonic contribution, black fill indicates embryonic contribution. e, Scatter plot showing the number of differentially expressed genes per cluster at E6.5 in VASA-seq (x axis) vs. 10x Chromium (y axis) for spliced protein coding (left panel), unspliced protein coding (middle panel) and lncRNA (right panel) counts. Numbers indicate clusters where a higher number of marker genes were detected in 10x. Clusters are colored according to the cell type category: blue = ectoderm and grey = epiblast. f, Scatter plot showing the number of differentially expressed genes per cluster at E7.5 in VASA-seq (x axis) vs. 10x Chromium (y axis) for spliced protein coding (left panel), unspliced protein coding (middle panel) and lncRNA (right panel). Numbers indicate clusters where a higher number of marker genes were detected in 10x. Clusters are colored according to the cell type category: green = blood, blue = ectoderm, purple = endoderm, orange = mesoderm and grey = epiblast.

**Extended Data Fig. 4. Histone expression and regression of cell-cycle.**
a, UMAPs showing the log10 total counts for histone genes (left panel), S-phase genes (middle panel) and GM genes (right panel). Only cell cycle scoring using solely histone genes shows a clear cell cycle segregation in VASA-seq. b, Core expression of cell-type specification markers during gastrulation and early organogenesis projected on the 10x Chromium and regressed VASA-seq UMAP. c, Heatmap showing differentially expressed multi annotated histone genes. Rows display genes, and columns display cell types. Cell type categories/germ layers are colored above the heatmap.

**Extended Data Fig. 5. RNA velocity and global splicing marker analysis.**
a, Velocity of *Adgrf5* shown on diagrams of spliced vs. unspliced counts, along with UMAPs highlighting velocity and expression for the gene for VASA-seq (top) and 10x Chromium (bottom) showing both induction and repression in the endothelium for VASA-seq with high goodness of the fit. Goodness of the fit are values approximately one SD above average for each method to show genes that are good in both datasets. Black arrow indicates the endothelium in the VASA-seq dataset. b, Velocity of *Cacna2d2* shown on diagrams of spliced vs. unspliced counts, along with UMAPs highlighting velocity and expression for the gene for VASA-seq (top) and 10x Chromium (bottom) showing induction of the gene in the Primitive heart tube and first heart field for VASA-seq. Goodness of the fit are values approximately one SD above average for each method to show genes that are good in both datasets. c, Velocity of lncRNAs with unspliced molecules uniquely detected in the VASA-seq dataset for the endothelium. Phase diagrams of spliced vs. unspliced counts, along with UMAPs highlighting velocity and expression for VASA-seq show early induction of *Hoxa11os* in the the yolk sac, followed by induction of *Gm50321* across the endothelium (yolk sac and embryonic) and selective repression of *D030007L05Rik* at E9.5. Dots in the diagram are labelled according to developmental time points. d, Violin plot of the velocities across timepoints E6.5, E7.5 and E8.5 in the endothelium for the VASA-seq dataset showing differential induction and repression for lncRNAs. Dashed line indicates null velocity. e, Violin plot showing the distribution of coverage values obtained for splice nodes when computed at single-cell or pseudo-bulk level. f, Violin plot showing the number quantified spliced nodes (read coverage>5) obtained when quantified at the single-cell or pseudo-bulk level. g, Euler diagram showing the splicing node intersection between the DISN and SNM sets.

**Extended Data Fig. 6. Heart and blood development reveal tissue-specific AS patterns across developmental trajectories.**
a, Differential gene expression analysis using a two-sided Wilcoxon rank sum test between the FHF (negative average log2 fold-change values) and the PHT (positive average log2 fold-change values) with differentially expressed RBPs highlighted. Significance levels are indicated by color (grey non-significant and black significant), and determined by the following threshold: |average log2 fold-change | > 0.5 and Bonferroni adjusted p-value < 1E-05). b, *Rbfox2*_143 and 144 mutually exclusive exon usage in the FHF and PHT respectively. c, *Rbfox2* gene expression across the atlas, log2 normalized values. d, *Tpm1* sashimi plot between the ECE, FHF and PHT, dashed square highlights the region of interest plotted in Fig. 6c. e, *Tpm1_29* single-cell PSI UMAP plot across the atlas highlighting a PHT specific core exon usage at the C-terminus. f, *Tpm1_32* single-cell PSI UMAP plot across the atlas highlighting a PHT specific core exon usage at the C-terminus. g, UMAP plot across timepoints depicting erythropoietic cell types. h, Single-cell Ψ UMAPs of *Epb41_30*, *Add1_37*, *Ank1_43* and *Mbnl1_37* depicting alternative exonic usage across blood maturation trajectories. i, Single-cell gene expression UMAP plot depicting differences in gene expression for *Epb41*, *Add1*, *Ank1* and *Mbnl1* illustrating differences in gene expression that differ from the AS patterns observed across blood maturation.

See this image and copyright information in PMC

Comment in

Towards a full picture of the total transcriptome.
Tang L. Tang L. Nat Methods. 2022 Aug;19(8):922. doi: 10.1038/s41592-022-01581-5. Nat Methods. 2022. PMID: 35927479 No abstract available.

References

1. Hashimshony T, Wagner F, Sher N, Yanai I. CEL-seq: single-cell RNA-seq by multiplexed linear amplification. Cell Rep. 2012;2:666–673. doi: 10.1016/j.celrep.2012.08.003. - DOI - PubMed
1. Islam S, et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 2011;21:1160–1167. doi: 10.1101/gr.110882.110. - DOI - PMC - PubMed
1. Ramsköld D, et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 2012;30:777–782. doi: 10.1038/nbt.2282. - DOI - PMC - PubMed
1. Tang F, et al. mRNA-seq whole-transcriptome analysis of a single cell. Nat. Methods. 2009;6:377–382. doi: 10.1038/nmeth.1315. - DOI - PubMed
1. Klein AM, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015;161:1187–1201. doi: 10.1016/j.cell.2015.04.044. - DOI - PMC - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
Molecular Biology Databases
- NIAID Data Ecosystem - Find datasets on Infectious and Immune-mediated Diseases

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

High-throughput total RNA sequencing in single cells using VASA-seq

Affiliations

High-throughput total RNA sequencing in single cells using VASA-seq

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

Comment in

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases