[Preprint]. 2024 Dec 24:2024.12.24.630263.
doi: 10.1101/2024.12.24.630263.

sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing

Roozbeh Dehghannasiri et al. bioRxiv.

Abstract

Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome, such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool. sc-SPLASH rediscovers known biology, including V(D)J recombination and cell-type-specific alternative splicing in human and trans-splicing in tunicate (Ciona), and, when applied to spatial datasets, detects sequence variation including tumor-specific somatic mutation. In sponge (Spongilla) and tunicate (Ciona), we uncover secreted repeat proteins expressed in immune-type cells and regulated during development; the sponge genes were absent from the reference assembly. sc-SPLASH provides a powerful alternative tool for exploring transcriptomes that is applicable to the breadth of life's diversity.

Figures

Figure 1. sc-SPLASH pipeline overview and analysis of human V(D)J rearrangement and spatial transcriptomics (Visium).
A. Overview of the sc-SPLASH pipeline: preprocessing of 10x scRNA-seq data (cell barcode extraction and UMI deduplication), anchor/target counting through the BKC module, and statistical analysis to identify anchors (constant sequences) that are followed by a diverse set of target sequences with a single-cell-dependent distribution. B. Comparison of sc-SPLASH with Cell Ranger and STARsolo, two state-of-the-art 10x processing tools, indicates much higher efficiency for sc-SPLASH. Test samples are from the Tabula Sapiens dataset and are ordered by size. C. Pfam analysis of unaligned extendors suggests that the immunoglobulin variable domain (V-set) has the highest number of unaligned extendors and also the highest average entropy among Pfam domains. D. Distribution of cell types and tissues containing in-frame V(D)J transcripts identified by sc-SPLASH+IgBLAST. E. Detection by sc-SPLASH of a tumor-associated double somatic mutation in the gene MT-ND4 in squamous cell carcinoma Visium data, where Target 2, corresponding to the mutation, has a higher fraction in carcinoma cells (marked by red boundary). F. Spatially regulated alternative splicing of RPS24 detected in electric eel Visium data, where electrocytes (purple arrows) include exon 6 and stromal cells in the insulating septa (red arrows) exclude exon 6. We also show the correspondence between the RPS24 nucleotide sequence in human and electric eel.
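The anchor/target counting step in panel A can be illustrated with a minimal Python sketch. This is not the BKC implementation: the function name `count_anchor_targets`, the `(barcode, umi, sequence)` input format, the k-mer length `k`, the `gap` parameter, and the exact-duplicate collapsing used as a stand-in for UMI deduplication are all assumptions made for illustration.

```python
from collections import defaultdict


def count_anchor_targets(reads, k=4, gap=0):
    """Count (anchor, target) k-mer pairs per cell barcode.

    `reads` is an iterable of (barcode, umi, sequence) tuples; this input
    format is hypothetical. Exact duplicates of (barcode, umi, sequence)
    are collapsed first, a simplified stand-in for UMI deduplication.
    """
    unique_reads = set(reads)
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, seq in unique_reads:
        # Last start position at which anchor + gap + target still fits.
        last_start = len(seq) - (2 * k + gap)
        for i in range(last_start + 1):
            anchor = seq[i:i + k]
            target = seq[i + k + gap:i + 2 * k + gap]
            counts[barcode][(anchor, target)] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {bc: dict(pairs) for bc, pairs in counts.items()}


# Toy example: two cells share an anchor; one cell shows two different targets.
toy_reads = [
    ("AAAC", "U1", "ACGTTTGA"),
    ("AAAC", "U1", "ACGTTTGA"),  # duplicate of the read above, collapsed
    ("AAAC", "U2", "ACGTCCGA"),
    ("GGGT", "U1", "ACGTTTGA"),
]
per_cell = count_anchor_targets(toy_reads, k=4, gap=0)
```

The resulting per-cell table of target counts for each anchor is the kind of input on which a statistical test for cell-dependent target distributions could then be run.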
Figure 2. Spongilla and Ciona repeat genes with target diversity are differentially expressed.
A. Multiple sequence alignment (MSA) of Spongilla granny anchor targets shows high target sequence diversity. The plot includes those with at least three reads across the sponge 10x dataset. B. Number and fraction of cells per cell type expressing the granny anchor (X/Y = expressing/total cells per cell type), suggesting predominant expression of the anchor in granulocytes and amoebocytes. Bars are colored by average normalized granny anchor count, calculated per cell as anchor count/UMI count×10^5. C. HCR RNA-FISH confirms granny anchor expression in granulocytes: cells co-express ACP5, a granulocyte marker (red), and a probe set designed against different granny versions (yellow). D. Genomic structure of the GranRep gene family: GranRep1 and GranRep2, as well as GranRep4 and GranRep5, are on the same contig. All genes share the same 3-exon structure with granny repeats in exon 3, encoding a signal peptide, granny repeat region (30-bp repeats), lysine-rich region, and C-terminal repeats (18-bp repeats). Repeat numbers and region sizes vary by gene. E. Single-cell differential expression of GranRep genes, suggesting that granulocytes primarily express GranRep1/GranRep2, while amoebocytes primarily express GranRep3. Normalized expression per cell is calculated as aligned reads/UMI count×10^5. GranReps are ordered by abundance in each stack, with marker colors showing the most abundant gene. Cells with ≥10 GranRep reads and normalized expression ≥5 are shown. F. MSA of target sequences for the Ciona YYD anchor, suggesting substantial target diversity for this anchor. We show the targets most similar to the two found in the HT genomic reference with the highest counts in the dataset. G. Two genes in the HT genome are composed almost entirely of YYD repeats, except for a signal peptide. H. HCR RNA-FISH at the juvenile stage shows YYD anchor expression restricted to circulating hemocytes. Red channel (Cy5) in both images is FISH for YYD repeat; left is a merge with brightfield, right is a merge with DNA stain (blue channel, DAPI). I. YYD anchor expression across Ciona development peaks during metamorphosis (“early rotation”). Normalized count per sample is calculated as anchor count/total reads×10^5.
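The normalizations quoted in panels B, E, and I all have the same form, a raw count divided by a depth measure and scaled by 10^5. A minimal Python sketch of that arithmetic follows; the function name `normalized_count` and the example numbers are hypothetical and are not taken from the paper.

```python
def normalized_count(raw_count, depth, scale=100_000):
    """Return raw_count / depth * scale.

    `depth` is the UMI count of a cell (panels B and E) or the total read
    count of a sample (panel I). The name and signature are illustrative.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return raw_count * scale / depth


# Hypothetical cell: 5 anchor counts among 20,000 UMIs.
example_value = normalized_count(5, 20_000)
```

Under the thresholds stated in panel E, a cell would be displayed only if it had at least 10 GranRep reads and a value of at least 5 from this calculation.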

