Genome Biol. 2021 Aug 5;22(1):219.
doi: 10.1186/s13059-021-02434-8.

Specific splice junction detection in single cells with SICILIAN

Roozbeh Dehghannasiri et al. Genome Biol.

Abstract

Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN is a general method that can be applied to bulk or single-cell data, but has particular utility for single-cell analysis due to that data's unique challenges and opportunities for discovery. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, and increases agreement between biological replicates. SICILIAN detects unannotated splicing in single cells, enabling the discovery of novel splicing regulation through single-cell analysis workflows.

Conflict of interest statement

The authors declare no competing interests. Ana Damljanovic is an employee of Seven Bridges, Inc.

Figures

Fig. 1
Overview of the SICILIAN statistical framework and its performance evaluation based on benchmarking datasets. A High and variable fraction of junctional reads across diverse cell types in the HLCA dataset [11]. Each violin plot shows the fraction of mapped reads in each cell (within a cell type) that are junctional. B SICILIAN takes the alignment information file (usually in the form of a BAM file) from a spliced aligner such as STAR and then deploys its statistical modeling to assign a statistical score to each junction. C SICILIAN utilizes the cell-level statistical scores (empirical p values) for each junction across 10x samples to correct for increased false discovery rates due to multiple hypothesis testing. The corrected score is called the “SICILIAN score” and can be used to consistently call junctions across cells. D SICILIAN improves the concordance between detected splicing junctions in single cells and bulk cell lines. E ROC curves for SICILIAN and read count criteria on four simulated datasets [8, 12] (the top two based on data from [12] and the bottom two based on data from [8])
Fig. 2
Splice junction discovery in human lung (HLCA) and mouse lemur lung (MLCA) cells. A SICILIAN filters out a higher proportion of unannotated junctions than annotated junctions in all individuals from both the human and mouse lemur datasets (only junctions with at least two reads in the given dataset are plotted). B Splice junctions identified by SICILIAN in the human gene GDI1; annotated splicing was maintained and unannotated noisy junctions were all removed. C Better discrimination between annotated and unannotated junctions in the HLCA dataset achieved by the SICILIAN statistical criterion. D The number of junctions found in both human individuals that are called consistently by SICILIAN is larger than the number that are called differently in almost every case, regardless of the number of junctional reads. E The number of junctions called consistently between the two mouse lemur individuals is also larger than the number called inconsistently at almost every read depth after SICILIAN filtering. F The fraction of HLCA junctions that are found in the CHESS and GTEx databases before and after applying SICILIAN to STAR raw calls. G The number of orthologous mouse lemur junctions (found by LiftOver from Mmur3 to hg38) that have also been detected in the HLCA dataset. Junctions have been further classified based on their annotation status in the mouse lemur and human transcriptomes
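
As a rough illustration of the kind of scoring described in Fig. 1C (empirical p values followed by a multiple-testing correction), the sketch below computes an empirical p value against a set of null scores and then applies a Benjamini-Hochberg adjustment. This is not SICILIAN's actual procedure: the function names, the assumption that larger scores are more extreme, and the choice of Benjamini-Hochberg as the correction are all illustrative assumptions that do not come from the paper.

```python
from typing import List, Sequence


def empirical_p_value(score: float, null_scores: Sequence[float]) -> float:
    """Empirical p value of `score` against a null sample.

    Assumption (hypothetical, not from the paper): larger scores are
    more extreme. A +1 pseudocount keeps the p value strictly above 0.
    """
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in input order.

    Generic FDR correction used here as a stand-in; the paper's own
    correction across cells may differ.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-junction p values, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A junction would then be called only if its adjusted value falls below a chosen threshold, which is the general idea behind applying one consistent cutoff across cells.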

References

    1. Baralle FE, Giudice J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 2017;18(7):437–451. doi: 10.1038/nrm.2017.27.
    2. Scotti MM, Swanson MS. RNA mis-splicing in disease. Nat. Rev. Genet. 2016;17(1):19–32. doi: 10.1038/nrg.2015.3.
    3. Westoby J, Artemov P, Hemberg M, Ferguson-Smith A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol. 2020;21(1):74. doi: 10.1186/s13059-020-01981-w.
    4. Szabo L, Salzman J. Detecting circular RNAs: bioinformatic and experimental challenges. Nat. Rev. Genet. 2016;17(11):679–692. doi: 10.1038/nrg.2016.114.
    5. Szabo L, Morey R. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 2015;16(1):126. doi: 10.1186/s13059-015-0690-5.
