Am J Hum Genet. 2023 Dec 7;110(12):2056-2067.
doi: 10.1016/j.ajhg.2023.10.014. Epub 2023 Nov 24.

Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index

Ines F Scheller et al. Am J Hum Genet.

Abstract

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signals into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants among them by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application to 303 rare-disease samples confirmed the relative reduction in the number of outlier calls at a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare-disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.
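The effect of restricting the tested genes on the multiple-testing burden can be illustrated with a minimal sketch. It assumes a Benjamini-Hochberg correction purely for illustration; the abstract does not state which correction procedure FRASER 2.0 applies. The function name `bh_adjust`, the gene labels, and the p values are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def bh_adjust(
    pvalues: Dict[str, float],
    tested: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p values (illustrative assumption, not
    necessarily the procedure used by FRASER 2.0).

    If `tested` is given, only those genes enter the correction, which
    shrinks the number of tests m and hence the penalty per gene.
    """
    if tested is None:
        genes = list(pvalues)
    else:
        # Drop duplicates and genes without a p value.
        genes = [g for g in dict.fromkeys(tested) if g in pvalues]
    m = len(genes)
    ranked = sorted(genes, key=lambda g: pvalues[g])
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest to the smallest p value so that adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        gene = ranked[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


# Hypothetical per-sample gene-level p values for ten genes.
p_values = {"GENE_0": 0.008}
p_values.update({f"GENE_{i}": 0.05 + 0.1 * i for i in range(1, 10)})

transcriptome_wide = bh_adjust(p_values)
restricted = bh_adjust(p_values, tested=["GENE_0", "GENE_1"])
print(transcriptome_wide["GENE_0"], restricted["GENE_0"])
```

In this toy input, GENE_0 has an adjusted p value of roughly 0.08 when all ten genes are corrected together and roughly 0.016 when only two candidate genes are tested, so it passes a 0.05 cutoff only in the restricted setting.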

Keywords: Aberrant splicing; RNA-seq; outlier detection; rare disease; rare disease diagnostics; rare variant.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Intron Jaccard index improves splicing outlier calling. (A) Sashimi plot of three GTEx skin not-sun-exposed RNA-seq samples showing exons 4 and 5 of KRT1. A splicing outlier was detected in the top sample with the ψ5 metric of FRASER (red). The position of the donor site of the outlier intron is indicated with a blue dashed line. Although this intron is not expressed in most other samples (dark blue) and is therefore detected with a high Δψ5 value as shown in the table on the right, its functional impact is probably minor because the canonical intron remains largely dominant. (B) Schematic definition of the intron Jaccard index for an intron of interest (purple) defined by a donor site d and acceptor site a. The set of donor-associated reads D (red) and acceptor-associated reads A (blue) are highlighted. (C) Representation of different types of aberrant splicing events that can be captured with the intron Jaccard index. The right column contains the formulae to compute the intron Jaccard index of the canonical intron (black dotted line) from the split (s) and non-split (u) reads of the involved introns in each scenario. (D) Recall of rare splice-disrupting candidate variants as defined by VEP (canonical splice-site variants, n = 1,544), MMSplice (n = 3,395), SpliceAI (n = 2,971), and AbSplice (n = 2,265) versus the rank of nominal p values from FRASER (light blue) and from an adaptation of FRASER using the intron Jaccard index (dark blue) on the GTEx skin not-sun-exposed dataset (n = 582). Different nominal p value cutoffs are indicated with shapes.
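The definition in panel (B) can be sketched in code. This is a minimal illustration that assumes the index is the standard Jaccard ratio |D ∩ A| / |D ∪ A|, that D and A each contain the split reads using the site plus the non-split reads covering it, and that the two sets share only the split reads spanning exactly (d, a). The function name, the input layout, and the read counts are hypothetical and are not taken from the FRASER 2.0 implementation.

```python
from typing import Dict, Tuple

Intron = Tuple[int, int]  # (donor position, acceptor position)


def intron_jaccard(
    split: Dict[Intron, int],
    nonsplit: Dict[int, int],
    donor: int,
    acceptor: int,
) -> float:
    """Jaccard index |D ∩ A| / |D ∪ A| for the intron (donor, acceptor).

    split:    split-read counts (s) per intron.
    nonsplit: non-split read counts (u) per splice-site position.
    """
    # Reads in both sets: split reads using exactly this donor and acceptor.
    both = split.get((donor, acceptor), 0)
    # D: every split read using the donor, plus non-split reads covering it.
    d_reads = sum(n for (d, _), n in split.items() if d == donor)
    d_reads += nonsplit.get(donor, 0)
    # A: every split read using the acceptor, plus non-split reads covering it.
    a_reads = sum(n for (_, a), n in split.items() if a == acceptor)
    a_reads += nonsplit.get(acceptor, 0)
    # Inclusion-exclusion gives the size of the union.
    union = d_reads + a_reads - both
    return both / union if union > 0 else float("nan")


# Hypothetical counts: a canonical intron and a weak alternative acceptor.
alt_acceptor = {(100, 200): 90, (100, 250): 10}
print(intron_jaccard(alt_acceptor, {}, 100, 200))

# Hypothetical counts: partial intron retention at both splice sites.
retention = {(100, 200): 50}
print(intron_jaccard(retention, {100: 25, 200: 25}, 100, 200))
```

With these toy counts the canonical intron scores about 0.9 in the alternative-acceptor case and 0.5 in the intron-retention case, which shows how a single value responds to both kinds of event.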
Figure 2
FRASER 2.0 increases recall of rare splice-disrupting candidate variants on GTEx. (A) Quantile-quantile plots of the p values for the different splice metrics from FRASER (ψ3, ψ5, θ, shown in shades of blue) and the intron Jaccard index from FRASER 2.0 (purple) on the GTEx skin not-sun-exposed dataset. The red line depicts the diagonal, and the gray ribbon around it depicts the 95% confidence interval. (B) Recall of rare splice-disrupting candidate variants as defined by the variant annotation tools VEP, MMSplice, SpliceAI, and AbSplice (facets) versus the rank of nominal p values combined across GTEx tissues for FRASER (blue), FRASER 2.0 (purple), LeafCutterMD (yellow), and SPOT (green). Nominal p value cutoffs are indicated with shapes. (C) Venn diagram of the overlap of gene-level splicing outliers found with FRASER (blue) and FRASER 2.0 (purple). (D) Box plots of the number of splicing outliers (gene level) per sample (y axis) called by FRASER (blue) and FRASER 2.0 (purple) for each GTEx tissue (x axis). All brain tissues have been combined for readability.
Figure 3
FRASER 2.0 is less sensitive to sequencing depth than previous methods. (A) Scatterplot of the number of splicing outliers at the gene level against the total mapped reads per sample on the GTEx skin not-sun-exposed dataset for LeafCutterMD, SPOT, FRASER, and FRASER 2.0 (facets). Spearman correlation coefficients (rho) are shown. All are significant (Spearman test, p < 3 × 10⁻³). (B) Box plots of the Spearman correlation coefficients (y axis) between the mapped reads and the number of gene-level splicing outliers called by LeafCutterMD, SPOT, FRASER, and FRASER 2.0 (x axis) for each GTEx tissue (n = 48). p values of Wilcoxon tests are shown above brackets.
Figure 4
Application of FRASER 2.0 to rare-disease cohorts. (A) Distribution of the splicing outliers per sample at the gene level on the UDN (n = 391, n = 252, n = 104 for fibroblasts, blood poly(A), and blood total RNA, respectively) and the Yépez et al. dataset (n = 303) for FRASER (blue) and FRASER 2.0 applied to three gene sets considered for FDR correction: expressed genes (dark purple), expressed OMIM genes (light purple), and expressed OMIM genes with a rare variant (violet red; see material and methods). (B) Number of events (bars) in all non-empty intersections (linked dots) between four splicing outlier sets from the Yépez et al. dataset: (1) the 26 originally reported pathogenic events, (2) the transcriptome-wide significant FRASER 2.0 calls, (3) the significant FRASER 2.0 calls when only OMIM genes with a rare variant are considered, and (4) the transcriptome-wide significant FRASER calls. Intersections with the set of pathogenic events are highlighted in red. (C) Fraction of recovered pathogenic splicing outliers from the Yépez et al. dataset (y axis, total n = 26) when subsampling to different sample sizes (x axis) was performed. Each sample size was randomly sampled five times. RV: rare variant.
