Bioinformatics. 2022 Jan 3;38(2):384-390.
doi: 10.1093/bioinformatics/btab648.

SplicingFactory-splicing diversity analysis for transcriptome data


Benedek Dankó et al. Bioinformatics. 2022.

Abstract

Motivation: Alternative splicing contributes to the diversity of RNA found in biological samples. Current tools for investigating patterns of alternative splicing check for coordinated changes in the expression or relative ratio of RNA isoforms, where specific isoforms are up- or down-regulated in a condition. However, the molecular process of splicing is stochastic, and changes in the RNA isoform diversity of a gene might arise between samples or conditions. One condition can be dominated by a single isoform, while multiple isoforms with similar expression levels can be present in another condition. These changes might result from mutations, drug treatments or differences in the cellular or tissue environment. Here, we present a tool for the characterization and analysis of RNA isoform diversity using isoform-level expression measurements.

Results: We developed an R package, SplicingFactory, to calculate various RNA isoform diversity metrics and compare them across conditions. Using the package, we tested the effect of RNA-seq quantification tools, quantification uncertainty, gene expression levels and isoform numbers on the isoform diversity calculation. We analyzed a set of CD34+ hematopoietic stem cell and myelodysplastic syndrome samples and found a set of genes whose isoform diversity change is associated with SF3B1 mutations.
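To make the notion of isoform diversity concrete, the following is a minimal Python sketch of the kinds of metrics named in this abstract and in the figure captions (naive and Laplace entropy, Gini index, Simpson and inverse Simpson index). It is not the SplicingFactory implementation, which is an R package. The function names are hypothetical, and the exact definitions (log base 2, normalization by the log of the isoform number, a pseudocount of 1 for the Laplace variant, the Gini coefficient of inequality) are assumptions based on the standard textbook forms.

```python
import math


def proportions(expr):
    """Convert isoform expression values (e.g. TPM) of one gene to proportions."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("gene has no expressed isoforms")
    return [x / total for x in expr]


def naive_entropy(expr, normalize=True):
    """Shannon entropy of isoform proportions (assumed log base 2).

    With normalize=True the value is divided by log2(number of isoforms),
    so it lies between 0 and 1. Requires at least two isoforms.
    """
    p = proportions(expr)
    h = -sum(q * math.log2(q) for q in p if q > 0)
    return h / math.log2(len(expr)) if normalize else h


def laplace_entropy(expr, normalize=True):
    """Entropy after adding a pseudocount of 1 to every isoform (assumption)."""
    return naive_entropy([x + 1 for x in expr], normalize)


def simpson_index(expr):
    """Simpson index: 1 minus the sum of squared proportions."""
    return 1 - sum(q * q for q in proportions(expr))


def inverse_simpson_index(expr):
    """Inverse Simpson index: the effective number of isoforms."""
    return 1 / sum(q * q for q in proportions(expr))


def gini_index(expr):
    """Gini coefficient: 0 for equal expression, larger when one isoform dominates."""
    xs = sorted(expr)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative values only, not taken from the paper.
dominated = [98.0, 1.0, 1.0]   # one isoform dominates
balanced = [30.0, 30.0, 30.0]  # similar expression levels

for name, gene in (("dominated", dominated), ("balanced", balanced)):
    print(name, round(naive_entropy(gene), 3), round(gini_index(gene), 3))
```

In this sketch a gene dominated by a single isoform has low entropy and a high Gini index, while a gene whose isoforms are expressed at similar levels has entropy close to 1 and a Gini index close to 0, which mirrors the two situations described in the Motivation paragraph.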

Availability and implementation: The SplicingFactory package is freely available under the GPL-3.0 license from Bioconductor for the Windows, MacOS and Linux operating systems (https://www.bioconductor.org/packages/release/bioc/html/SplicingFactory.html).

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Correlation of various diversity metrics with gene expression and isoform number. Each dot represents a gene with at least two isoforms and at least one protein coding isoform. The x axis shows the log10(mean TPM) expression across the 17 control samples from the SRP133442 dataset, while the y axis shows a given mean diversity metric across the same samples. The color of the dots shows the log2(isoform number) for the gene. Genes with >20 isoforms were assigned the value 20. The purple line shows the smoothed conditional mean of the data using a generalized additive model. Naive: nonnormalized naive-entropy, Laplace: nonnormalized Laplace-entropy, Gini: Gini-index, Simpson: Simpson-index, Inv. Simpson: inverse Simpson-index, Naive norm: normalized naive-entropy, Laplace norm: normalized Laplace-entropy.
Fig. 2.
Various factors influencing diversity metrics. (A) Consistency of diversity metrics using the Kallisto, Salmon and Salmon-SAF transcript quantification methods. Each boxplot shows the Spearman correlation values for a specific pairwise comparison in the 17 control samples from the SRP133442 dataset, showing two different diversity metrics. K: Kallisto, S: Salmon, SS: Salmon-SAF. (B) Influence of expression estimation uncertainty on diversity calculation. The x axis shows the expression MAD, while the y axis shows the naive normalized entropy MAD of selected marker genes in a single sample. Dot color shows the original nonbootstrap expression estimates for the gene. The corr. value on the panels is the Spearman correlation of the expression MAD and the diversity MAD. (C) Influence of filtering low-expression transcripts before diversity calculation. The x axis shows the % criterion used for filtering transcripts (see Section 2), while the y axis shows the Spearman correlation of the diversity metrics using filtered results and the original unfiltered results. Vertical lines at the dots show bootstrap replicate confidence intervals.
Fig. 3.
Performance benchmarks and comparison of SplicingFactory to other tools. (A) Memory usage of SplicingFactory with increasing sample number for all diversity metrics while calculating the diversity values or also calculating differential diversity between sample groups. The x axis shows the increasing number of samples used, while the y axis shows the maximum amount of memory used. (B) Total elapsed time for calculating the diversity values or also calculating differential diversity between sample groups for all diversity metrics. The x axis shows the increasing number of samples used, while the y axis shows the total elapsed time for the calculation. (C) Spearman correlation of SpliceHetero entropy values with three different diversity metrics and Salmon expression estimates for the 17 MDS samples from the SRP133442 dataset. (D) Spearman correlation of average modified variance values (E1 and E2) as calculated by the GSReg.SEVA function and average diversity values for three different diversity metrics for the 17 control and MDS samples from the SRP133442 dataset. (E) Spearman correlation of Whippet entropy and average diversity values for three different diversity metrics for the 17 control and MDS samples from the SRP133442 dataset.
Fig. 4.
Differential diversity analysis results using MDS data. (A) Differential diversity results comparing SF3B1 mutated and nonmutated MDS samples for three different diversity metrics and Salmon expression estimates. The x axis shows the mean diversity across all samples, while the y axis shows the difference of means between the SF3B1 mutated and nonmutated MDS sample groups. Each gene is a single gray dot, and significant changes are highlighted in red. Significant changes are defined as |mean difference| > 0.1 and Wilcoxon-test adjusted P-value <0.05. (B) Enrichment of the Hay et al. (2018) marker gene sets in the differential diversity results, separated by increasing or decreasing diversity between the SF3B1 mutated and nonmutated sample groups. The x axis shows the different marker gene sets, while the y axis shows the % of significantly changing genes that fall into a specific marker gene set. Significant enrichment of a set is marked with an asterisk (Benjamini-Hochberg adjusted Fisher-test P-value <0.05).
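
The significance rule quoted in the Fig. 4 caption (|mean difference| > 0.1 and adjusted P-value <0.05) can be sketched as follows. This is an illustrative Python sketch, not the package's code. It assumes the per-gene Wilcoxon P-values have already been computed elsewhere and that the adjustment is Benjamini-Hochberg; the caption names that method only for the Fisher test in panel B, so its use here is an assumption. The function names and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_significant(mean_diffs, pvalues, min_abs_diff=0.1, alpha=0.05):
    """Flag genes with |mean difference| > min_abs_diff and adjusted P < alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [abs(d) > min_abs_diff and q < alpha
            for d, q in zip(mean_diffs, adjusted)]


# Illustrative inputs only: per-gene difference of mean diversity between
# two sample groups, and raw P-values from a per-gene two-group test.
mean_diffs = [0.25, 0.05, -0.30, 0.40]
raw_p = [0.001, 0.02, 0.04, 0.5]

flags = call_significant(mean_diffs, raw_p)
print(flags)
```

Requiring both a minimum effect size and an adjusted P-value threshold keeps genes with statistically detectable but very small diversity shifts out of the highlighted set.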

References

    1. Afsari B. et al. (2018) Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer. Bioinformatics, 34, 1859–1867.
    2. Baralle F.E., Giudice J. (2017) Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol., 18, 437–451.
    3. Belickova M.M. et al. (2016) Up-regulation of ribosomal genes is associated with a poor response to azacitidine in myelodysplasia and related neoplasms. Int. J. Hematol., 104, 566–573.
    4. Carrocci T.J. et al. (2017) SF3b1 mutations associated with myelodysplastic syndromes alter the fidelity of branchsite selection in yeast. Nucleic Acids Res., 45, 4837–4852.
    5. Chalancon G. et al. (2012) Interplay between gene expression noise and regulatory network architecture. Trends Genet., 28, 221–232.
