Bioinformatics. 2018 Jun 1;34(11):1859-1867.
doi: 10.1093/bioinformatics/bty004.

Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer


Bahman Afsari et al. Bioinformatics.

Abstract

Motivation: Current bioinformatics methods for detecting changes in gene isoform usage between distinct phenotypes compare the expected relative isoform usage in each phenotype. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can disrupt the regulation of splicing, which increases the heterogeneity of splice variant expression. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.

Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Using simulated data, we show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark its performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage rather than heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated an unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes, and an anticipated increase in heterogeneity among HPV-negative samples with mutations in genes that regulate the splicing machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.
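The comparison of within-condition variability described above can be illustrated with a minimal sketch, assuming a precomputed sample-by-sample splice dissimilarity matrix `D`. Here the statistic is taken to be the mean within-tumor dissimilarity minus the mean within-normal dissimilarity. The paper assesses significance with U-statistics theory; this sketch plainly swaps in a one-sided label-permutation test because it is shorter to show. The helper names (`mean_within`, `variability_difference`) and the toy matrix are hypothetical and are not the GSReg API.

```python
from itertools import combinations

import numpy as np


def mean_within(D, idx):
    """Mean dissimilarity over all unordered pairs of samples in idx (a U-statistic)."""
    return float(np.mean([D[a, b] for a, b in combinations(idx, 2)]))


def variability_difference(D, labels, n_perm=2000, seed=0):
    """Hypothetical helper: tumor-minus-normal within-group variability.

    labels: 1 = tumor, 0 = normal. Returns (statistic, one-sided p-value).
    A label-permutation test stands in for SEVA's U-statistic test here.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)

    def stat(lab):
        return (mean_within(D, np.flatnonzero(lab == 1))
                - mean_within(D, np.flatnonzero(lab == 0)))

    observed = stat(labels)
    rng = np.random.default_rng(seed)
    # Count shuffled labelings whose variability difference is at least as large.
    hits = sum(stat(rng.permutation(labels)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 4 normals with low mutual dissimilarity, 4 tumors with high.
labels = np.array([0] * 4 + [1] * 4)
D = np.full((8, 8), 0.5)
D[:4, :4] = 0.1
D[4:, 4:] = 0.8
np.fill_diagonal(D, 0.0)

diff, p = variability_difference(D, labels)
print(f"difference = {diff:.2f}, p = {p:.3f}")
```

In this toy matrix only 1 of the 70 possible 4/4 splits reaches the observed difference of 0.7, so the Monte Carlo p-value hovers around 1/70; a real cohort gives the test much finer resolution.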

Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg.

Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Overview of SEVA. (a) Relative junction expression quantifies the distribution of isoform usage of a gene. For simplicity, this example shows a gene with three exons. The model is shown for two samples: one with higher relative expression of the isoform containing all three exons (left) and another with higher relative expression of the isoform that skips the middle exon (right). The relative strength of junction expression in overlapping pairs (e.g. J1,2 with J1,3, or J2,3 with J1,3) corresponds to the relative proportion of isoform usage. (b) Example of the gene model from (a) in multiple normal (N, left) and tumor (T, right) samples. Note that the normal samples have lower heterogeneity of gene isoform usage than the tumor samples. (c) To quantify isoform expression, SEVA compares the expression of all pairs of overlapping junctions (see a and d). A dissimilarity measure is obtained from the concordance of these overlapping-junction comparisons between each pair of samples. This measure is applied to all pairs of samples from the same phenotype (see b), and U-statistics theory is then applied to these measures to compare the variation of gene isoform usage between the phenotypes. (d) Extension of (a) to a more complex gene splicing model.
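The dissimilarity outlined in panels (a) to (c) can be sketched as follows, under stated assumptions: each junction is a (donor, acceptor) coordinate pair, two junctions "overlap" when their intron intervals intersect, and tied expression values count as "not less than". The dissimilarity between two samples is the fraction of overlapping pairs whose expression order disagrees, and within-group variability is its average over all sample pairs. All names, coordinates and counts are hypothetical; this is not the GSReg implementation and it omits SEVA's U-statistic variance estimate.

```python
from itertools import combinations

import numpy as np


def overlapping_pairs(junctions):
    """Index pairs (i, j), i < j, of junctions whose intron intervals intersect."""
    pairs = []
    for i, j in combinations(range(len(junctions)), 2):
        (s1, e1), (s2, e2) = junctions[i], junctions[j]
        if s1 < e2 and s2 < e1:  # open-interval overlap test
            pairs.append((i, j))
    return pairs


def splice_dissimilarity(x, y, pairs):
    """Fraction of overlapping pairs whose expression order differs between samples."""
    if not pairs:
        raise ValueError("gene has no overlapping junction pairs")
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    flips = sum((x[i] < x[j]) != (y[i] < y[j]) for i, j in pairs)  # ties count as "not less"
    return flips / len(pairs)


def within_group_variability(samples, pairs):
    """Mean dissimilarity over all unordered pairs of samples in one phenotype."""
    return float(np.mean([splice_dissimilarity(a, b, pairs)
                          for a, b in combinations(samples, 2)]))


# Three-exon gene as in panel (a): junctions J1,2, J2,3 and J1,3 (hypothetical coordinates).
junctions = [(100, 200), (300, 400), (100, 400)]
pairs = overlapping_pairs(junctions)  # J1,2 with J1,3 and J2,3 with J1,3

# Junction read counts per sample, in the order [J1,2, J2,3, J1,3].
normal = [[50, 48, 5], [40, 45, 3], [60, 55, 8]]    # consistent exon inclusion
tumor = [[50, 48, 5], [10, 12, 40], [45, 5, 30]]    # heterogeneous skipping

print(f"{within_group_variability(normal, pairs):.3f}")  # → 0.000
print(f"{within_group_variability(tumor, pairs):.3f}")   # → 0.667
```

Because the measure uses only the order of expression within each overlapping pair, it is unchanged by any per-sample scaling of junction counts, which is one reason a rank-based statistic suits this setting.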
Fig. 2.
Performance in simulated RNA-Seq data. (a) Precision of different algorithms (in legend) on the simulated dataset. The x-axis varies the number of tumor samples with alteration events; tumor heterogeneity decreases along the x-axis. (b) Recall on the simulated data, as in (a). Precision and recall are computed separately for all genes (solid lines) and for the subset of 300 genes that are differentially expressed (dashed lines).
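Computing precision and recall both on all genes and on a gene subset, as in panels (a) and (b), can be sketched as below. The gene names and sets are hypothetical, and restricting both the calls and the ground truth to the subset is an assumption about how the subset curves were computed.

```python
def precision_recall(predicted, truth, universe=None):
    """Precision and recall of predicted gene calls, optionally within a gene subset."""
    predicted, truth = set(predicted), set(truth)
    if universe is not None:
        # Assumption: restrict both the calls and the ground truth to the subset.
        predicted &= set(universe)
        truth &= set(universe)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else float("nan")
    recall = tp / len(truth) if truth else float("nan")
    return precision, recall


truth = {"g1", "g2", "g3", "g4"}   # hypothetical genes with simulated splice events
predicted = {"g1", "g2", "g5"}     # hypothetical genes called by an algorithm
de_genes = {"g1", "g3", "g5"}      # hypothetical differentially expressed subset

print(precision_recall(predicted, truth))            # → (0.6666666666666666, 0.5)
print(precision_recall(predicted, truth, de_genes))  # → (0.5, 0.5)
```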
Fig. 3.
Comparison of splice variant events identified by different algorithms in real HPV+ HNSCC RNA-seq data. (a) Variability of junction expression profiles corresponding to gene isoforms. Each point represents a gene, plotted by its SEVA variability in normal samples (x-axis) and in cancer samples (y-axis). Point color distinguishes differentially spliced (DS) genes identified with SEVA from genes that were not significantly DS (non-DS). (b) Venn diagram comparing differentially spliced genes identified by SEVA and EBSeq, together with the differential expression status of each gene. (c) Comparison of SEVA and DiffSplice, as described in (b).
Fig. 4.
Multidimensional scaling (MDS) plot of splice dissimilarity measures in real HPV+ HNSCC junction expression from RNA-seq for (a) DST, (b) LAMA3, (c) RASIP1 and (d) TP63. The relative spread of samples in the MDS plots indicates their relative variability in normal samples (circles) and tumor samples (triangles).
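A 2-D embedding of the kind plotted in these panels can be obtained from any sample-by-sample dissimilarity matrix. The sketch below uses classical (Torgerson) MDS in NumPy; the caption does not say which MDS variant was used, so that choice, the helper names and the toy configuration are assumptions. Splice dissimilarities need not be Euclidean, so negative eigenvalues are clipped and the embedding then only approximates the input distances.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]           # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def spread(X):
    """Mean distance of embedded samples to their centroid (the visual 'spread')."""
    return float(np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1)))


def pairwise(P):
    """Euclidean distance matrix between the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)


# Toy check: four points on a 3 x 4 rectangle are recovered up to rotation/reflection.
P = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = pairwise(P)
X = classical_mds(D, k=2)
print(np.allclose(pairwise(X), D))  # → True
print(round(spread(X), 6))          # → 2.5
```

Comparing `spread` for the normal and tumor rows of an embedding gives a rough numeric counterpart to reading the relative spread of circles and triangles by eye.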
Fig. 5.
Comparison of differential gene isoform usage in TCGA HNSCC RNA-seq data. (a) Variability of junction expression profiles in HPV+ versus HPV- HNSCC for genes that SEVA identifies as significantly DS and for genes that are not significantly DS. (b) As in (a), comparing HPV- samples with and without alterations in RNA splicing machinery genes.
