Nucleic Acids Res. 2012 Apr;40(8):e61.
doi: 10.1093/nar/gkr1291. Epub 2012 Jan 20.

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data

Shihao Shen et al. Nucleic Acids Res. 2012 Apr.

Abstract

Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P-value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypothesis of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT-PCR validation rate of 86% for differential exon skipping events with a MATS FDR of <10%. Additionally, over the full list of RT-PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.

Figures

Figure 1.
Basic steps of MATS. (A) For each exon MATS uses the counts of RNA-Seq reads mapped to the exon–exon junctions of its inclusion or skipping isoform to estimate the exon inclusion levels in two samples. (B) The exon inclusion levels of all alternatively spliced cassette exons are used to construct a multivariate uniform prior that models the overall similarity in alternative splicing profiles between the two samples. (C) Based on the multivariate uniform prior and a binomial likelihood model for the RNA-Seq read counts of the exon inclusion/skipping isoforms, MATS uses a Markov chain Monte Carlo (MCMC) method to calculate the Bayesian posterior probability for splicing difference. (D) MATS calculates a P-value for each exon by comparing the observed posterior probability with a set of simulated posterior probabilities from the null hypothesis, followed by adjustment for multiple testing to obtain the FDR value.
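Steps (A) and (D) of the caption can be illustrated with a minimal Python sketch. It is not the MATS implementation. The function names are hypothetical, and two choices are assumptions not stated in the caption: inclusion reads are divided by the number of inclusion junctions (two for a cassette exon) to put both isoforms on a per-junction scale, and the P-value is a plain empirical fraction rather than the adaptive sampling procedure MATS uses.

```python
def inclusion_level(inclusion_reads, skipping_reads, inclusion_junctions=2):
    """Estimate an exon inclusion level from junction read counts.

    Assumption: inclusion reads are divided by the number of inclusion
    junctions so that both isoforms are compared per junction.
    """
    per_junction_inclusion = inclusion_reads / inclusion_junctions
    total = per_junction_inclusion + skipping_reads
    if total == 0:
        raise ValueError("no junction reads for this exon")
    return per_junction_inclusion / total


def empirical_p_value(observed, simulated_null):
    """Compare an observed posterior probability with null simulations.

    Counts the simulated values that are at least as large as the observed
    one; the +1 terms keep the estimate from being exactly zero.
    """
    at_least_as_large = sum(1 for s in simulated_null if s >= observed)
    return (at_least_as_large + 1) / (len(simulated_null) + 1)
```

For example, `inclusion_level(40, 20)` treats 40 inclusion reads over two junctions as 20 per junction, which balances the 20 skipping reads.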
Figure 2.
Null hypotheses in MATS. (A) Under the default setting of MATS, the alternative hypothesis $H_1$ is that the difference in the exon inclusion levels between two samples is above the user-defined cutoff $c$ (the white area). The null hypothesis $H_0$ is that the difference is below the user-defined cutoff $c$ (the gray area). (B) MATS can test the extreme ‘switch-like’ differential alternative splicing pattern with a different hypothesis. The alternative hypothesis $H_1$ is that the exon inclusion level is below a user-defined threshold $c$ in sample 1 and above $1-c$ in sample 2, or vice versa (the white area). The null hypothesis $H_0$ is the region outside the alternative hypothesis region (the gray area).
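Both hypothesis regions can be evaluated as posterior probabilities. The sketch below is a simplified illustration and not the MATS model: it assumes independent uniform priors for the two samples (so each inclusion level has a Beta posterior under a binomial likelihood) instead of the multivariate uniform prior with MCMC. The function names, the example cutoff of 0.1 and the number of draws are all hypothetical.

```python
import random


def posterior_draws(inclusion, skipping, n_draws, rng):
    """Draw inclusion levels from Beta(inclusion + 1, skipping + 1).

    This is the posterior for a binomial likelihood with a uniform prior,
    assuming the counts are already on a comparable per-junction scale.
    """
    return [rng.betavariate(inclusion + 1, skipping + 1) for _ in range(n_draws)]


def prob_difference(counts1, counts2, cutoff=0.1, n_draws=20000, seed=0):
    """Posterior probability that the inclusion levels differ by more than cutoff."""
    rng = random.Random(seed)
    psi1 = posterior_draws(counts1[0], counts1[1], n_draws, rng)
    psi2 = posterior_draws(counts2[0], counts2[1], n_draws, rng)
    hits = sum(1 for a, b in zip(psi1, psi2) if abs(a - b) > cutoff)
    return hits / n_draws


def prob_switch_like(counts1, counts2, threshold=0.1, n_draws=20000, seed=0):
    """Posterior probability of the switch-like pattern, in either direction."""
    rng = random.Random(seed)
    psi1 = posterior_draws(counts1[0], counts1[1], n_draws, rng)
    psi2 = posterior_draws(counts2[0], counts2[1], n_draws, rng)
    hits = sum(
        1
        for a, b in zip(psi1, psi2)
        if (a < threshold and b > 1 - threshold)
        or (b < threshold and a > 1 - threshold)
    )
    return hits / n_draws
```

Each `counts` argument is an `(inclusion, skipping)` pair of junction read counts. Swapping the predicate inside the sum is all it takes to test a different user-defined pattern, which is the flexibility the caption describes.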
Figure 3.
The multivariate uniform prior can model the between-sample correlation pattern in the RNA-Seq data. (A) The scatter plot of the estimated exon inclusion levels of 12 890 alternatively spliced cassette exons in the ESRP1 and EV samples. Only exons with at least 20 reads mapped to one of the three exon–exon junctions in both samples are included in the plot. (B) The scatter plot of the exon inclusion levels in two samples simulated from two independent uniform priors. In (A and B), the two red lines define the area where [formula]. (C) The MCMC estimate of the correlation parameter can capture the correlation pattern in the data. For the ESRP data, the estimate is 0.93 (the red vertical line). For the 10 000 simulated data sets from independent uniform priors, the estimate is distributed close to zero.
Figure 4.
Simulation study of MATS. (A) Simulated exon inclusion levels of 5000 exons in two samples. A total of 95% of the data points are simulated from the null hypothesis ([formula]) and 5% are simulated from the alternative hypothesis ([formula]). (B–F) MATS FDR estimates on simulated data with the exon inclusion levels from (A) and the total junction count per exon and sample as 100 (B), 200 (C), 500 (D), 1000 (E) and 2000 (F). In each panel, exons are rank sorted by MATS FDR estimates in ascending order. The zoomed-in figure shows the FDR estimates of the top 250 exons by MATS.
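The panels rank exons by FDR estimate. The text here does not name the multiple-testing procedure, so the sketch below uses the Benjamini-Hochberg adjustment as one common choice. That choice and the function name are assumptions for illustration, not a statement of what MATS does.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is the
    # minimum over its own and all larger ranks (keeps the result monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_by_fdr(exon_ids, p_values):
    """Pair each exon with its adjusted value, sorted in ascending order."""
    fdr = benjamini_hochberg(p_values)
    return sorted(zip(exon_ids, fdr), key=lambda pair: pair[1])
```

`rank_by_fdr` produces the ascending ordering used on the x-axis of such plots; the exon identifiers passed in are whatever labels the caller uses.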
Figure 5.
Simulation study to compare MATS, a simplified MATS Bayesian model in which the correlation parameter is fixed at 0 (i.e. independent prior), and the Fisher exact test. MATS significantly outperforms the other two methods based on the AUC of the ROC curve (i.e. the true positive rate versus false positive rate plot).
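For reference, the Fisher exact test baseline can be applied to a 2x2 table of junction counts (rows are the two samples, columns are inclusion and skipping reads). The sketch below is a plain standard-library implementation of the two-sided test. The function name and the table layout are assumptions for illustration; the caption does not say how the comparison was set up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    observed = table_prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    total = sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

Calling `fisher_exact_two_sided(inc1, skip1, inc2, skip2)` tests each exon on its own counts only, with no information shared across exons.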
Figure 6.
RNA-Seq and RT–PCR analysis of SPNS1 exon 7 splicing. (A) RNA-Seq junction counts and MATS result of SPNS1 exon 7 in the EV and ESRP1 samples. (B) RT–PCR result of SPNS1 exon 7 in the EV and ESRP1 samples.
Figure 7.
RT–PCR validation of 164 exons covering a broad range of MATS FDR values. All exons analyzed by MATS are rank sorted by FDR estimates (y-axis) in ascending order. The 164 exons tested by RT–PCR are divided into four non-overlapping cohorts according to the FDR estimates. The validation rate for each cohort is shown.
