Bioinformatics. 2010 Jun 15;26(12):i325-33.
doi: 10.1093/bioinformatics/btq200.

Model-based detection of alternative splicing signals

Yoseph Barash et al. Bioinformatics.

Abstract

Motivation: Transcripts from approximately 95% of human multi-exon genes are subject to alternative splicing (AS). The growing interest in AS is propelled by its prominent contribution to transcriptome and proteome complexity and by the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be profiled simultaneously across diverse cell types and cellular conditions. Accurate identification of condition-specific splicing changes in such data is needed to elucidate the underlying regulatory programs or to link the changes to specific diseases.

Results: We present a probabilistic model tailored for high-throughput AS data, where observed isoform levels are explained as combinations of condition-specific AS signals. In our formulation, the tasks for a given AS dataset are to detect common signals in the data and to identify the exons relevant to each signal. Our model can incorporate prior knowledge about underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals that were verified experimentally, and discuss associated regulatory features.

Supplementary information: Supplementary data are available at Bioinformatics online.
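
The formulation in the Results paragraph, in which observed inclusion levels are explained as combinations of condition-specific signals and each exon is relevant to only some of them, can be illustrated with a toy simulation. The sketch below is not the model or inference procedure of the paper: the tissue layout, effect sizes, noise level, the helper names `simulate_inclusion` and `fit_loadings`, and the threshold of 15 are all assumptions chosen for illustration, and ordinary least squares on known signals stands in for probabilistic inference.

```python
import numpy as np

def simulate_inclusion(n_exons=200, noise_sd=2.0, seed=0):
    """Toy data: percent inclusion = baseline + sparse mix of tissue signals + noise."""
    rng = np.random.default_rng(seed)
    # Hypothetical condition layout: 4 CNS, 3 muscle and 5 other tissues.
    signals = np.zeros((2, 12))
    signals[0, :4] = 1.0   # CNS-specific signal
    signals[1, 4:7] = 1.0  # muscle-specific signal
    # Each exon is relevant to a given signal with probability 0.1 (sparse loadings).
    relevant = rng.random((n_exons, 2)) < 0.1
    strength = rng.choice([-30.0, 30.0], size=(n_exons, 2))
    loadings = relevant * strength
    baseline = rng.uniform(35.0, 65.0, size=(n_exons, 1))
    noise = rng.normal(0.0, noise_sd, size=(n_exons, 12))
    data = baseline + loadings @ signals + noise
    return data, signals, relevant

def fit_loadings(data, signals):
    """Least-squares loading of each exon on known signals, with an intercept."""
    design = np.vstack([np.ones(signals.shape[1]), signals]).T  # conditions x (1 + k)
    coef, *_ = np.linalg.lstsq(design, data.T, rcond=None)      # (1 + k) x exons
    return coef[1:].T                                           # exons x k

data, signals, relevant = simulate_inclusion()
estimated = fit_loadings(data, signals)
called = np.abs(estimated) > 15.0  # hypothetical cutoff for calling an exon relevant
print("agreement with simulated relevance:", (called == relevant).mean())
```

Fixing the signals in advance mirrors only the simplest form of the prior knowledge mentioned in the abstract; detecting the signals themselves is the harder half of the task and is not attempted here.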

Figures

Fig. 1.
High-throughput AS data representation and analysis: (A) isoforms including and excluding a cassette exon can be quantified using a single number, representing the percent of isoforms that include the exon. A separate number gives the overall gene expression level (not shown here). (B) High-throughput AS data as a matrix of percent inclusion values for different exons (rows) under different conditions (columns). (C) The same matrix for a real dataset (Fagnani et al., 2007), after agglomerative clustering. Inclusion levels are displayed as a heat map, with the subset of CNS tissues visible on the left. (D) Four examples of exons exhibiting condition-specific splicing changes in CNS, muscle and embryo tissues. Arrows show the position of each exon in the clustergram. These relative positions convey poorly the tissue groups in which each exon exhibits splicing changes. (E) The five underlying AS signals identified by our model and (F) how each of these signals is associated with the four exon examples.
Fig. 2.
A Bayesian network representation of the model. Observed variables are colored and dependencies are denoted with directed edges. The dashed frame denotes elements shared with standard FA.
Fig. 3.
Comparison to alternative approaches. (A–C) SVD analysis, including the singular values (A), examples of the first five condition-specific eigen-exons (B), and a heat map (C) of the pairwise correlation between the first five eigen-exons identified from ten random subsets of the data. (D) Comparison of the FII, which measures enrichment of previously reported regulatory features in groups of exons assigned the CNS (left) and muscle (right) AS signals. Signal assignment was performed using our model (denoted ASFA), SVD analysis, and by computing for each exon the difference between the mean inclusion level in the pre-defined tissue group and the other tissues (denoted Manual).
Fig. 4.
The effect of varying the number of condition-specific AS signals between two and six: (A) the number of iterations until convergence. (B) Free energy (average bits per instance) for the training set. (C) Free energy for the test set. In all plots the baseline is a model with only two signals, given on the far left, and therefore all values for it are by definition zero.
Fig. 5.
The effect of different model settings on the identified AS signals. (A–C) Heat maps of the pairwise correlation between all AS signals identified in 10 random subsets of the data when (A) learning five AS signals, (B) learning four signals, and (C) learning four signals with three signals initialized to CNS, muscle and embryo tissues. (D) Examples of the AS signals identified. The four on the right match the ones shown in Figure 1. The leftmost signal corresponds to a possible split of CNS tissues into two subgroups.
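
The quantities named in the captions of Fig. 1 and Fig. 3 (percent inclusion, the "Manual" score, and an SVD of the exon-by-condition matrix) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the tiny example matrix, the row centering before the SVD, and the reading of "eigen-exons" as right singular vectors over conditions are all assumptions made here.

```python
import numpy as np

def percent_inclusion(included, excluded):
    """Percent of isoforms that include the exon, from two abundance estimates."""
    included = np.asarray(included, dtype=float)
    excluded = np.asarray(excluded, dtype=float)
    total = included + excluded
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    # NaN marks exons for which neither isoform was detected.
    return np.where(total > 0, 100.0 * included / safe_total, np.nan)

def manual_score(matrix, in_group):
    """Mean inclusion in the tissue group minus mean inclusion in the other tissues."""
    in_group = np.asarray(in_group, dtype=bool)
    return matrix[:, in_group].mean(axis=1) - matrix[:, ~in_group].mean(axis=1)

def svd_components(matrix, k):
    """Top-k singular values and right singular vectors of the row-centered matrix."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return singular_values[:k], vt[:k]

# Rows are exons, columns are conditions; the first two columns stand in for CNS.
inc = np.array([[90, 80, 20, 10], [50, 50, 50, 50], [10, 30, 70, 90]])
exc = np.array([[10, 20, 80, 90], [50, 50, 50, 50], [90, 70, 30, 10]])
psi = percent_inclusion(inc, exc)
cns = [True, True, False, False]
scores = manual_score(psi, cns)
values, components = svd_components(psi, k=2)
print(scores)
```

The Manual score requires the tissue group to be defined in advance, whereas the SVD components are learned from the data without any such grouping.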
