Bioinformatics. 2014 Jun 15;30(12):i121-9.
doi: 10.1093/bioinformatics/btu277.

Deep learning of the tissue-regulated splicing code


Michael K K Leung et al. Bioinformatics. 2014.

Abstract

Motivation: Alternative splicing (AS) is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both in understanding this widespread phenomenon, and in exploring the effects of genetic variations on AS.

Methods: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters.

Results: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting AS patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.


Figures

Fig. 1. Architecture of the DNN used to predict AS patterns. It contains three hidden layers, with hidden variables that jointly represent genomic features and cellular context (tissue types).
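To make the idea of hidden variables that jointly represent genomic features and tissue type concrete, here is a minimal forward-pass sketch. It is not the authors' implementation. The layer sizes, the ReLU activations, the point at which the one-hot tissue vector is concatenated, the three-class softmax output and all function names are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_model(rng, n_features, n_tissues, sizes=(8, 6, 6), n_classes=3):
    # Hypothetical sizes; three hidden layers as in the figure caption.
    h1, h2, h3 = sizes
    return {
        "l1": init_layer(rng, n_features, h1),
        "l2": init_layer(rng, h1 + n_tissues, h2),  # tissue joins here (assumption)
        "l3": init_layer(rng, h2, h3),
        "out": init_layer(rng, h3, n_classes),
    }

def predict(model, features, tissue_index, n_tissues):
    """Return class probabilities for one exon in one tissue."""
    one_hot = [1.0 if i == tissue_index else 0.0 for i in range(n_tissues)]
    h = relu(dense(features, *model["l1"]))
    h = relu(dense(h + one_hot, *model["l2"]))  # joint representation
    h = relu(dense(h, *model["l3"]))
    return softmax(dense(h, *model["out"]))

rng = random.Random(0)
model = init_model(rng, n_features=5, n_tissues=4)
probs = predict(model, [0.2, -0.1, 0.5, 0.0, 1.0], tissue_index=2, n_tissues=4)
```

Because the tissue vector enters before the later hidden layers, the same genomic features can yield different predictions for different tissues, which is the property the caption describes.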
Fig. 2. Plot of the change in AUCLMH_All when the values in each feature group are substituted by their median. Feature groups that are more important to the predictive performance of the model have lower values. The groups are sorted by the mean over multiple partitions and folds, with the standard deviations shown. The number of features in each feature group is indicated in brackets.
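The median-substitution analysis in this caption can be sketched as follows. This is an illustrative stand-in only: it uses a plain binary AUC rather than the paper's AUCLMH_All metric, and the scoring function, data and group names are hypothetical.

```python
from statistics import median

def binary_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def median_substitution_change(score_fn, rows, labels, groups):
    """Change in AUC after replacing each feature group by its column medians."""
    baseline = binary_auc([score_fn(r) for r in rows], labels)
    medians = [median(r[j] for r in rows) for j in range(len(rows[0]))]
    changes = {}
    for name, cols in groups.items():
        masked = [[medians[j] if j in cols else v for j, v in enumerate(r)]
                  for r in rows]
        changes[name] = binary_auc([score_fn(r) for r in masked], labels) - baseline
    return changes

# Toy data (hypothetical): column 0 determines the label, column 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 3.0], [0.9, 2.0]]
labels = [0, 0, 1, 1]
score_fn = lambda r: r[0]  # stand-in for a trained model's score
changes = median_substitution_change(
    score_fn, rows, labels, {"informative": [0], "noise": [1]}
)
# changes == {"informative": -0.5, "noise": 0.0}
```

A more negative change means the model relied more heavily on that group, matching the caption's statement that important groups have lower values.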
Fig. 3. Magnitude of the backpropagated signal to the input of the top 50 features computed when the targets are changed from low to high, and high to low. White indicates that the magnitude of the signal is large, meaning that small perturbations to this input can cause large changes to the model's predictions. The features are approximately sorted left to right by the magnitude.
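The quantity in this caption is the sensitivity of the model's loss to each input feature. The sketch below estimates that sensitivity with central finite differences instead of backpropagation, which approximates the same gradient magnitude for a differentiable model. The logistic toy model, its weights and the function names are assumptions for illustration, not the network from the paper.

```python
import math

def input_sensitivity(loss_fn, x, eps=1e-5):
    """Central-difference estimate of |d loss / d x_i| for each input i."""
    mags = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        mags.append(abs(loss_fn(up) - loss_fn(down)) / (2 * eps))
    return mags

def top_features(mags, k):
    """Indices of the k inputs with the largest sensitivity, largest first."""
    return sorted(range(len(mags)), key=lambda i: mags[i], reverse=True)[:k]

# Toy model (hypothetical): probability of the "high" class is sigmoid(w . x).
w = [2.0, 0.0, -0.5]

def p_high(x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def loss_if_target_high(x):
    # Cross-entropy when the target is switched to "high".
    return -math.log(p_high(x))

x = [0.1, 0.3, -0.2]
mags = input_sensitivity(loss_if_target_high, x)
ranked = top_features(mags, 2)
```

Inputs with large magnitudes are those where a small perturbation moves the prediction most, which is what the white cells in the figure indicate.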
