Hum Mutat. 2017 Jan;38(1):16-24.
doi: 10.1002/humu.23111. Epub 2016 Oct 3.

ExonImpact: Prioritizing Pathogenic Alternative Splicing Events

Meng Li et al. Hum Mutat. 2017 Jan.

Abstract

Alternative splicing (AS) is a closely regulated process that allows a single gene to encode multiple protein isoforms, thereby contributing to the diversity of the proteome. Dysregulation of the splicing process has been found to be associated with many inherited diseases. However, among the pathogenic AS events, there are numerous "passenger" events whose inclusion or exclusion does not lead to significant changes with respect to protein function. In this study, we evaluate the secondary and tertiary structural features of proteins associated with disease-causing and neutral AS events, and show that several structural features are strongly associated with the pathological impact of exon inclusion. We further develop a machine-learning-based computational model, ExonImpact, for prioritizing and evaluating the functional consequences of hitherto uncharacterized AS events. We evaluated our model using several strategies, including cross-validation and data from the Genotype-Tissue Expression (GTEx) and ClinVar databases. ExonImpact is freely available at http://watson.compbio.iupui.edu/ExonImpact.

Keywords: alternative splicing; disease; exon impaction; machine learning.

Figures

Figure 1
The workflow of the study, which contains four major components: data collection, feature extraction, model training and model evaluation.
Figure 2
Feature evaluation. (A) Probability density of each feature in the pathogenic and neutral groups. (B) Cumulative probability distribution of each feature in the HGMD and neutral groups. (C) Scatter plot of Wilcoxon rank-sum test p-values for each category of features. The x-axis shows −log10(p-value).
Figure 3
(A) Scatter plot of the average ASA score against the minimum probability of random coil. (B) An example (NM_014946) demonstrating the relationship between protein secondary and tertiary structures.
Figure 4
PCA biplot of all features. Red and green dots represent pathogenic and neutral events, respectively. Each arrow represents one feature, and the color of the arrow indicates the feature's category.
Figure 5
(A) ROC curve on an independent test data set. (B) Using the 1000 Genomes Project data set, the percentage of predicted high-impact events (FIS ⩾ 0.91) among events with weak, intermediate, and strong splice-disrupting variants. (C) True Positive Rate, False Positive Rate, F1 score, and MCC as functions of the cutoff. The lines y = 0.1 and x = 0.82 mark the cutoff that corresponds to a False Positive Rate of 0.1.
Figure 6
Proportion (%) of events with different levels of inclusion ratio in human brains that have low, intermediate and high FIS scores.
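
The Figure 5C caption describes reading off the score cutoff at which the False Positive Rate equals 0.1, alongside the True Positive Rate, F1 score, and MCC. The following is a minimal sketch of that kind of calculation, not the authors' implementation. The function names, the rule that an event is called high impact when its score is at or above the cutoff, and the toy scores and labels are all assumptions made for illustration.

```python
from math import sqrt


def metrics_at_cutoff(scores, labels, cutoff):
    """Compute TPR, FPR, F1, and MCC when score >= cutoff is called positive.

    Assumption: labels are 1 (pathogenic) or 0 (neutral).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1

    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"cutoff": cutoff, "tpr": tpr, "fpr": fpr, "f1": f1, "mcc": mcc}


def lowest_cutoff_with_fpr_at_most(scores, labels, max_fpr):
    """Return the smallest observed score whose FPR is <= max_fpr, or None.

    FPR cannot increase as the cutoff rises, so the first match in
    ascending order is the lowest qualifying cutoff.
    """
    for cutoff in sorted(set(scores)):
        if metrics_at_cutoff(scores, labels, cutoff)["fpr"] <= max_fpr:
            return cutoff
    return None


# Toy data, invented for illustration only (not from the study).
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

chosen = lowest_cutoff_with_fpr_at_most(toy_scores, toy_labels, max_fpr=0.1)
summary = metrics_at_cutoff(toy_scores, toy_labels, chosen)
print(summary)
```

Scanning only the observed scores keeps the sketch dependency-free. With real predictions, a library ROC routine would typically give the same cutoffs more efficiently.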
