Nat Genet. 2017 Jun;49(6):848-855.
doi: 10.1038/ng.3837. Epub 2017 Apr 17.

Pathogenic variants that alter protein code often disrupt splicing

Rachel Soemedi et al. Nat Genet. 2017 Jun.

Abstract

The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1. Massively Parallel Splicing Assay (MaPSy) on the 5K panel.
a, The panel consists of 4,964 mutant and wildtype pairs. b, The panel is incorporated into a three-exon in vivo library. Allelic ratios of both input and output were determined by deep sequencing. The result of RT-PCR from output RNA (spliced species) is shown (Supplementary Figure 2f). Splicing aberrations were found in 18% of mutants. c, Allelic ratios were determined in spliceosomal intermediates; ~24% of species disrupt splicing in vitro. N.E.: nuclear extract. d, Allelic splicing ratios in vivo versus in vitro. e, Cryptic splice-site usage in vivo versus in vitro. f, Exonic splicing mutations identified in ~10% of the 5K panel. g, Summary of MaPSy validations in patient samples.
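As an illustration of the allelic ratios in panel b, the sketch below compares the mutant-to-wildtype read ratio in the spliced output with the same ratio in the input library. The caption states only that the ratios were determined by deep sequencing; the log2 form, the pseudocount, and the function name `allelic_log_ratio` are assumptions made for this example and are not taken from the paper.

```python
import math


def allelic_log_ratio(mut_in, wt_in, mut_out, wt_out, pseudocount=0.5):
    """Log2 change in the mutant/wildtype read ratio from input to output.

    Hypothetical helper: the formula and pseudocount are assumptions,
    not the published MaPSy definition.
    """
    # Ratio of mutant to wildtype reads in the input library.
    in_ratio = (mut_in + pseudocount) / (wt_in + pseudocount)
    # Ratio of mutant to wildtype reads among spliced output species.
    out_ratio = (mut_out + pseudocount) / (wt_out + pseudocount)
    # Negative values mean the mutant allele is depleted after splicing.
    return math.log2(out_ratio / in_ratio)


# A mutant that is equally represented in the input but has half as many
# spliced reads as the wildtype (pseudocount disabled for readability).
example = allelic_log_ratio(100, 100, 50, 100, pseudocount=0)
```

Under these assumptions, a value near zero indicates that the mutation leaves splicing efficiency unchanged, while strongly negative values flag candidate splicing disruptions.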
Figure 2. Prevalence of splicing mutations in disease genes.
a, Left: Splice-site mutations (SSM) versus all exonic mutations in the Human Gene Mutation Database (HGMD, ref. 8) with region of 99.9% confidence interval shown in gray. Middle, right: Number of SSM versus nonsense (middle), and SSM versus missense (right) in all disease genes. b, Mean of exonic splicing mutation (ESM) percentage in each gene is plotted against roughly equal bins of percent SSM in HGMD genes (n = 708). c, Mean of ESM percentage in each exon versus number of SSM per exon (n = 2,048). d, Percent ESM in haploinsufficient (HI, n = 174), moderate HI (n = 567) and haplosufficient (HS, n = 874) genes in autosomal dominant diseases in the 5K panel. e, Percent SSM in HGMD with autosomal dominant inheritance in HI (n = 1,383), moderate HI (n = 14,059) and HS (n = 59,901) genes. Error bars in b,c represent standard error of the mean. Error bars in d,e represent 95% confidence intervals.
Figure 3. Random forest classification of exonic mutations that disrupt splicing.
a, Classification performance of the random forest model was calculated as the area under the curve (AUC) in receiver operating characteristic (ROC) analysis. b, The order of variable importance by mean decrease in accuracy. Error bars indicate standard deviations. The directions (DIR) of change that promote exonic splicing mutations (ESM) are indicated; positive directions are colored blue, and negative directions are colored red. Variables include differences in splice-site strength and hexamer splicing scores (SS STRENGTH DIFF, ESRseq DIFF), sum of the effects of splice-site variants at Human Gene Mutation Database (HGMD) and Exome Aggregation Consortium (ExAC) datasets (HGMD SS VARS, ExAC SS VARS), numbers of exon splicing enhancers (ESE) and exon splicing silencers (ESS) in the exon (N ESE, N ESS), free-energy estimate (dG (kcal/mol) WT EXON), exon conservation (EXON PHASTCON), number of introns (N INTRONS) and relative exon position in the gene (EXON POS IN GENE). PPT: Polypyrimidine tract.
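The AUC named in panel a can be read as the probability that a randomly chosen splicing-disrupting mutation receives a higher classifier score than a randomly chosen neutral one. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the toy labels and scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive (e.g. splicing-disrupting) item, 0 otherwise.
    scores: classifier scores, where higher means more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): three of the four
# positive/negative pairs are ranked correctly.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

The quadratic loop keeps the definition visible; for large panels a rank-based implementation would be the more practical choice.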
Figure 4. Detection of RNA binding protein (RBP) motifs that affect splicing.
a, All mutant/wildtype (M/W) pairs were examined for difference in position-weight-matrix agreement with 155 RBP motifs and known exonic cis-elements. b, Motif profiles show clear trends in agreement with previously defined functions. Shaded blue regions indicate 95% confidence intervals. c, Clustering of data shows similar function of RBP motifs in vivo and in vitro. The mean values from each bin are colored black. d, Left: In the absence of SRSF1, the mutant (MT) that disrupts the SRSF1 binding motif had a modest but not significant increase in exon skipping, while the wildtype (WT) exon with the SRSF1 motif had a two-fold increase in exon skipping. Right: The splicing phenotype of a mutation that creates a PTBP1 binding motif was rescued (~0.5-fold less skipping) when PTBP1 was knocked down, whereas the wildtype exon was not affected in this way. Three stars on top of the bar indicate statistical significance (P < 0.001, two-sided Cochran-Mantel-Haenszel test). Error bars indicate standard deviation. kd: knockdown; ctrl: control.
Figure 5. Isolation of spliceosomal intermediates.
a, After MaPSy in vitro, the splicing reaction was loaded onto a 10–30% glycerol gradient, followed by fractionation. Different spliceosome stages were retrieved in different fractions. b, Spliceosomal complexes (B/C, A, E, H) visualized in native gels for control (top) and heterogeneous library substrates (bottom). c, RNA splicing intermediates migrate to the same fractions in control and library substrates (orange underlines). Total RNA pre (T) and post (T’) splicing are indicated. d, Reassembly of purified B/C and A fractions (middle and bottom), compared to the assembly of original input (top). Fractions used for SELEX are underlined (cyan).
Figure 6. Clustering of allelic ratios provides exonic splicing mutation (ESM) mechanistic insights.
The result of the hierarchical clustering of allelic ratios in spliceosomal fractions is shown (center plot) with representative clusters shown in different colors. The individual panels surrounding the center plot show allelic ratios of each mutant/wildtype (m/w) pair in the different fractions (t0, A, BC and spliced (spl,sp)) for the corresponding clusters. Each pair is colored according to its ESM classification (dark red for significance in both assays, orange for significance in vitro, and gray for negative pairs). The complete profile of all clusters can be found in Supplementary Fig. 9c. Pie charts in individual panels indicate the proportion of ESM classifications. Spliceosome stages are depicted at the right of the individual panels. Major disruptions in assembly transitions are indicated with red arrows and minor disruptions are indicated with purple arrows.

References

    1. Baird PA, Anderson TW, Newcombe HB & Lowry RB Genetic disorders in children and young adults: a population study. Am J Hum Genet 42, 677–93 (1988).
    1. Yang Y et al. Molecular findings among patients referred for clinical whole-exome sequencing. JAMA 312, 1870–9 (2014).
    1. Bamshad MJ et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12, 745–55 (2011).
    1. Tennessen JA et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–9 (2012).
    1. Xue Y et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet 91, 1022–32 (2012).

Online Method References

    1. Yeo G & Burge CB Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. Journal of Computational Biology 11, 377–94 (2004).
    1. Gozani O, Patton JG & Reed R A novel set of spliceosome-associated proteins and the essential splicing factor PSF bind stably to pre-mRNA prior to catalytic step II of the splicing reaction. EMBO Journal 13, 3356–67 (1994).
    1. Reichert V & Moore MJ Better conditions for mammalian in vitro splicing provided by acetate and glutamate as potassium counterions. Nucleic Acids Res 28, 416–23 (2000).
    1. Dobin A et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
    1. Kursa MB, Jankowski A & Rudnicki WR Boruta - A System for Feature Selection. Fundamenta Informaticae 101, 271–286 (2010).