Review
Nat Rev Genet. 2016 Jul;17(7):407-421.
doi: 10.1038/nrg.2016.46. Epub 2016 May 31.

Lessons from non-canonical splicing

Christopher R Sibley et al. Nat Rev Genet. 2016 Jul.

Abstract

Recent improvements in experimental and computational techniques that are used to study the transcriptome have enabled an unprecedented view of RNA processing, revealing many previously unknown non-canonical splicing events. These include cryptic events located far from the currently annotated exons and unconventional splicing mechanisms that have important roles in regulating gene expression. These non-canonical splicing events are a major source of newly emerging transcripts during evolution, especially when they involve sequences derived from transposable elements. They are therefore under precise regulation and quality control, which minimizes their potential to disrupt gene expression. We explain how non-canonical splicing can lead to aberrant transcripts that cause many diseases, and also how it can be exploited for new therapeutic strategies.

Figures

Figure 1. Cryptic exons and microexons.
a) Many introns contain proximally spaced sequences that resemble splice sites, which can in some cases lead to splicing of ‘cryptic’ exons. Cryptic exons often introduce premature termination codons (PTCs), which may target the resulting transcripts for nonsense-mediated decay (NMD). Such NMD-exons are common within transcripts that encode splicing activators, where they function as part of autoregulatory mechanisms. In this example, the SR protein enhances inclusion of an NMD-exon within its own mRNA as part of a negative autoregulatory feedback loop that maintains appropriate steady-state abundance. b) An Alu element is normally composed of two arms, which contain an A-linker and polyA tail. The Alu can become retrotransposed into the antisense strand relative to the gene, so that transcription of the gene produces antisense Alu sequence that contains two U-tracts, one at the beginning of each arm. Many such antisense Alu elements are capable of forming cryptic exons owing to the presence of splice site-like motifs. However, they are normally repressed by an hnRNP C tetramer (green circle), possibly because each U-tract can bind the two RNA Recognition Motif domains that are present on the opposite surfaces of the tetramer (as indicated by the green arrow). The example provided here shows the U-tracts around the Alu exon from the CD55 gene (encoding CD55 molecule). Below, mutations in the U-tracts are shown that decrease binding of hnRNP C, allowing binding of U2 small nuclear RNA auxiliary factor 2 (U2AF2) and TIA1 cytotoxic granule-associated RNA binding protein (TIA1), which initiate splicing of a cryptic Alu exon. C = hnRNP C protein. c) Microexons can be detected from gapped regions in sequencing reads. After mapping of multiple parts of the sequence read to flanking exons, unmapped intervening sequences are aligned to the intronic sequence present between the two exons, with preference given to those that are flanked by conserved splice site motifs. Inclusion of microexons can be enhanced by RNA binding proteins (RBPs) such as Serine/Arginine Repetitive Matrix 4 (SRRM4), an SR protein that binds upstream of microexons and promotes microexon splicing. Inclusion of microexons typically leads to modulation of overlapping or adjacent protein domains to change protein activity. SRRM4 is reduced in autism patients, leading to decreased inclusion of microexons. YAG, 3' splice site; GU, 5' splice site; NMD, nonsense-mediated decay; μ?, possible microexon; μ, microexon.
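
The microexon search described in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by the cited studies: the function name `rank_microexon_candidates` is hypothetical, the splice site motifs are reduced to a bare AG upstream and GT downstream, and the unmapped read segment is matched exactly rather than aligned with mismatches.

```python
def rank_microexon_candidates(intron: str, gap: str) -> list[tuple[int, bool]]:
    """Locate every exact occurrence of an unmapped read segment in an intron.

    Returns (0-based offset, flanked) pairs, where `flanked` is True when the
    occurrence has AG immediately upstream and GT immediately downstream.
    Flanked candidates are listed first, mirroring the stated preference for
    matches bordered by splice site motifs. (Simplified, hypothetical sketch.)
    """
    intron, gap = intron.upper(), gap.upper()
    if not gap:
        raise ValueError("gap sequence must not be empty")
    candidates = []
    start = intron.find(gap)
    while start != -1:
        end = start + len(gap)
        has_acceptor = start >= 2 and intron[start - 2:start] == "AG"
        has_donor = intron[end:end + 2] == "GT"
        candidates.append((start, has_acceptor and has_donor))
        start = intron.find(gap, start + 1)  # allow overlapping occurrences
    # Flanked candidates first, then by position in the intron.
    candidates.sort(key=lambda c: (not c[1], c[0]))
    return candidates


if __name__ == "__main__":
    # Toy intron: the segment ACG occurs twice, once with AG...GT flanks.
    print(rank_microexon_candidates("CCACGCCAGACGGT", "ACG"))
```

A real pipeline would score the flanking motifs against position weight matrices and tolerate sequencing errors; the exact-match ranking here only illustrates the idea of preferring motif-flanked placements.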
Figure 2. Recursive splicing of long introns.
a) Total RNA-seq read counts display a characteristic pattern of depletion from the start to the end of long introns, which can be used to infer exon positions and splicing events. “Sawtooth” patterns that overlap novel junction reads indicate splicing at deep intronic loci and are candidates for recursive splicing. Here, the upstream exon first uses a 3' splice site to remove the first part of the intron. This process reconstitutes a 5' splice site that can then be used to remove the next section of the intron. This special type of splice site, which is shown in the weblogo, is referred to as a recursive splicing site (RS site). b) Recursive splicing in vertebrates requires the RS site to overlap a cryptic ‘RS exon’, which initiates the exon definition mechanism that is required for recognition of the 3' splice site of the RS site. After the first splicing step, the 5' splice site of the RS site competes with the 5' splice site of the RS exon. In the second step of splicing, the outcome of this competition decides whether the RS exon is skipped owing to recursive splicing, or included as an NMD-exon. While the preceding exons from major isoforms end in sequences that favour RS exon skipping, the minor isoforms and cryptic elements end in sequences that favour RS exon inclusion. RS site, recursive splice site; RS exon, recursive splicing exon; YAG, 3' splice site; GURAG, 5' splice site.
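
As a rough illustration of what an RS site looks like in sequence, the sketch below scans an intron for a 3' splice site motif (YAG) immediately followed by a 5' splice site motif (GURAG), using the abbreviations from the legend. The function name `find_rs_sites` is hypothetical, and matching the bare consensus is an assumption made for brevity; it ignores the polypyrimidine tract, the RS exon, and the read-coverage evidence that the legend describes.

```python
import re

# YAG (3' splice site) directly followed by GTRAG (5' splice site, DNA alphabet).
# The lookahead lets overlapping motif instances be reported.
RS_SITE = re.compile(r"(?=[CT]AGGT[AG]AG)")


def find_rs_sites(intron: str) -> list[int]:
    """Return 0-based offsets of the junction between the YAG and GURAG motifs.

    Hypothetical helper: the offset points at the G that starts the
    reconstituted 5' splice site.
    """
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA input
    return [m.start() + 3 for m in RS_SITE.finditer(seq)]


if __name__ == "__main__":
    print(find_rs_sites("AAACAGGTAAGTCC"))
```

Candidates found this way would still need support from junction reads and the sawtooth coverage pattern before being treated as recursive splicing events.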
Figure 3. Intron retention and exitrons.
a) Intron retention events are detected as an accumulation of reads across intronic regions, or as increases in the ratio of exon-intron reads to exon-exon reads. Intron retention events are characterised by numerous features including weak splice sites, high GC content and short intron lengths. Trans-acting factors such as RBPs, the spliceosome and the exon junction complex (EJC) can also regulate specific intron retention events. The resulting transcripts are typically either retained in the nucleus or targeted for NMD in the cytoplasm; other intron retention events might be translated into truncated proteins. b) Exitrons are introns within annotated protein-coding exons that can be removed owing to the presence of internal splice site motifs within the exon. Exitron-containing exons are longer than typical exons, and removal of the exitron can lead to changes in protein structure or degradation via NMD. NMD, nonsense-mediated decay; AG, 3' splice site; GU, 5' splice site.
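
The read-ratio criterion in panel a can be written down directly. The sketch below is an assumption-laden illustration: the function name `intron_retention_ratio` is hypothetical, and averaging the counts from the two intron boundaries is a choice made here for simplicity, not something the legend specifies.

```python
import math


def intron_retention_ratio(ei_5prime: int, ei_3prime: int, exon_exon: int) -> float:
    """Ratio of exon-intron boundary reads to exon-exon junction reads.

    ei_5prime / ei_3prime: reads spanning the 5' and 3' exon-intron boundaries.
    exon_exon: reads spanning the spliced exon-exon junction.
    Higher values suggest more intron retention. (Hypothetical sketch.)
    """
    if min(ei_5prime, ei_3prime, exon_exon) < 0:
        raise ValueError("read counts must be non-negative")
    mean_boundary = (ei_5prime + ei_3prime) / 2
    if exon_exon == 0:
        # No spliced reads: undefined without any reads, unbounded otherwise.
        return math.inf if mean_boundary > 0 else math.nan
    return mean_boundary / exon_exon


if __name__ == "__main__":
    print(intron_retention_ratio(10, 30, 40))
```

In practice the counts would be normalised for mappability and compared between conditions, since the legend refers to increases in the ratio rather than a fixed threshold.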
Figure 4. Formation of circRNAs and chimeric transcripts.
a) CircRNAs are produced by head-to-tail splicing and can be either mono- or multi-exonic. In this multi-exonic example the 3' splice site of an upstream exon becomes spliced to the 5' splice site of a downstream exon to generate a circular transcript that either has the intervening intron removed (exonic circRNA) or retained between the two circularized exons (intron-exon circRNA). Their formation is promoted when the pre-mRNA regions flanking the exon termini are brought into proximity. This can be due to the action of RNA-binding proteins such as Quaking (QKI) or muscleblind-like (MBNL), which bind to flanking regions. Alternatively, this can be due to RNA hybridisation of the flanking regions, which can be caused by Alu elements in primates. b) Circular RNAs are resistant to RNase R, which can be used for their enrichment during preparation of cDNA libraries. They can then be detected in sequencing data by junction reads that are in a head-to-tail orientation. c) Chimeric RNA products can also be produced by cis-splicing when transcript termination is deficient. This process results in read-through of one gene into its neighbouring gene, before splicing occurs between the penultimate exon of gene 1 and the second exon of gene 2, which is seen in the CTSC-RAB38 genes in cancer. d) Trans-splicing occurs when exons of two different transcripts become spliced together. Alternatively, the same chimeric transcripts can be produced when genes become fused at the level of the DNA, such as in the JAZF1-SUZ12 genes in some cancers, which leads to the same chimeric transcript being produced by a linear splicing reaction. RBP, RNA-binding protein.
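
The head-to-tail orientation mentioned in panel b comes down to a coordinate comparison: in a linear junction the 5' splice site (donor) lies upstream of the 3' splice site (acceptor) in transcript order, whereas in a back-splice junction the order is reversed. The sketch below illustrates this; the `Junction` type and `is_back_splice` function are hypothetical names, and real detection tools work from split-read alignments rather than pre-extracted coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """A splice junction on one chromosome (hypothetical minimal record)."""
    donor: int      # genomic coordinate of the 5' splice site
    acceptor: int   # genomic coordinate of the 3' splice site
    strand: str     # "+" or "-"


def is_back_splice(junction: Junction) -> bool:
    """True when the donor lies downstream of the acceptor in transcript order."""
    if junction.strand == "+":
        return junction.donor > junction.acceptor
    if junction.strand == "-":
        # Transcript order runs towards lower coordinates on the minus strand.
        return junction.donor < junction.acceptor
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(is_back_splice(Junction(donor=5000, acceptor=2000, strand="+")))
```

The same test cannot by itself separate circRNAs from other sources of reversed junctions, such as trans-splicing or genomic rearrangements, which is one reason RNase R enrichment is useful.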
Figure 5. A summary of human splice site consensus motifs.
Summarised splice site sequences are classified using the nucleotides marked by the grey boxes. All borders of human exons within Ensembl v83 multi-exon transcripts that overlap with RefSeq mRNA IDs were used. Identical coordinates from overlapping transcripts were collapsed into a single occurrence such that junctions were not counted multiple times. First exons had only their exon-intron junction evaluated, whilst terminal exons had only their intron-exon junction evaluated. This led to a total of 189,255 5' splice sites (shown on the left, with the line showing the exon-intron border) and 187,091 3' splice sites (shown on the right, with the line showing the intron-exon border). U12-type splice site sequences were obtained from U12DB. After identifying the 5' and 3' sites overlapping with the U12-type splice sites, respectively, the remaining U2-type splice site sequences were examined. 5' and 3' splice sites were classified independently and sequentially based on the indicated nucleotides. For example, 53.58% of unique U1-type exon-intron junctions contain GTRAG, and the remaining U1-type junctions were classified based on the first two intronic nucleotides. The percentage of unique junctions containing each motif is indicated. Weblogo 3 was used to show the relative frequency of nucleotides at each position. a) The U1-type 5' splice sites with GT at the border, and U2-type 3' splice sites with AG at the border. b) The U11-type 5' splice sites and U12-type 3' splice sites. c) The U1-type 5' splice sites with GC at the border; remaining U1-type 5' splice sites with TN at the border, where N stands for any nucleotide; U1-type 5' splice sites with VN at the border, where V stands for any nucleotide except T; U2-type 3' splice sites with BG at the border, where B stands for any non-A nucleotide; and U2-type 3' splice sites with W at the border, where W stands for T or A.
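
The sequential classification of 5' splice sites can be sketched as an ordered series of checks on the first intronic nucleotides. This is an illustrative reading of the legend, not the authors' code: the function name `classify_5ss` is hypothetical, the order of the GT, GC, TN and VN checks is inferred from the panel descriptions, and U12-type sites are assumed to have been set aside beforehand (the legend does this using U12DB).

```python
import re
from collections import Counter


def classify_5ss(intron_start: str) -> str:
    """Assign a U1-type 5' splice site to a motif class.

    `intron_start` holds the first intronic nucleotides (at least two).
    Checks run in order, so each site lands in exactly one class.
    """
    seq = intron_start.upper().replace("U", "T")
    if len(seq) < 2:
        raise ValueError("need at least the first two intronic nucleotides")
    if re.match(r"GT[AG]AG", seq):
        return "GTRAG"   # R = A or G
    if seq.startswith("GT"):
        return "GT"
    if seq.startswith("GC"):
        return "GC"
    if seq[0] == "T":
        return "TN"      # N = any nucleotide
    return "VN"          # V = A, C or G (anything except T)


if __name__ == "__main__":
    sites = ["GTAAGT", "GTGAGT", "GTCAGT", "GCAAGT", "TAAAGT", "ATATCC"]
    counts = Counter(classify_5ss(s) for s in sites)
    for motif, n in counts.most_common():
        print(f"{motif}: {100 * n / len(sites):.2f}%")
```

Because the checks are sequential, a GTRAG site is never counted again under GT, which matches the legend's statement that the remaining junctions are classified only after the GTRAG sites have been assigned.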
Figure 6. Cryptic splicing in disease and therapeutic strategies.
a) Cryptic exons are normally repressed by RBPs such as hnRNP C (green circle) or by U1 snRNP. b) Examples of mutations (numbered) in deep intronic regions that can activate cryptic splicing events in disease-associated genes. (1) hnRNP C (green circle) binding to a U-tract upstream of an antisense Alu element represses recognition of the cryptic 3' splice site within the element. Intronic deletions or point mutations that shorten the U-tract can impede hnRNP C recruitment but allow U2AF2 (shown in purple) binding, leading to Alu exonisation. A deletion within an Alu in the PTS gene (encoding 6-pyruvoyltetrahydropterin synthase) leads to splicing of an Alu exon that introduces a frameshift, thereby causing the neurologic disease hyperphenylalaninaemia. (2) In the ATM gene, U1 snRNP (orange circle) binding to an intronic element within a cryptic exon inhibits its recognition as a splicing-competent exon. Patients with ataxia telangiectasia present a 4 nt deletion that abolishes the U1 snRNP interaction, causing cryptic exon activation. (3) A point mutation within a deep intronic sequence of the CFTR gene generates an active 5' splice site that allows insertion of a cryptic exon within the CFTR transcripts, which causes cystic fibrosis. (4) In the BRCA2 gene, a point mutation that disrupts a canonical 3' splice site activates (depicted by a grey arrow) an upstream cryptic exon. Disrupted BRCA2 expression causes breast, ovarian and other cancer types. c) New therapeutic strategies in cancer involve spliceosome targeting. In MYC-driven tumours, oncogenic MYC causes transcriptional amplification, which overloads the splicing machinery and makes these cells more sensitive to alterations in splicing fidelity. Genetic knockdown or pharmacological inhibition of spliceosomal components leads to accumulation of retained introns, which results in increased apoptosis and reduced tumorigenic and metastatic potential of MYC-driven tumours. C, hnRNP C protein; U1, U1 snRNP; AF2, U2AF2 protein.
