Nucleic Acids Res. 2014;42(16):10564-78.
doi: 10.1093/nar/gku744. Epub 2014 Aug 14.

A comprehensive survey of non-canonical splice sites in the human transcriptome

Guillermo E Parada et al. Nucleic Acids Res. 2014.

Abstract

We uncovered the diversity of non-canonical splice sites in the human transcriptome using deep transcriptome profiling. We mapped a total of 3.7 billion human RNA-seq reads and developed a set of stringent filters to avoid false non-canonical splice site detections. We identified 184 splice sites with non-canonical dinucleotides and U2/U12-like consensus sequences. We selected 10 of these U2/U12-like non-canonical splice site events and successfully validated 9 of them via reverse transcriptase-polymerase chain reaction and Sanger sequencing. Analyses of the 184 U2/U12-like non-canonical splice sites indicate that 51% of them are not annotated in GENCODE. In addition, 28% of them are conserved in mouse and 76% are involved in alternative splicing events, some of them with tissue-specific alternative splicing patterns. Interestingly, our analysis identified some U2/U12-like non-canonical splice sites that are converted into canonical splice sites by RNA A-to-I editing. Moreover, the U2/U12-like non-canonical splice sites have a differential distribution of splicing regulatory sequences, which may contribute to their recognition and regulation. Our analysis provides a high-confidence group of U2/U12-like non-canonical splice sites, which exhibit distinctive features among the total human splice sites.

Figures

Figure 1.
An overview of the workflow used for the search of non-canonical splice junctions. (A) Ab initio detection of non-canonical splice junctions. RNA-seq data of GM12878 were aligned to their diploid personalized genome and RNA-seq data from a mixture of 16 human tissues were aligned to the human reference genome (hg19) using MapSplice. All alignments of RNA-seq and cDNA/EST data were pre-processed in order to generate an initial library of splice junctions. Additional SNP/indel filters were applied to the tissue RNA-seq and cDNA/EST alignments. (B) All the RNA-seq data were re-aligned to the library of splice junctions. Additional RNA-seq data from individual tissues were also directly aligned to the library. (C) A total of 1630 non-canonical introns were present in at least two sources of data. Of these, 462 non-canonical splice junctions were detected with a coverage ratio of at least 1:20 relative to their most abundant splice variant.
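The two thresholds in panel (C) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Junction` record, its field names and the way the coverage ratio is computed are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for one candidate splice junction."""
    junction_id: str
    sources: frozenset          # names of data sources supporting the junction
    coverage: int               # reads supporting this junction
    top_variant_coverage: int   # reads supporting the most abundant splice variant


def passes_filters(j, min_sources=2, ratio_denominator=20):
    """Keep junctions seen in >= min_sources sources and with coverage of
    at least 1/ratio_denominator of the most abundant variant (assumed
    reading of the 1:20 coverage ratio)."""
    if len(j.sources) < min_sources:
        return False
    # Integer comparison avoids floating-point division.
    return j.coverage * ratio_denominator >= j.top_variant_coverage


candidates = [
    Junction("a", frozenset({"rnaseq", "est"}), 5, 100),   # exactly 1:20
    Junction("b", frozenset({"rnaseq"}), 50, 100),         # only one source
    Junction("c", frozenset({"rnaseq", "est"}), 4, 100),   # below 1:20
]
kept = [j.junction_id for j in candidates if passes_filters(j)]
```

Here only junction `a` survives: `b` has a single supporting source and `c` falls below the assumed coverage ratio.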
Figure 2.
Non-canonical splice site conservation across different vertebrates. The UCSC Genome Browser images show two examples of non-canonical splice sites that are conserved in several vertebrates. Based on our results, we made an annotation of human splice junctions (middle track). The splice junctions' name (ID) indicates their dinucleotides and read coverage (for details see the Materials and Methods section). Genome alignments from different vertebrates are shown. Dots indicate conservation of the human nucleotides; red and green rectangles indicate the conservation of canonical and non-canonical splice sites respectively; and * indicates that the splice site is supported by cDNA/EST alignments of each vertebrate. (A) The BANP gene has an alternative GG-3′ non-canonical splice site that is conserved among most of the vertebrates and was derived from an ancient canonical splice site. (B) The AT–AG non-canonical splice site of ACTR10 was derived recently in evolution from a canonical splice site. This non-canonical splice site is exclusive to some primates.
Figure 3.
Non-canonical splice sites are highly involved in alternative splicing and some show tissue-specific alternative splicing patterns. (A) Participation of non-canonical splice sites in alternative splicing. (B) The use of a non-canonical splice site generates a frame shift and a premature termination codon that disrupts a NAP-like domain (highlighted in yellow) of the TSPYL2 protein. A tissue-specific pattern of 3′ alternative splice site selection is shown across seven human tissues. Coverage quantification is plotted, where error bars indicate the 95% binomial confidence interval. (C) The CPSF3 gene has an alternative 5′-GG non-canonical splice site. Coverage quantification shows a testis-specific selection of the 5′-GG non-canonical splice site.
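As an illustration of how a 95% binomial confidence interval for splice site usage might be computed from read counts, the sketch below uses the Wilson score interval. The caption does not say which binomial interval was used, so the choice of method, the function name and the example counts are assumptions.

```python
import math


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    successes: reads supporting the splice site of interest (hypothetical input)
    total:     reads supporting any of the competing splice sites
    z:         normal quantile; the default corresponds to a 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Example with made-up counts: 30 of 100 junction reads use the alternative site.
low, high = wilson_interval(30, 100)
```

The Wilson interval is chosen here because it behaves sensibly at low coverage and for proportions near 0 or 1, which is common for minor splice variants.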
Figure 4.
Non-canonical site junctions have a distinctive distribution of EIE and IIE. (A) Positional density of EIE in canonical (blue) and 5′ non-canonical (pale blue) splice sites. (B) Positional density of IIE in canonical (red) and 5′ non-canonical (pale red) splice sites. (C) Positional density of EIE in canonical (blue) and 3′ non-canonical (pale blue) splice sites. (D) Positional density of IIE in canonical (red) and 3′ non-canonical (pale red) splice sites. The positional density is smoothed over a window of 10 bases.
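Smoothing a positional density over a window of 10 bases can be sketched as a simple moving average. The caption does not specify the smoothing method, so the unweighted average, the handling of the edges (positions without a full window are dropped) and the function name are assumptions.

```python
def smooth(density, window=10):
    """Unweighted moving average over `window` consecutive positions.

    Returns one value per full window, so the output has
    len(density) - window + 1 entries (empty if the input is too short).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(density) < window:
        return []
    running = sum(density[:window])
    smoothed = [running / window]
    for i in range(window, len(density)):
        # Slide the window by one base: add the new value, drop the oldest.
        running += density[i] - density[i - window]
        smoothed.append(running / window)
    return smoothed


# Example with made-up per-position counts.
example = smooth(list(range(12)), window=10)
```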
Figure 5.
Editing-dependent splicing of non-canonical splice sites. The adenine of the AA-3′ non-canonical splice site that is highlighted in yellow shows a consistent A>G mismatch in poly(A)-minus RNA-seq data from the HUVEC cell line. This reflects an A-to-I editing event in the AA-3′ non-canonical splice site, which likely allows the splicing of the GT–AA intron and the exon inclusion of a cassette exon. The exon skipping event is not annotated in GENCODE v17.
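Because inosine is read as guanosine, an A-to-I edit at an intron boundary can be modeled as an A>G substitution. The sketch below checks whether a single such substitution turns a non-canonical dinucleotide pair into a canonical one. Treating GT–AG, GC–AG and AT–AC as the canonical set is an assumption, as are the function names.

```python
# Assumed set of canonical (donor, acceptor) dinucleotide pairs.
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_canonical(donor, acceptor):
    """True if the intron's terminal dinucleotides form a canonical pair."""
    return (donor.upper(), acceptor.upper()) in CANONICAL


def editing_rescues(donor, acceptor):
    """List canonical pairs reachable by one A>G change (modeling A-to-I editing)."""
    ends = donor.upper() + acceptor.upper()
    rescued = []
    for i, base in enumerate(ends):
        if base == "A":
            edited = ends[:i] + "G" + ends[i + 1:]
            pair = (edited[:2], edited[2:])
            if pair in CANONICAL:
                rescued.append(pair)
    return rescued


# The GT-AA intron from the caption: editing the last adenine gives GT-AG.
rescued_pairs = editing_rescues("GT", "AA")
```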
Figure 6.
Non-canonical splice sites are prone to be misannotated. (A) UCSC Genome Browser image shows a splice site area of ITPR1 where RNA-seq reads can be aligned in two ways. The difference between the two alignments lies in a TGAG sequence (yellow letters) that can be aligned with three mismatches, evidencing a canonical GT–AG splice junction (red alignment) or without mismatches, but evidencing a non-canonical GA–AG splice junction (green alignment). A GENCODE isoform is based on the suboptimal alignment (highlighted in red). The assembled transcript of ITPR1 derived from TopHat RNA-seq alignments is also based on the suboptimal alignment (highlighted in red). RT-PCR coupled to Sanger sequencing proved that this transcript does not have any mismatches with the genome. (B) The ETV1 gene has a constitutive TT–AG non-canonical splice junction that is annotated in GENCODE, but Cufflinks cannot assemble a continuous transcript for ETV1 due to TopHat's inability to align non-canonical splice junctions.
Figure 7.
Non-U2/U12 splice junctions have a higher number of direct repeats and %GC content. (A) Frequency distribution of the number of direct repeats at canonical, U2/U12-like non-canonical and non-U2/U12 non-canonical splice junctions. (B) Frequency distribution of %GC content in canonical, U2/U12-like non-canonical and non-U2/U12 splice junctions.
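The two quantities compared in this figure can be sketched as follows. The %GC function is standard. The direct-repeat measure is only one possible definition and an assumption here: it counts how many bases at the start of the intron are identical to the bases at the start of the downstream exon. The caption does not state how repeats were counted, and all names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def direct_repeat_length(intron_start, downstream_exon_start):
    """Length of the shared prefix between the first intronic bases and the
    first bases of the downstream exon (assumed definition of a direct
    repeat flanking the junction)."""
    length = 0
    for a, b in zip(intron_start.upper(), downstream_exon_start.upper()):
        if a != b:
            break
        length += 1
    return length


# Made-up sequences: the first three bases are shared, giving a 3-nt repeat.
repeat = direct_repeat_length("AGCTTT", "AGCAAA")
gc = gc_percent("ATGC")
```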
Figure 8.
The non-U2/U12 splice junction of CCNG1 is a template switching artifact. (A) Our human splice junction annotation shows a non-U2/U12 splice junction (AG–CC[219]) present in the 5′ UTR of the CCNG1 gene. Other non-U2/U12 splice junctions for this gene are annotated in GENCODE, but only the non-U2/U12 splice junction from our annotation (shown in red) was obtained by RT-PCR and Sanger sequencing. Red genomic letters indicate the 3-nt long direct repeat associated with this splice junction. (B) RT-PCR of CCNG1 transcripts using MMLV or AMV enzymes; the different amplified products are represented alongside. (C) In silico prediction of the secondary RNA structure associated with the CCNG1 non-U2/U12 splice junction. Direct repeats are highlighted in red.
