Comparative Study
BMC Genomics. 2008 Apr 30:9:202.
doi: 10.1186/1471-2164-9-202.

Comparative analysis of sequence features involved in the recognition of tandem splice sites

Ralf Bortfeldt et al. BMC Genomics. 2008.

Abstract

Background: Pre-mRNA splicing is frequently variable and produces multiple alternatively spliced (AS) isoforms that encode different messages from one gene locus. Computational studies uncovered a class of highly similar isoforms related to tandem 5'-splice sites (5'ss) and 3'-splice sites (3'ss), for which experimental studies have so far provided only sparse, anecdotal evidence. To compare the types and levels of alternative tandem splice site exons occurring in different human organ systems and cell types, and to study known sequence features involved in the recognition and distinction of neighboring splice sites, we performed large-scale, stringent alignments of cDNA sequences and ESTs to the human and mouse genomes, followed by experimental validation.

Results: We analyzed alternative 5'ss exons (A5Es) and alternative 3'ss exons (A3Es), derived from transcript sequences that were aligned to assembled genome sequences to infer patterns of AS occurring in several thousand genes. Comparing the levels of overlapping (tandem) and non-overlapping (competitive) A5Es and A3Es, we saw a clear preference for isoforms with tandem donors and acceptors, with exon extensions of four nucleotides and of three to six nucleotides, respectively. A subset of inferred A5E tandem exons was selected and experimentally validated. Focusing on A5Es, we investigated their transcript coverage, sequence conservation and base-pairing to U1 snRNA, proximal and distal splice site classification, and candidate motifs for cis-regulatory activity, and compared A5Es with A3Es, constitutive exons and pseudo-exons in H. sapiens and M. musculus. The results reveal a small but authentic, enriched set of preferred tandem splice sites with specific distances between proximal and distal 5'ss (3'ss). This set showed a marked dichotomy between the levels of out-of-frame and in-frame splicing for A5Es and A3Es, respectively, identified a number of candidate NMD targets, and allowed a rough estimate of the number of undetected tandem donors based on splice site information.

Conclusion: This comparative study distinguishes tandem 5'ss and 3'ss, with extensions of three to six nucleotides, as having unusually high proportions of AS, experimentally validates tandem donors in a panel of different human tissues, highlights the dichotomy in the types of AS occurring at tandem splice sites, and shows that human alternative exons spliced at overlapping 5'ss possess features of typical splice variants that could well be beneficial for the cell.

Figures

Figure 1
Occurrence of extensions (E = 1,2,...,18 nucleotides) for A5Es (parts A, C) and A3Es (B, D), with human and mouse exons in the top and bottom panels, respectively. Extensions were inferred from three different alignment algorithms (colored as blue, SIM4; red, BLAT; and green, EXALIN) of cDNAs/ESTs to genomic DNA. The distribution f(E) for A5Es was markedly biased for extensions (E) with overlapping splice sites, with a peak at E = 4 nucleotides. Exon extensions exhibited relatively smaller but persistent periodic peaks at E = 6, 9, 12, 15, and 18 nucleotides. f(E) for A3Es also displayed a bias for overlapping splice sites, with a peak at E = 3 nucleotides and smaller peaks at 4–6 nucleotides. The program SIM4 predicted significantly more extensions at E = 4 nucleotides as compared to BLAT and EXALIN predictions of the same initial set of cDNAs/ESTs, which was indicative of spurious alignments. A comparative analysis of alternative exons in M. musculus corroborated the above patterns.
Figure 2
Illustrative examples of inferred tandem donors. White boxes denote exon nucleotides and lines denote intron nucleotides; exon numbers (E#) correspond to REFSEQ annotations enumerated 5'-to-3', splice site scores were measured with MAXENTSCAN, and the transcript coverage of the proximal and distal donor sites corresponds to the number of aligned sequences. In A), E8 of the RAD9A gene shows a tandem donor with extension /GCAG/; in B), E9 of the ACAD9 gene shows a tandem donor with extension /GTAG/; in C), E15 of the SFRS16 gene shows a tandem donor with extension /GTCA/. Tandem donors in A) and C) were preferentially included in different transcripts. The conservation plot (PHASTCON scores, not to scale with the stated exon and intron nucleotides) covers A5EΔ4 splicing exons, as well as adjacent introns and downstream exons, and shows alternating patterns of high/low levels across all three examples.
Figure 3
Experimental validation of a tandem donor activated in E15 of the SFRS16 gene using RT-PCR and direct sequencing. The top shows the gene structure of SFRS16; in the middle and bottom, E14-16 are schematically extracted and the 3'-end core and full extension sequence of E15 for proximal (TCA/gtaaga) and distal (AAA/gtcagt) splicing are shown. Upstream of the 5'ss of E15, the two mRNA isoforms cannot be distinguished, and consequently the electropherogram displays, for each position, one nucleotide signal peak above the baseline. After the tandem donor site, two nucleotide signals above the baseline become visible, indicating the presence of two isoforms.
Figure 4
Scatter plot of the transcript coverage of competitive and tandem donors (A) and acceptors (B). Vertical and horizontal axes refer to the coverage of distal and proximal splice sites; solid and dotted lines mark the transcript means; A5EΔ4 and A3EΔ3 splicing exons are shown in bold; green and blue mark the ΔP and ΔD (major) splicing exons, respectively. The inset shows the histogram of the log-ratio (R) of the coverage of the distal over the proximal 5'ss (3'ss); curves marked in black show the smoothed distribution (splines, R package). In A) the coverage scatters mainly along the vertical or horizontal axis, which is indicative of preferentially including or excluding the exon extension from the core sequence. The coverage pattern was used to partition all A5Es into two main types, I and II, and a remaining type. In the inset, the histogram of R has a bimodal shape, which is indicative of two subpopulations of A5Es with predominant proximal or distal splice site usage. In B) the overlap between distal and proximal tandem acceptor coverage is comparatively broader, and consequently the histogram of R exhibits a unimodal shape consistent with a single population of A3Es.
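The log-ratio R used for the inset histograms in Figure 4 can be illustrated with a minimal Python sketch. The caption does not state the logarithm base or how zero coverage is handled, so the base-2 logarithm and the pseudocount below are assumptions, and the function name and coverage values are hypothetical rather than taken from the study.

```python
import math

def coverage_log_ratio(distal, proximal, pseudocount=1.0):
    """Log-ratio of distal over proximal transcript coverage.

    Assumptions (not stated in the caption): base-2 logarithm, and a
    pseudocount added to both counts so that zero coverage is defined.
    """
    return math.log2((distal + pseudocount) / (proximal + pseudocount))

# Hypothetical (distal, proximal) coverage pairs, for illustration only.
pairs = [(40, 2), (1, 35), (12, 10)]
ratios = [coverage_log_ratio(d, p) for d, p in pairs]
# R > 0: distal site is used more often; R < 0: proximal site dominates.
```

Under these assumptions, a population split into mostly-distal and mostly-proximal exons would give values of R clustered on either side of zero, which is the bimodal shape the caption describes for A5Es.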
Figure 5
Scatter plots of 5'ss scores of competitive and tandem donors (cf. notation of Figure 4). The upper panel shows the individual and mean scores (the latter marked by solid/dashed lines); the lower panel compares, on the left-hand side, the cumulative score distribution of PΔ4 and dΔ4 splice sites with constitutive 5'ss and dΨ4 (pseudo distal 5'ss, in black), and, on the right-hand side, pΔ4 and DΔ4 splice variants with pΨ4 (pseudo proximal 5'ss, in black) and constitutive 5'ss. The threshold at which the curves intersect (S*) marks the accuracy (A) at which sets can be distinguished with equal classification errors on major and minor splice variants. A(S*) ≈ 78% for PΔ4 versus dΔ4 (PΔ4/dΔ4), A(S*) ≈ 92% for pΔ4/DΔ4, A(S*) ≈ 95% for dΨ4/5'ss, and A(S*) ≈ 99% for 5'ss/pΨ4. At the bottom, tables show the number of exons of each type above and below S*; ordered table entries are: TP, FP, TN, and FN (on white background).
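The idea of a threshold S* with equal classification errors can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, the scan over observed scores as candidate thresholds, and the use of balanced accuracy are all hypothetical choices, and the example scores are invented.

```python
def equal_error_threshold(high_scores, low_scores):
    """Find a threshold where both score sets are misclassified at
    (approximately) the same rate, and report the accuracy there.

    Assumptions: scores >= threshold are assigned to the high-scoring
    set; candidate thresholds are the observed scores; accuracy is the
    balanced accuracy, 1 - (FN rate + FP rate) / 2.
    """
    best = None
    for s in sorted(set(high_scores) | set(low_scores)):
        # Fraction of the high-scoring set falling below the threshold.
        fn_rate = sum(x < s for x in high_scores) / len(high_scores)
        # Fraction of the low-scoring set at or above the threshold.
        fp_rate = sum(x >= s for x in low_scores) / len(low_scores)
        gap = abs(fn_rate - fp_rate)
        if best is None or gap < best[0]:
            best = (gap, s, 1.0 - (fn_rate + fp_rate) / 2.0)
    return best[1], best[2]

# Invented splice site scores, for illustration only.
threshold, accuracy = equal_error_threshold([8, 9, 10, 7], [1, 2, 3, 8])
```

With real score distributions the two error rates rarely match exactly, which is why the sketch picks the candidate with the smallest gap between them.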
Figure 6
Splice site signals and sequence conservation around splice sites. A) Pictograms of 5'ss and 3'ss of constitutive, PΔ4 and DΔ4, and A3EΔ3 splicing exons. The height of a nucleotide represents its frequency of occurrence at a given position, shown for a range of 14 nucleotides around the splice junctions. Above the constitutive 5'ss, the 3'-end of the U1 snRNA is indicated. B) Information score difference (ΔI) between PΔ4 and DΔ4, respectively, and constitutive splicing exons, as well as between A3EΔ3 and constitutive splicing exons. For each position, ΔI > 0 (ΔI < 0) indicates more (less) information in an alternative than in a constitutive splice site. C) Sequence conservation of human PΔ4 and DΔ4 splice sites and splice sites of exons of orthologous mouse genes, 'anchored' at major splice sites and with > 80% exon sequence identity.
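A per-position information difference such as ΔI in panel B can be illustrated with a short Python sketch. The caption does not define the information score, so the sketch assumes the common definition for DNA of 2 bits minus the Shannon entropy of the nucleotide frequencies at each position; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def position_information(seqs):
    """Information content (bits) per position of aligned, equal-length
    DNA sequences, assumed here to be 2 - Shannon entropy."""
    info = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(2.0 - entropy)
    return info

def information_difference(alternative, constitutive):
    """Delta-I per position: alternative minus constitutive information."""
    return [a - c for a, c in zip(position_information(alternative),
                                  position_information(constitutive))]

# Toy aligned sites, for illustration only.
delta_i = information_difference(["GT", "GT"], ["GT", "GA"])
```

Under this definition, a positive value at a position means the alternative sites are more constrained there than the constitutive sites, and a negative value means they are less constrained.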
Figure 7
Sequence conservation and splicing regulatory elements of A5EΔ4, A3EΔ3, and SEs of orthologous human and mouse genes. Upper panels A) and B) show, for different AS types, the mean exon conservation and the mean conservation of exon-flanking sequences up to 100 nucleotides downstream, respectively. The conservation is shown individually for PΔ4 (panel A, green) and DΔ4 (panel B, blue) splicing exons; extension regions of A5EΔ4 splicing exons were excluded. Lower panels C) and D) show the occurrences of different splicing regulatory elements located within the first 200 nucleotides of exon-flanking sequences that share > 80% exon identity and splice site signals with mouse exons.
Figure 8
Annotation of A5EΔ4 splicing exons in REFSEQ genes. Percentages refer to fractions of A5EΔ4 splicing exons located in the 5'-UTR, coding sequence (CDS) region, or 3'-UTR. A black "s" indicates the position of the stop codon relative to the REFSEQ transcript structure, whereas a red "s" indicates the altered stop codon due to tandem donor splicing. A5EΔ4 splicing exons embedded within CDS regions are broken down into two categories, depending on whether they create a premature (PTC) or a delayed termination codon (DTC). PTCs can mark mRNAs as substrates for nonsense-mediated decay.

References

    1. Jurica MS, Moore MJ. Pre-mRNA splicing: awash in a sea of proteins. Mol Cell. 2003;12:5–14. doi: 10.1016/S1097-2765(03)00270-3.
    2. Berget SM. Exon recognition in vertebrate splicing. J Biol Chem. 1995;270:2411–2414.
    3. Ladd AN, Cooper TA. Finding signals that regulate alternative splicing in the post-genomic era. Genome Biol. 2002;3:reviews0008. doi: 10.1186/gb-2002-3-11-reviews0008.
    4. Graveley BR. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 2001;17:100–107. doi: 10.1016/S0168-9525(00)02176-4.
    5. Maquat LE. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol. 2004;5:89–99. doi: 10.1038/nrm1310.

Publication types