Proc Natl Acad Sci U S A. 2005 Feb 22;102(8):2850-5.
doi: 10.1073/pnas.0409742102. Epub 2005 Feb 11.

Identification and analysis of alternative splicing events conserved in human and mouse

Gene W Yeo et al. Proc Natl Acad Sci U S A. 2005.

Abstract

Alternative pre-mRNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few of such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call alternative conserved exons (ACEs), from other orthologous human/mouse exons and integrate these features into an exon classification algorithm, acescan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified approximately 2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues by using an RT-PCR-sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of acescan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons that had EST or cDNA evidence of AS in human but were not predicted ACEs, and AS was never observed for 11 negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve the reading frame and less likely to disrupt protein domains than other AS events and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing, and development. Our results also imply that the vast majority of AS events represented in the human EST database are not conserved in mouse.
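The validation rates quoted in the abstract can be recomputed from the reported counts. The short sketch below only restates that arithmetic; the dictionary labels and the function name `percent` are illustrative choices, not part of acescan.

```python
# Counts reported in the abstract: (exons with AS confirmed, exons tested).
# The labels are paraphrases chosen for this illustration.
validation = {
    "predicted ACEs, human and mouse tissues": (21, 30),
    "human EST/cDNA AS but not predicted ACE, mouse tissues": (2, 15),
    "negative control exons": (0, 11),
}


def percent(confirmed, tested):
    """Return the confirmation rate as a whole-number percentage."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return round(100 * confirmed / tested)


for label, (confirmed, tested) in validation.items():
    print(f"{label}: {confirmed}/{tested} = {percent(confirmed, tested)}%")
```

The first two lines of output reproduce the 70% and 13% figures given in the abstract.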

Figures

Fig. 1.
Schematic overview of the learning and prediction stages of the acescan procedure. (A) Learning. Sequence features that differed between sets SH,M and Sh,m were identified as described (Supporting Text). Random subsets of SH,M and Sh,m were used to train the acescan algorithm, and cross-validation scores were calculated for the unseen subsets of SH,M and Sh,m. The cross-validated acescan score distributions for SH,M (red) and Sh,m (black) are shown. (B) Prediction. Spliced alignments of transcript sequences were used to assign ensembl-annotated exons from ≈10,000 human–mouse orthologous gene pairs (not necessarily alternatively spliced) to one of two sets: SH,m and Sh,m. acescan score distributions for SH,m (pink) and Sh,m (blue) are shown.
Fig. 2.
Sequence features that differ between conserved alternative and constitutive human–mouse exons. (A) Features typical of exons of the SH,M (alternatively spliced) and Sh,m (constitutive) training sets are depicted. SH,M exons had shorter median exon length (93 versus 126 bases, P < 10^-22), longer upstream intron length (P < 0.005), longer downstream intron length (P < 10^-5), weaker 5′ and 3′ splice site scores [P < 10^-5 and P < 0.02, respectively; maxentscan (http://genes.mit.edu)], higher exon sequence conservation (percent identity; P < 10^-46), and higher conservation (clustal w alignment score) in the 150-base intron regions immediately upstream and downstream of the exon (P < 10^-63 and P < 10^-66, respectively). For each feature, the Kolmogorov–Smirnov test was used to test the null hypothesis of independent samples drawn from the same underlying population. Length and splice site score values are shown for human exons/introns; mouse values were similar. Average percent identity for alignments of flanking intron regions is shown in a 9-base sliding window for SH,M (red trace) and Sh,m (black dashed trace) exons. (B) Pentanucleotides used by acescan. Overrepresented (red) and underrepresented (black) pentamers in exon or 150-base flanking intron regions of SH,M versus Sh,m exons. Pentamer frequencies were analyzed separately for clustal w-aligned regions only (aligned) or the entire region (unaligned). Exon 5′ and 3′ ends refer to the first and last 100 bases of exon, respectively. Boxed oligonucleotides indicate overlap with ESS hexamers (20), and oligonucleotides with asterisks indicate overlap with RESCUE-ESE hexamers (19).
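The Kolmogorov–Smirnov comparison named in the Fig. 2 caption rests on the largest gap between two empirical distribution functions. The following is a minimal sketch of that two-sample statistic only; it does not compute a P value, and the function name `ks_statistic` and the exon lengths in the example are made up for illustration rather than taken from the paper's data.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the two empirical
    cumulative distribution functions. Both functions are step
    functions that change only at observed values, so it is enough
    to compare them at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical exon lengths (bases), invented for this example only.
alternative_lengths = [57, 84, 93, 96, 108]
constitutive_lengths = [102, 120, 126, 141, 165]
print(ks_statistic(alternative_lengths, constitutive_lengths))
```

A value of D near 0 means the two length distributions are similar, and a value near 1 means they barely overlap.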
Fig. 3.
Validation and analysis of acescan[+] predictions. (A) Experimental validation by means of RT-PCR and sequencing of subsets of candidate acescan[+] exons and negative control acescan[–] exons in panels of normal human and mouse tissues with primers in flanking exons. Graphical representations of splicing patterns (inclusion/exclusion) and the number of exon pairs observed to be excluded and included are designated in red and black, respectively. The three randomly selected subsets tested were (i) 30 acescan[+] exon pairs; (ii) as negative controls, 15 acescan[–]SH exon pairs (with EST/cDNA evidence for inclusion and exclusion of the human exon indicated by horizontal lines representing spliced transcripts); and (iii) 11 acescan[–]Sh exon pairs (with no transcript evidence for skipping in either human or mouse). (B) SNP density in acescan[+], acescan[–]SH, and acescan[–]Sh exons. The number of stringently filtered SNPs per 10,000 bases was computed for each exon set. (C) Fraction of SH,M exons, acescan[+]SH exons, and acescan[–]SH exons that had lengths that were multiples of three and the background fraction of frame-preserving constitutive exons. (D) Analysis of protein domain preservation of acescan[+], acescan[–]SH, and acescan[–]Sh exons that maintain reading frame (i.e., length divisible by three). Maximum exon size cutoffs (150, 110, and 108 bases for acescan[+], acescan[–]SH, and acescan[–]Sh exons, respectively) were used to avoid exon length biases. The median length of exons in each subset was 84 bases, with no significant difference in the distribution of sizes among the sets (by a Kruskal–Wallis nonparametric test). The minimum number of exonic bases overlapping the protein domain was set to 30 bases. (E) GO “molecular function” and “biological process” categories, which differed significantly (P < 0.05), in the representation between genes containing predicted ACEs (black bars) and genes not containing predicted ACEs (white bars) are shown. Statistical significance was assessed by using χ² statistics with Bonferroni correction for multiple hypothesis testing. GO categories are ordered from right to left in order of increasingly significant bias toward genes containing predicted ACEs. Only one category (transport) was significantly biased toward genes without predicted ACEs.
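The χ² test with Bonferroni correction mentioned for panel E can be illustrated as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes each GO category is tested with a 2×2 table (genes with or without predicted ACEs, inside or outside the category) and a Pearson statistic without continuity correction, neither of which the caption specifies. All function names and all counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / math.prod(margins)


def chi2_pvalue_1df(stat):
    """Upper-tail P value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts per category:
# (ACE genes in category, ACE genes outside,
#  non-ACE genes in category, non-ACE genes outside)
tables = {"category A": (20, 10, 10, 20), "category B": (10, 10, 10, 10)}
raw = [chi2_pvalue_1df(chi2_2x2(*t)) for t in tables.values()]
for name, p_adj in zip(tables, bonferroni(raw)):
    print(name, "significant" if p_adj < 0.05 else "not significant")
```

The correction makes each individual test stricter as the number of categories tested grows, which is why only strongly biased categories survive it.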
Fig. 4.
acescan scores for internal exons of well known alternatively spliced genes. Known alternative exons are indicated by asterisks; the known RNA edited exon of GLUR-B is indicated by the letter E. The following known AS exons are illustrated: exons 7 (168 bases), 8 (57 bases) and 15 (54 bases) of the human β-amyloid precursor protein precursor gene (APP, ensembl Gene ID ENSG00000142192) (A) and exons 14 (115 bases) and 15 (249 bases) of the human glutamate receptor, α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid 2 gene (GLUR-B, ensembl Gene ID ENSG00000120251) (B).
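The frame-preservation criterion used in Fig. 3 (exon length divisible by three) can be checked against the exon lengths listed in the Fig. 4 caption. The sketch below is illustrative only; the function name `frame_preserving_fraction` is an assumption, and the five exons are far too few to say anything about the genome-wide fractions reported in the paper.

```python
def frame_preserving_fraction(lengths):
    """Fraction of exons whose length is a multiple of three.

    Skipping or including such an exon leaves the downstream
    reading frame unchanged.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    return sum(1 for n in lengths if n % 3 == 0) / len(lengths)


# Exon lengths (bases) as listed in the Fig. 4 caption.
known_as_exons = {
    "APP exon 7": 168,
    "APP exon 8": 57,
    "APP exon 15": 54,
    "GLUR-B exon 14": 115,
    "GLUR-B exon 15": 249,
}

for name, length in known_as_exons.items():
    status = "preserves frame" if length % 3 == 0 else "shifts frame"
    print(f"{name}: {length} bases, {status}")

print(frame_preserving_fraction(list(known_as_exons.values())))
```

Of these five exons, only GLUR-B exon 14 (115 bases) has a length that is not a multiple of three.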

References

    1. Black, D. L. & Grabowski, P. J. (2003) Prog. Mol. Subcell. Biol. 31, 187–216.
    2. Maniatis, T. & Tasic, B. (2002) Nature 418, 236–243.
    3. Lopez, A. J. (1998) Annu. Rev. Genet. 32, 279–305.
    4. Black, D. L. (2003) Annu. Rev. Biochem. 72, 291–336.
    5. Black, D. L. (2000) Cell 103, 367–370.
