Comparative Study
Genes Dev. 2004 Jun 1;18(11):1241-50.
doi: 10.1101/gad.1195304. Epub 2004 May 14.

Computational definition of sequence motifs governing constitutive exon splicing

Xiang H-F Zhang et al. Genes Dev.

Abstract

We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
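The 8-mer frequency comparison described in the abstract starts from simple overlapping k-mer counts. The following is a minimal sketch of that counting step only, not the authors' pipeline; the function names and the toy sequences are hypothetical, and the statistical comparison between sequence classes is not shown.

```python
from collections import Counter


def count_kmers(sequences, k=8):
    """Count overlapping k-mers across a collection of RNA/DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")  # treat DNA and RNA input alike
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGU"):  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts


def relative_frequency(counts):
    """Convert raw k-mer counts to fractions of all counted k-mers."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Toy sequences (made up for illustration; not data from the study).
exon_like = ["GAAGAAGAAG", "ACGUACGUACGU"]
pseudo_like = ["UUUUUUUUUU"]

exon_counts = count_kmers(exon_like)
pseudo_counts = count_kmers(pseudo_like)
exon_freqs = relative_frequency(exon_counts)
```

Relative frequencies computed this way for two sequence classes are the raw inputs that a z-score comparison would then operate on.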

Figures

Figure 1.
A scatter plot showing the scores of all 65,536 possible 8-mers with respect to their relative abundance in three sequence classes. The axis numbers represent z-scores. Z-scores on the X-axis are from a comparison of the relative abundance of each 8-mer in noncoding internal exons versus pseudo exons; this number is called an EP index when it is >0 (for enhancer compared with pseudo exons) and an SP index when it is <0 (for silencer compared with pseudo exons). The z-scores on the Y-axis are from a comparison of the relative abundance of each 8-mer in noncoding internal exons versus the 5′ UTR of intronless genes; this number is called an EI index when it is >0 (for enhancer compared with intronless genes) and an SI index when it is <0 (for silencer compared with intronless genes). In all further discussion, the silencer indices SP and SI are expressed as their absolute values. The dashed lines mark a z-score of 2.88, chosen as the threshold for significance. A z-score >2.88 has a probability of <0.002 of occurring by chance. Points lying beyond this threshold in both comparisons are black and represent the set of putative exonic splicing enhancers or silencers characterized further. If all 8-mers were distributed equally in all data sets, then the probability that a point will lie outside the dashed lines in both dimensions by chance is <10^-4.
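The threshold rule in this legend can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' code: it takes the two signed z-scores as already computed, assumes a standard normal upper tail for the quoted probability, and uses hypothetical function names. The example scores are the SP and SI indices quoted in the Figure 4 legend for UGUAAUGU and its single-base mutant UGUAAAGU, entered with a negative sign because silencer indices are reported as absolute values.

```python
import math

Z_THRESHOLD = 2.88  # significance threshold stated in the legend


def upper_tail_probability(z):
    """One-sided probability of exceeding z under a standard normal (an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def classify(z_vs_pseudo, z_vs_utr, threshold=Z_THRESHOLD):
    """Label an 8-mer from its two signed z-scores.

    z_vs_pseudo: noncoding internal exons versus pseudo exons (X-axis).
    z_vs_utr:    noncoding internal exons versus 5' UTRs of intronless genes (Y-axis).
    """
    if z_vs_pseudo > threshold and z_vs_utr > threshold:
        return "PESE"  # enriched in exons in both comparisons
    if z_vs_pseudo < -threshold and z_vs_utr < -threshold:
        return "PESS"  # depleted in exons in both comparisons
    return "unclassified"


p_single = upper_tail_probability(Z_THRESHOLD)
p_both = p_single ** 2  # both comparisons, if they were independent

label_original = classify(-4.92, -3.14)  # UGUAAUGU
label_mutant = classify(-1.95, -0.50)    # UGUAAAGU
```

Under these assumptions the single-comparison tail probability falls just below 0.002 and the joint probability falls well below 10^-4, consistent with the bounds quoted in the legend.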
Figure 2.
Examples of putative exonic splicing silencer (PESS; left) and putative exonic splicing enhancer (PESE; right) sequence families. The 974 PESS and 2069 PESE 8-mer sequences were aligned and then clustered using ClustalW. Pictograms for 10 PESSs and 8 PESEs on the basis of the positional sequence scoring matrix underlying each cluster are shown. The number of sequences in the cluster is shown in parentheses and the name of an exact exemplar used for testing is given, as is the information content in bits.
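The information content in bits mentioned in this legend can be illustrated with the usual per-column measure for a four-letter alphabet: 2 bits minus the Shannon entropy of the column, summed over columns. This is a sketch assuming a uniform background and no small-sample correction; the legend does not say which formula or software was used, and the function name is hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sequences, alphabet="ACGU"):
    """Total information content (bits) of equal-length aligned sequences.

    Each column contributes log2(4) minus its Shannon entropy. Gap characters
    are ignored when computing column frequencies.
    """
    total_bits = 0.0
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        n = sum(counts[base] for base in alphabet)
        if n == 0:
            continue  # column made only of gaps
        entropy = -sum(
            (counts[base] / n) * math.log2(counts[base] / n)
            for base in alphabet
            if counts[base]
        )
        total_bits += math.log2(len(alphabet)) - entropy
    return total_bits


# A cluster of identical 8-mers is fully conserved: 8 columns x 2 bits.
bits_identical = information_content(["UGUAAUGU"] * 3)
# One column split evenly between two bases loses 1 bit.
bits_one_variable = information_content(["UGUAAUGU", "UGUAAAGU"])
```

A fully conserved 8-mer cluster reaches the maximum of 16 bits, and each variable position lowers the total.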
Figure 3.
Minigenes used for testing effects on splicing. (A) Two versions of the exon 8 region of the human conserved helix-loop-helix ubiquitous kinase (CHUK) gene were inserted into a chimeric intron separating exon 1 and the combined exons 4-6 of the hamster dihydrofolate reductase (dhfr) gene. Large boxes depict exons, stubby gray boxes show flanking regions of the CHUK exon 8, and thin horizontal lines represent hamster intron sequence. In the upper figure, the CHUK exon 8 flanks are limited to the splice site consensus region from -14 upstream to +7 downstream of the exon-intron junction; CHUK exon 8 is spliced poorly, as indicated. In the lower figure, additional CHUK exon 8 flanking sequences have been added from -62 upstream to +75 downstream. The addition of this flanking sequence greatly improves exon inclusion, as shown. (B) Sequence of the 108-nt CHUK exon 8. PESE sequences are underlined and PESS sequences are double overlined. The BamHI site used for the insertion of tested 8-mers is in bold. (C) Sequence of the 108-nt thrombospondin 4 (Tbsh4) exon 13, annotated as in B.
Figure 4.
The effect of 8-mer insertions on splicing. (A) Testing PESSs for splicing inhibition. The indicated 8-mer PESS sequences were inserted into a BamHI site at position +22 in CHUK exon 8 using the lower construct shown in Figure 3A. Plasmids were transfected into human 293 cells by lipofection, and RNA was extracted after 24 h and assayed for splicing by RT-PCR using radioactive dATP as a precursor. Band intensity was quantified with a PhosphorImager; proportion skipped indicates skipped band/(skipped band + included band). The bands correspond to the column below them and show the results of one transfection experiment; the graph shows the average of two transfections, and the error bars indicate the range. Black bars represent insertion of the PESS shown at the top (S), gray bars represent insertion of a single base substitution mutant sequence also shown at the top (M); the SP and SI scoring indices (defined in the legend for Fig. 1) of each PESS and each mutant sequence are shown at the bottom. (B) Testing PESEs for splicing enhancement. The indicated PESE 8-mers were inserted into CHUK exon 8 using the upper construct shown in Figure 3A. Splicing was assayed exactly as in A. (E) PESE; (M) single base substitution mutant sequence. Proportion included indicates included band/(skipped band + included band). Two transfections were carried out for each construct; the error bars indicate the range of the two measurements. (C) The effect of insert sequence variations on splicing silencing. (Left) Multiple copies of a PESS can act synergistically to inhibit splicing: one, two, and three copies of the 8-mer PS9 (see A) were inserted into CHUK exon 8 and assayed for splicing as described in A. (Right) A double base substitution is more effective than a single base substitution in destroying silencing activity: the original PESS P5 and mutants harboring one or two single base substitutions were inserted into CHUK exon 8 and assayed for splicing as in A. The 8-mers were UGUAAUGU, UGUAAAGU, and UGGAAAGU, respectively; the SP indices were 4.92, 1.95, and 1.74, respectively; and the SIs were 3.14, 0.50, and -4.07, respectively. (D) Testing PESSs for silencing in a second exon. A minigene analogous to that shown in Figure 3A was constructed using human thrombospondin 4 exon 13 as the central exon. Eight PESS sequences were inserted into a BamHI site at a position 16 nt upstream of the 3′ end of the exon (Fig. 3C) and tested for silencing as described in A.
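The quantification described in this legend, proportion skipped = skipped band/(skipped band + included band), averaged over two transfections with the range as error bars, can be written out as follows. This is a minimal sketch; the function names are hypothetical and the band intensities are made-up numbers, not measurements from the study.

```python
def proportion_skipped(skipped, included):
    """Skipped band / (skipped band + included band)."""
    total = skipped + included
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return skipped / total


def summarize_transfections(replicates):
    """Return (mean, (low, high)) of proportion skipped across replicates.

    replicates: iterable of (skipped_intensity, included_intensity) pairs.
    The (low, high) pair is the range plotted as error bars.
    """
    values = [proportion_skipped(s, i) for s, i in replicates]
    return sum(values) / len(values), (min(values), max(values))


# Made-up band intensities for two transfections of one construct.
mean_skipped, value_range = summarize_transfections([(25.0, 75.0), (75.0, 25.0)])
```

Proportion included, used for the enhancer tests in panel B, is the complement of this quantity.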
Figure 5.
Statistical analysis of PESSs and PESEs in coding exons and introns. The frequencies of each of the 974 PESSs and 2069 PESEs were determined for each position in 78,000 human coding exons (50-250 nt long) and in 100 nt of their immediate flanks and in 100 nt regions from the center of 148,000 introns. Numbers on the ordinate indicate the average frequency of a PESS or PESE per nucleotide position multiplied by 100,000. The heavy gray curve represents the PESSs, and the black curve the PESEs. Indications below the curve: The box marked Real represents a composite exon standardized to 100 nt as described in the text and in Supplemental Material; their intronic flanks are indicated by heavy lines. The thin lines refer to intronic sequences of 100 nt extracted from the center of each intron; this same central intron data is presented three times for easy reference. The box marked Pseudo shows the same analysis performed on 20,580 pseudo exons drawn from repeat-free regions of introns; this set of pseudo exons did not overlap with the pseudo exon set used to derive the z-scores in Figure 1. The broken horizontal line depicts the average frequency of any given 8-mer in a random sequence (1/65,536).
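The ordinate in this figure and its random-sequence baseline can be illustrated with a short sketch. It assumes one reading of the legend: at each start position, count how many windows begin a motif from the set, then divide by the number of windows and by the number of motifs so the value is comparable with the 1/65,536 expectation for any single 8-mer. The function name and toy windows are hypothetical, and the standardization of exons to 100 nt is not reproduced.

```python
K = 8
RANDOM_EXPECTATION = 1 / 4 ** K  # 1/65,536 for any given 8-mer
SCALE = 100_000                  # ordinate multiplier from the legend


def scaled_motif_frequency(windows, motifs, k=K, scale=SCALE):
    """Average per-motif frequency at each start position, times `scale`.

    windows: equal-length sequences aligned on the same coordinate system.
    motifs:  collection of k-mers (for example, a PESS or PESE set).
    """
    motif_set = set(motifs)
    n_positions = len(windows[0]) - k + 1
    hits = [0] * n_positions
    for window in windows:
        for i in range(n_positions):
            if window[i:i + k] in motif_set:
                hits[i] += 1
    denominator = len(windows) * len(motif_set)
    return [scale * h / denominator for h in hits]


# Baseline on the same scale as the ordinate.
scaled_baseline = SCALE * RANDOM_EXPECTATION

# Toy windows (made up): the motif starts at position 0 in one, 2 in the other.
profile = scaled_motif_frequency(["UGUAAUGUAC", "ACUGUAAUGU"], ["UGUAAUGU"])
```

On this scale the random-sequence baseline sits at roughly 1.53, which is where a broken horizontal line would be drawn.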
