Genome Res. 2005 Jun;15(6):768-79.
doi: 10.1101/gr.3217705.

Dichotomous splicing signals in exon flanks

Xiang H-F Zhang et al. Genome Res. 2005 Jun.

Abstract

Intronic elements flanking the splice-site consensus sequences are thought to play a role in pre-mRNA splicing. However, the generality of this role has not been established, and a catalog of effective sequences and an understanding of the mechanisms involved are still lacking. Using molecular genetic tests, we first showed that the approximately 50-nt intronic flanking sequences of exons beyond the splice-site consensus are generally important for splicing. We then went on to characterize exon flank sequences on a genomic scale. The G+C content of flanks displayed a bimodal distribution reflecting an exaggeration of this base composition in flanks relative to the gene as a whole. We divided all exons into two classes according to their flank G+C content and used computational and statistical methods to define pentamers of high relative abundance and phylogenetic conservation in exon flanks. Upstream pentamers were often common to the two classes, whereas downstream pentamers were totally different. Upstream and downstream pentamers were often identical around low G+C exons, and in contrast, were often complementary around high G+C exons. In agreement with this complementarity, predicted base pairing was more frequent between the flanks of high G+C exons. Pseudo exons did not exhibit this behavior, but rather tended to form base pairs between flanks and exon bodies. We conclude that most exons require signals in their immediate flanks for efficient splicing. G+C content is a sequence feature correlated with many genetic and genomic attributes. We speculate that there may be different mechanisms for splice site recognition depending on G+C content.

Figures

Figure 1.
Distribution around exons of positive and negative flanking pentamers characteristic of exons found by SVM analysis (Zhang et al. 2003). Raw frequencies of all 64 positive (black line) and 62 negative (gray line) pentamers are plotted against the positions relative to the exon. Upstream and downstream pentamers are plotted only in the corresponding flanks.
Figure 2.
Effect of flanks on splicing. (A) Schematic diagram of the test construct used. PCR was used to insert the test exon with both, neither, or one flank of ∼50 nt beyond the splice-site consensus. (Thin lines) Introns; (large rectangles) exons; (small rectangles superimposed on lines) flanks. (B) Effect of inclusion of both flanks on splicing. Cloned plasmids were transfected into 293 cells and subjected to RT-PCR using incorporation of [32P]dATP and polyacrylamide gel electrophoresis; the intensity of the radioactive bands was quantified with a PhosphorImager. Markers are from an end-labeled 100-bp DNA ladder (Invitrogen). (chuk) Conserved helix-loop-helix ubiquitous kinase; (clcn7) chloride channel 7; (thbs4) thrombospondin 4; (wt1) Wilms tumor 1; (hbb) human β-globin; (clptm1) cleft lip and palate transmembrane protein 1; (dhfr-2-3) fused exons 2 and 3 of the Chinese hamster dhfr gene. The number after the hyphen denotes the exon number. An arrowhead indicates use of a cryptic donor splice site found by sequencing the PCR product to be gaa|gtaagt at +83 in the downstream dhfr intron 3; otherwise, the upper band position represents exon inclusion, and the lower band position represents exon skipping. The percent inclusion (radioactivity in the included band representing exon splicing divided by the total of all bands) is indicated below each panel. (C) Testing the effect of individual flanks in the case of 3 exons as in B above. (D) Splicing of the indicated flankless exons inserted at random locations within the 1200-nt intronic sequence of the test construct. In this experiment, each exon was bounded on each side by the same 25-nt bacterial sequence used for transposition. Cloned plasmids were transfected into 293 cells and analyzed by RT-PCR as in B. (Arrowheads) Splicing at unidentified cryptic sites. The seven insertion positions were at the following distances from the 5′ end of the intron: 143, 310, 321, 339, 674, 809, and 864, respectively. 
The insertion point for the experiments shown in B and C was 304, the natural end of intron 1 of the hamster dhfr gene.
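The percent inclusion defined in the Figure 2 caption (radioactivity in the included band divided by the total of all bands) can be sketched as below. This is only an illustration: the function name, the band labels, and the intensity values are hypothetical and are not taken from the paper.

```python
def percent_inclusion(band_intensities, included_band):
    """Return 100 * (included band intensity) / (sum of all band intensities)."""
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * band_intensities[included_band] / total


# Hypothetical PhosphorImager counts for one lane (made-up numbers).
bands = {"included": 750.0, "skipped": 200.0, "cryptic": 50.0}
print(percent_inclusion(bands, "included"))  # 75.0
```

Bands from cryptic splice sites count toward the total, following the caption's "total of all bands".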
Figure 3.
GC% distributions of flanks. (A,B) GC% distributions of 50-nt flank regions upstream and downstream of exons, respectively. The solid curves show the GC% distribution of sequence windows at the indicated distances (black curves, -15 to -64; gray curves, -165 to -214) from the 3′ or 5′ end of exons. The dashed curve shows the GC% distribution of the closer 50-nt windows in the flanks of pseudo exons. (C) GC% distributions of averaged GC% of both closest 50-nt windows (i.e., -15 to -64 and +7 to +56) upstream and downstream of exons, compared with the same distribution of pseudo exons with or without repeats. (D) GC% distribution for real exon flanks as in C compared with their GT% and GA% distributions. (E,F) Flanks tend to exaggerate GC%. (E) GC% of flanks is plotted against GC% of the remainder of the introns. (F) The same as A, but with repeats masked.
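The flank GC% used in Figure 3C (the average over the closest 50-nt windows, -15 to -64 upstream and +7 to +56 downstream) could be computed along the following lines. This is a minimal sketch under stated assumptions: the sequence is a plain string, the exon occupies the 0-based half-open slice `seq[exon_start:exon_end]`, position -1 is the last intron base before the exon, and position +1 is the first intron base after it. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def flank_windows(seq, exon_start, exon_end):
    """Return the 50-nt upstream (-64..-15) and downstream (+7..+56) windows.

    Assumes the exon is seq[exon_start:exon_end] (0-based, half-open).
    """
    if exon_start - 64 < 0 or exon_end + 56 > len(seq):
        raise ValueError("sequence too short for both flank windows")
    upstream = seq[exon_start - 64:exon_start - 14]
    downstream = seq[exon_end + 6:exon_end + 56]
    return upstream, downstream


def mean_flank_gc(seq, exon_start, exon_end):
    """Average GC% of the two closest 50-nt flank windows."""
    up, down = flank_windows(seq, exon_start, exon_end)
    return (gc_percent(up) + gc_percent(down)) / 2.0


# Toy sequence: AT-only upstream intron, 10-nt exon, GC-only downstream intron.
toy = "A" * 70 + "G" * 10 + "C" * 60
print(mean_flank_gc(toy, 70, 80))  # 50.0
```

The windows skip the bases nearest the exon so that the splice-site consensus sequences do not contribute to the flank composition.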
Figure 4.
Pentamer winners selected by comparing exon flanks to intron sequences with exactly the same GC% range and distributions. Winners were classified into four categories according to the GC% of their flanks (HGC or LGC) and location (upstream or downstream). Winners in italics have reverse complementary sequences as winners in the opposite flanks in the same class. Winners in bold are common to both upstream and downstream flanks in the same class. The underlined winners were also identified in a previous study (Zhang et al. 2003). The winners with asterisks overlap the branch-site consensus YTRAY.
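The two relationships marked in Figure 4 (winners common to both flanks, and winners whose reverse complement is a winner in the opposite flank) can be checked with a few lines of code. The sketch below is illustrative only: the pentamer sets are invented for the example and are not the winners reported in the paper, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def compare_winners(upstream, downstream):
    """Return (shared, complementary) sets of upstream pentamers.

    shared: pentamers present in both flanks.
    complementary: upstream pentamers whose reverse complement
    is present in the downstream flank.
    """
    shared = upstream & downstream
    complementary = {p for p in upstream if reverse_complement(p) in downstream}
    return shared, complementary


# Made-up pentamer sets, NOT the winners listed in Figure 4.
up_winners = {"GGGGC", "TTTTA", "CTAAC"}
down_winners = {"GCCCC", "TTTTA", "AAGGT"}
shared, complementary = compare_winners(up_winners, down_winners)
print(sorted(shared))         # ['TTTTA']
print(sorted(complementary))  # ['GGGGC']
```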
Figure 5.
Distributions of downstream winning pentamers (DW) in the flanks of HGC and LGC exons. The distribution around real exons (black curve) is compared with the distribution around pseudo exons (dark-gray curve) and after scrambling the flanks of real exons (light-gray curve). Total frequencies of each class of winners are plotted as percent. (Left) LGC winners; (right) HGC winners.
Figure 6.
Secondary structure analysis. (A) Double strandedness in real exon flanks. Exons with their flanks were folded using Mfold. As a control, each upstream flank was scrambled and each downstream flank was scrambled, and the scrambled flanks were reconnected to the original exon body and then folded using Mfold. Both the original and scrambled versions of the sequences were divided into an HGC class (GC% of the most proximal 50-nt flanks >55) and an LGC class (GC% of the most proximal 50-nt flanks <45), leading to four different data sets as follows: original HGC (♦), original LGC (▴), scrambled HGC (⋄), and scrambled LGC (▵). We then plotted the double strandedness as a function of positions in flanks. Double strandedness reflects the frequency of all predicted base pairing at each position (see Results and Methods). (B) Double strandedness in pseudo exon flanks. Exactly the same analysis of pseudo exons as a control. (C) Flank-flank base pairing around HGC exons. The incremental contribution of each interflank base pair to the energy of each predicted stable structure was extracted from the Mfold output after folding both original exon plus flank sequences and again after scrambling the flanks. The difference between these two values (original sequence energies minus scrambled sequence energies) is plotted as an indication of the excess secondary structure contributed at each position (filled symbols). A negative value represents more base pairing. For comparison, the same process was carried out on pseudo exons (open symbols). (D) Flank-exon base pairing around HGC exons. Differences in the free energy contributions of individual base pairs were calculated and displayed as in C, except only base pairs between flank and exon positions were chosen. (Real exons) Filled symbols; (pseudo exons) open symbols.
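The class thresholds and the scrambled-flank control described in the Figure 6 caption can be sketched as follows. Several points are assumptions rather than the authors' method: scrambling is done here as a simple random permutation of each flank's bases (the caption does not specify the procedure), sequences between 45% and 55% GC are left unclassified, and all function names are hypothetical. The folding step with Mfold is not reproduced.

```python
import random


def gc_class(gc_pct):
    """Classify a flank GC% as 'HGC' (>55), 'LGC' (<45), or None otherwise."""
    if gc_pct > 55:
        return "HGC"
    if gc_pct < 45:
        return "LGC"
    return None


def scramble(seq, rng):
    """Randomly permute the bases of a sequence (composition is preserved)."""
    return "".join(rng.sample(list(seq), len(seq)))


def scrambled_control(upstream, exon, downstream, rng):
    """Scramble each flank separately and reconnect both to the intact exon."""
    return scramble(upstream, rng) + exon + scramble(downstream, rng)


rng = random.Random(0)  # fixed seed so the control is reproducible
control = scrambled_control("ACGTACGTAA", "GGGG", "TTTTCCCCAA", rng)
print(gc_class(60.0), gc_class(40.0), gc_class(50.0))  # HGC LGC None
```

Scrambling each flank separately keeps its base composition, so the control stays in the same GC% class as the original sequence.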
Figure 7.
LGC and HGC winning pentamers residing in the flanks of the seven tested exons. (A-G) Exonic sequences are shown by uppercase letters and the splice site consensus sequences are in bold. Appropriate winners (i.e., of the cognate location and GC% class) are shaded; those that are inappropriate (of the same location, but opposite class) are underlined. Winners that overlap with exons or splice-site sequences (-14 to +1 and -3 to +6 relative to exon borders) are not shown. In G, the known branch point is indicated by an arrow and the consensus branch site sequence is italicized. Note the frequent overlap of winning pentamers in clusters.

References

    Abdul-Manan, N., O'Malley, S.M., and Williams, K.R. 1996. Origins of binding specificity of the A1 heterogeneous nuclear ribonucleoprotein. Biochemistry 35: 3545-3554.
    Adams, M.D., Rudner, D.Z., and Rio, D.C. 1996. Biochemistry and regulation of pre-mRNA splicing. Curr. Opin. Cell. Biol. 8: 331-339.
    Amarasinghe, A.K., MacDiarmid, R., Adams, M.D., and Rio, D.C. 2001. An in vitro-selected RNA-binding site for the KH domain protein PSI acts as a splicing inhibitor element. RNA 7: 1239-1253.
    Ast, G., Pavelitz, T., and Weiner, A.M. 2001. Sequences upstream of the branch site are required to form helix II between U2 and U6 snRNA in a trans-splicing reaction. Nucleic Acids Res. 29: 1741-1749.
    Berget, S.M. 1995. Exon recognition in vertebrate splicing. J. Biol. Chem. 270: 2411-2414.
