Genome Res. 2011 Aug;21(8):1360-74.
doi: 10.1101/gr.119628.110. Epub 2011 Jun 9.

Quantitative evaluation of all hexamers as exonic splicing elements

Shengdong Ke et al. Genome Res. 2011 Aug.

Abstract

We describe a comprehensive quantitative measure of the splicing impact of a complete set of RNA 6-mer sequences by deep sequencing successfully spliced transcripts. All 4096 6-mers were substituted at five positions within two different internal exons in a 3-exon minigene, and millions of successfully spliced transcripts were sequenced after transfection of human cells. The results allowed the assignment of a relative splicing strength score to each mutant molecule. The effect of 6-mers on splicing often depended on their location; much of this context effect could be ascribed to the creation of different overlapping sequences at each site. Taking these overlaps into account, the splicing effect of each 6-mer could be quantified, and 6-mers could be designated as enhancers (ESEseqs) and silencers (ESSseqs), with an ESRseq score indicating their strength. Some 6-mers exhibited positional bias relative to the two splice sites. The distribution and conservation of these ESRseqs in and around human exons supported their classification. Predicted RNA secondary structure effects were also seen: Effective enhancers, silencers and 3' splice sites tend to be single stranded, and effective 5' splice sites tend to be double stranded. 6-mers that may form positive or negative synergy with another were also identified. Chromatin structure may also influence the splicing enhancement observed, as a good correspondence was found between splicing performance and the predicted nucleosome occupancy scores of 6-mers. This approach may prove of general use in defining nucleic acid regulatory motifs, substitute for functional SELEX in most cases, and provide insights about splicing mechanisms.

Figures

Figure 1.
High-throughput definition of pre-mRNA splicing signals from sequence space. (A) The architecture of a linear minigene library. This minigene contains the Wilms' tumor gene 1 exon 5 (WT1-5) as a central exon (yellow box) flanked by sequences from a Dhfr minigene (blue boxes, Dhfr exons). A random 6-mer library (gray box) is located from +5 to +10 in the central exon (termed location WA) and the detailed sequences of the central exon and its 3′ and 5′ splice sites are shown. (B) The scheme of the high-throughput in vivo functional selection of splicing motifs. The Enrichment Index (EI) for a particular 6-mer is defined as the output proportion of this motif divided by its input proportion. The numbers represent six exemplary cases. (C) Minigene library construction. See Methods for details. (D) The distribution of the 4096 6-mers in the DNA input (dashed black) and RNA output (red) sequences. (E) Duplicate PCR preparations and sequencings of the DNA input library designated by the CG and TA barcodes yield very similar compositions. The proportions of the 6-mers are presented as counts per million reads. (F) The compositions of 6-mers in the RNA output sequences from two independent transfections labeled with either the CG or the TA barcode. The proportions of the 6-mers are presented as counts per million reads. (G) The distribution of 6-mer log2(EI) values. The results represent the average of two transfections.
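The Enrichment Index defined in panel B can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `log2_enrichment`, the toy counts, and the choice to skip 6-mers with a zero count are all assumptions made for the example.

```python
import math


def log2_enrichment(input_counts, output_counts):
    """Return {6-mer: log2(EI)}, with EI = output proportion / input proportion.

    6-mers with a zero count in either library are skipped, because EI (or its
    logarithm) is undefined for them. That handling is an assumption of this sketch.
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())
    result = {}
    for kmer, n_in in input_counts.items():
        n_out = output_counts.get(kmer, 0)
        if n_in == 0 or n_out == 0:
            continue
        ei = (n_out / total_out) / (n_in / total_in)
        result[kmer] = math.log2(ei)
    return result


# Hypothetical read counts for three 6-mers (not data from the paper).
dna_input = {"GACGTC": 100, "CCAGCA": 100, "AAAGAG": 200}
rna_output = {"GACGTC": 200, "CCAGCA": 25, "AAAGAG": 175}
scores = log2_enrichment(dna_input, rna_output)
```

With these toy counts, a 6-mer whose share of reads doubles between input and output gets a log2(EI) of 1, and one whose share falls fourfold gets −2.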
Figure 2.
Examples of the effect of location on the splicing behavior of substituted pre-mRNA molecules. (A) The five locations in exons WT1-5 (top, 51 nt) and Hb2 (bottom, 223 nt). (B) The best correlation of the effect of 6-mer substitutions (LEIsc, log of the enrichment index, scaled) was between the WA and WD locations (R^2 = 0.34). (C) The worst correlation was between WD and HA (R^2 = 3 × 10^−5). The dashed line is the least squares regression fit. The results for all five locations are shown in Supplemental Figure 1.
Figure 3.
Experimental validation of the effect of substituting individual 6-mer sequences. The observed percent inclusion resulting from testing 14 6-mers at the WA location (y-axis) agreed well with those calculated from the digital data (x-axis) as described in Methods. Tested 6-mers were chosen from the entire range of EI values observed. After transfection into HEK293 cells, RT–PCR products in ethidium bromide-stained gels were quantified by ImageJ. Error bars show the range from duplicate transfections. Observed inclusion is 100 × included/(included + skipped). The observed percent inclusion of 36 additional 6-mers from the WD, HA, HM, and HD locations also agreed well with those calculated from the digital data (Supplemental Fig. 3A–D). The locations are depicted schematically in Figure 2A.
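The inclusion formula in this caption, 100 × included/(included + skipped), can be written as a small helper. The function name and the guard against an empty lane are assumptions for illustration; the inputs stand for quantified band intensities.

```python
def percent_inclusion(included, skipped):
    """Percent exon inclusion: 100 * included / (included + skipped)."""
    total = included + skipped
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * included / total


# Hypothetical band intensities for one lane (not data from the paper).
example = percent_inclusion(30.0, 10.0)
```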
Figure 4.
Contexts created by overlapping 6-mers. (A) An example of how a substitution (GACGTC) creates 11 overlapping WA 6-mers spanning a 16-nt region. The 11 variable 6-mers are distinct for each location and all are assumed capable of influencing splicing. To take the overlapping 6-mers into account, each was assigned the value 1.033, representing the LEIsc value observed for this pre-mRNA molecule. Library bases are underlined. (B) An example of one 6-mer (GACGTC) and the LEIsc values of all the molecules from all locations that contain it in the 16-nt overlap region. (C) Assignment examples for an ESEseq (GACGTC), an ESSseq (CCAGCA), and a neutral 6-mer (AAAGAG). The 20,480 total molecules were classified into two categories: those in which a 6-mer was absent (−) or present (+). Average LEIsc values for each category are shown. Splicing enhancers (ESEseqs) are defined as 6-mers for which the average LEIsc value is significantly higher when the 6-mer is present; the ESEseq score is the difference between the average LEIsc values of the two categories. Splicing silencers (ESSseqs) are defined as 6-mers for which the average LEIsc value is significantly lower when the 6-mer is present. Neutral 6-mers are defined as 6-mers for which the two values are not significantly different. P-values are from a t-test. Error bars are SEM. (D) Comparison of the observed LEIsc value of a library pre-mRNA molecule with the splicing strength predicted from the additive model described in the text. The chart contains 20,480 points.
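The overlap bookkeeping in panels A and C can be sketched as follows. This is a simplified illustration under stated assumptions: the flanking sequences are placeholders rather than the real WT1-5 or Hb2 exon sequence, the function names are hypothetical, and the t-test that the caption uses to decide significance is left out, so only the score (difference of category means) is computed.

```python
from statistics import mean

K = 6  # motif length used throughout the study


def overlapping_kmers(flank5, insert, flank3, k=K):
    """All k-mers that contain at least one base of the substituted k-mer.

    Taking k - 1 bases from each flank gives a 16-nt region and 11 windows
    when k = 6, provided both flanks are at least k - 1 bases long.
    """
    region = flank5[-(k - 1):] + insert + flank3[:k - 1]
    return [region[i:i + k] for i in range(len(region) - k + 1)]


def presence_score(molecules, kmer):
    """Mean LEIsc of molecules containing `kmer` minus mean LEIsc of the rest.

    `molecules` is a list of (set_of_overlapping_kmers, leisc) pairs.
    Returns None if either category is empty.
    """
    present = [s for kmers, s in molecules if kmer in kmers]
    absent = [s for kmers, s in molecules if kmer not in kmers]
    if not present or not absent:
        return None
    return mean(present) - mean(absent)


# Placeholder flanks (assumed, not the real exon sequence).
windows = overlapping_kmers("AAAAAAA", "GACGTC", "CCCCCCC")

# Toy molecules with made-up LEIsc values.
toy = [
    ({"GACGTC"}, 1.0),
    ({"GACGTC"}, 0.5),
    ({"AAAAAA"}, -0.5),
    ({"CCCCCC"}, 0.0),
]
score = presence_score(toy, "GACGTC")
```

A positive score would mark a candidate enhancer and a negative score a candidate silencer, subject to the significance test described in the caption.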
Figure 5.
Genomic characteristics of ESEseqs and ESSseqs. (A) Frequency of the top 400 ESEseqs in human constitutive (119,006, black), alternative cassette (25,807, red), and pseudo exons (134,994, gray), all >50 nt, and their flanking introns. The frequency of the top 400 6-mers per exon at each nucleotide position is shown on the y-axis. The black bars on the x-axis represent a composite exon comprising 50 nt downstream from the 3′ splice site abutted to 50 nt upstream of the 5′ splice site. Thin lines indicate intronic flanks. Positions overlapping the 3′ and 5′ splice sites (−14 to +1 and −3 to +6) were excluded. (B) Frequency of the bottom 400 ESSseqs, presented as in A. (C) ESEseqs are more highly enriched in constitutive exons than are the top performing 6-mers from any individual location. (T) Top; (**) P < 10^−136 (χ2 test). (D) ESSseqs are more highly enriched in intronic regions (note the reversal of the ratio on the y-axis) than are the bottom performing 6-mers from any individual location. (B) Bottom; (**) P < 10^−140 (χ2 test). (E, left) ESEseqs are conserved in macaque-human evolution and ESSseqs are not. (Right) SNP density is lower in ESEseqs and higher in ESSseqs. Only non-CpG containing ESRseqs and ESSseqs (filled bars) were used; the controls (open bars) were scrambled non-CpG-containing versions of the ESRseqs. (**) P < 10^−140 (χ2 test). Analyses that included CpG-containing 6-mers yielded similar results (Supplemental Fig. 5A,B). Error bars, SEM. (F) Distribution of average ESRseq scores in and around human constitutive (black), alternative cassette (red), and pseudo exons (gray).
Figure 6.
Detection of 6-mers exhibiting positional bias. (A) Scatter plot of 6-mer LEIscs in the HA and HD contexts. Six-mers eliciting LEIsc values that were significantly (FDR = 1%) higher in the HA context are blue and those significantly higher in the HD context are red. (B) HA context preferred motifs (blue in A) are more highly enriched in the exonic region closer to the 3′SS in human constitutive exons. The average 6-mer density in the four regions (−100 nt to −51 nt in the upstream intron, +2 nt to +50 nt in the exon body, −50 nt to −4 nt in the exon body, and +51 nt to +100 nt in the downstream intron) was set equal to one and other values adjusted accordingly. (C) HD context preferred motifs (red in A) are more highly enriched in the exonic region closer to the 5′SS. The data are presented as in B. (D) HD context preferred motifs resembling 9G8 binding sites are more highly enriched in the exonic region closer to the 5′SS in human constitutive exons. (E) HD context preferred motifs resembling PTB binding sites are less depleted in the exonic region closer to the 5′SS. (*) P < 3 × 10^−13; (**) P < 2 × 10^−16; (***) P < 3 × 10^−40 (t-test). Error bars, SEM.
Figure 7.
Secondary structure effects of 6-mer substitution. (A) Maps for B and C. (B) Effective ESEseqs tend to be single stranded. Single strandedness of ESEseqs was measured by the probability of being unpaired (PU) (Hiller et al. 2007). (Filled bars) All ESEseqs in the 16-nt region of the top scoring 400 transcripts; (open bars) all ESEseqs in the 16-nt region drawn from the middle scoring 1000 transcripts, as G+C–matched controls. The mean PU of each control was set to unity. (C) Effective ESSseqs tend to be single stranded. As in B except that filled bars show all ESSseqs in the 16-nt region of the bottom scoring 400 transcripts. (D) Maps for E and F. (E) The 3′ splice site (SS) tends to be single stranded in high-scoring transcripts. This analysis was restricted to locations WA and HA, where the substitution is close enough to the 3′SS (−14/+1) to influence local folding. T400 (set1): comparison of highly spliced transcripts with controls (set2) as in B. B400 (set1): comparison of poorly spliced transcripts with controls (set2) as in C. Bars on right: average PU of the 3′SS from constitutive exons (filled) and G+C–matched pseudo exons (open). (F) The 5′SS tends to be double stranded in high-scoring transcripts. This analysis was restricted to location WD, which is close enough to the 5′SS (−3/+6) to influence local folding. Data are presented as in E. Bars on right: average PU of the 5′SS from constitutive exons (filled) and G+C–matched pseudo exons (open). (*) P < 5 × 10^−2; (**) P < 10^−2; (***) P < 10^−3; (****) P < 10^−4 (t-test). Error bars, SEM. See Methods for details.
Figure 8.
Six-mers that are candidates for combinatorial requirements. The hypothesis here is that the target 6-mer is influenced by a partner sequence within the 16-nt summed region, leading to a deviation from the additive model. (A) One example of a 6-mer (AGAAGA) that may have positive synergy with another within the 16-nt summed region. In the case of positive synergy, the observed splicing strength (LEIsc) would be significantly higher than that predicted whenever AGAAGA is present in the 16-nt region. The predicted LEIscs were converted from the splicing strength predicted by the additive model shown in Figure 4D by linearly scaling the values to fit the scale of the observed LEIscs. The total 20,480 molecules were classified into two categories: those in which this 6-mer was absent and those in which it was present. The average of (observed LEIsc − predicted LEIsc) in each category is shown. P-values were calculated using a t-test. Error bars are the SEM. (B) One example of a 6-mer (TCCCTC) that may have negative synergy with another within the 16-nt summed region. In this case, the observed splicing strength (LEIsc) would be significantly lower than that predicted whenever TCCCTC is present. P-values were calculated using a t-test. Error bars are the SEM. (C) Clusters of 6-mers that may have positive synergy with others and resemble the binding sites of known splicing factors. (D) Clusters of 6-mers that may have negative synergy with others and resemble the binding sites of known splicing factors.
Figure 9.
LEIscs are positively correlated with their predicted nucleosome occupancy scores. The full set of 4096 6-mers was divided into 64 groups of 64 ranked by their LEIsc values at a particular location (the first group of 6-mers represents those with the highest LEIscs). Nucleosome positioning scores of 6-mers were extracted from the data as measured by sequencing 150-mers described by Kaplan et al. (2009) and found at http://genie.weizmann.ac.il/pubs/nucleosomes08/nucleosomes08_data.html. The average nucleosome occupancy score of the 64 6-mers in each set was used for each bin. Pearson's correlation coefficient and the P-value (F test) were calculated from the unbinned data and are shown for each indicated location. Note that a rank of 1 represents the highest LEIsc value. (A) WA location; (B) WD location; (C) HA location; (D) HM location; (E) HD location.
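The ranking, binning, and correlation steps described in this caption can be sketched as below. The function names and toy values are assumptions for illustration; the real analysis used 4096 6-mers in 64 bins of 64 with occupancy scores from Kaplan et al. (2009), and the P-value calculation is not reproduced here.

```python
from statistics import mean


def binned_occupancy(leisc, occupancy, bin_size=64):
    """Rank 6-mers by LEIsc (highest first) and average occupancy per bin."""
    ranked = sorted(leisc, key=leisc.get, reverse=True)
    return [
        mean(occupancy[k] for k in ranked[i:i + bin_size])
        for i in range(0, len(ranked), bin_size)
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Toy scores for four hypothetical 6-mers, binned two at a time.
toy_leisc = {"AAAAAA": 3.0, "CCCCCC": 2.0, "GGGGGG": 1.0, "TTTTTT": 0.0}
toy_occ = {"AAAAAA": 0.9, "CCCCCC": 0.7, "GGGGGG": 0.3, "TTTTTT": 0.1}
bins = binned_occupancy(toy_leisc, toy_occ, bin_size=2)

# As in the caption, the correlation uses the unbinned values.
kmers = list(toy_leisc)
r = pearson_r([toy_leisc[k] for k in kmers], [toy_occ[k] for k in kmers])
```

The binned averages are only for display; computing the correlation on unbinned data avoids inflating it through averaging.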
