PLoS Genet. 2010 Dec 9;6(12):e1001236.
doi: 10.1371/journal.pgen.1001236.

Noisy splicing drives mRNA isoform diversity in human cells

Joseph K Pickrell et al. PLoS Genet. 2010.

Abstract

While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Extensive unannotated splicing in human cells.
A. We plot, as a function of number of supporting reads, the fraction of junctions 1) matching GT-AG, the splice site consensus sequences (black), 2) matching a control pair of dinucleotides (grey), 3) annotated in EST databases (light blue), or 4) annotated in gene databases (dark blue). B. We split all junctions into those that are annotated in gene model databases and those that are not. Plotted is the cumulative number of junctions of each type by expression level. Unannotated junctions are expressed at much lower levels than annotated junctions. C and D. Alternative splice junctions near known protein-coding junctions show a periodic pattern. At each alternatively-spliced protein-coding 3′ or 5′ splice site, we counted the positions of AG (or GT, respectively) dinucleotides used as alternative splice sites, then averaged this across splice sites (see Methods). The red points denote positions that are a multiple of three base pairs from the major splice form, and the black points those that are not. The blue box below each panel shows the position of the exon.
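The periodicity described for panels C and D comes down to whether an alternative splice site sits a multiple of three base pairs from the major site, so that the reading frame is preserved. A minimal sketch of that classification follows; the function names and the integer coordinates are hypothetical and are not taken from the paper's Methods.

```python
def is_frame_preserving(major_pos: int, alt_pos: int) -> bool:
    """Return True when alt_pos lies a multiple of 3 bp from major_pos."""
    # Python's % is non-negative for a positive modulus, so upstream
    # (negative) offsets are handled the same way as downstream ones.
    return (alt_pos - major_pos) % 3 == 0


def split_by_frame(major_pos: int, alt_positions: list[int]):
    """Partition alternative sites into frame-preserving and frame-shifting."""
    in_frame = [p for p in alt_positions if is_frame_preserving(major_pos, p)]
    out_of_frame = [p for p in alt_positions if not is_frame_preserving(major_pos, p)]
    return in_frame, out_of_frame


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to illustrate the split.
    print(split_by_frame(100, [97, 98, 103, 104, 106]))
```

In the figure's colour scheme, the first list would correspond to the red points and the second to the black points.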
Figure 2. An example of splice junctions identified in a gene.
In the top panel, we plot the average expression level at each base in a region surrounding HERPUD1. In blue are bases annotated as exonic, and in black are those annotated as not exonic. In the middle panel, we plot the positions of all splice junctions in the region identified in our data. In black are splice junctions that are present in gene databases; in red are those that are not. The number of sequencing reads supporting each junction is written to the right of each junction, and junctions are ordered from top to bottom of the plot according to their coverage. In the bottom panel, we show the gene models in the region from Ensembl. The blue boxes show the positions of exons, and the black lines the positions of introns.
Figure 3. Unannotated splice junctions show little evidence of evolutionary conservation.
In each panel, we plot the mean phyloP score at each base surrounding the splice site. In the top panels are annotated splice sites, and in the bottom panels are unannotated splice sites. In blue are bases exonic of the splice site, and in black are those intronic of the splice site, as diagrammed below each panel.
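The per-base mean described here can be sketched as a column-wise average over score windows that have already been aligned on the splice site. This is an illustrative outline only: the window layout and the function name are assumptions, and extracting phyloP scores from a conservation track is not shown.

```python
def mean_profile(windows: list[list[float]]) -> list[float]:
    """Average equal-length score windows position by position.

    Each window is assumed to hold per-base scores for one splice site,
    with the splice site at the same index in every window.
    """
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    # zip(*windows) yields one tuple of scores per aligned position.
    return [sum(column) / len(windows) for column in zip(*windows)]


if __name__ == "__main__":
    # Made-up scores for two splice sites, three bases each.
    print(mean_profile([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]))
```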
Figure 4. Splicing error rate correlates with intron length.
We divided all introns that are bounded by highly conserved splice sites into 100 bins based on length. We then calculated, in each bin, the mean fraction of sequencing reads from either splice site to an unconserved splice site. Plotted is this mean against the [formula] of the mean intron length (in base pairs) of introns in the bin. In red is a spline fit to these points.
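The binning step in this caption can be sketched as follows. The caption does not say whether the bins hold equal numbers of introns or span equal length ranges; this sketch assumes equal-count bins, and the input layout (one `(length_bp, error_fraction)` pair per intron) and the function name are hypothetical.

```python
from statistics import mean


def bin_by_length(introns: list[tuple[int, float]], n_bins: int):
    """Sort introns by length, cut them into n_bins roughly equal-count
    bins, and return (mean_length, mean_error_fraction) for each bin."""
    ordered = sorted(introns, key=lambda pair: pair[0])
    n = len(ordered)
    summary = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across bins.
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer introns than bins: skip empty bins
        lengths = [length for length, _ in chunk]
        errors = [err for _, err in chunk]
        summary.append((mean(lengths), mean(errors)))
    return summary


if __name__ == "__main__":
    # Made-up values, used only to show the shape of the output.
    toy = [(100, 0.0), (200, 0.02), (1000, 0.01), (3000, 0.03)]
    for mean_len, mean_err in bin_by_length(toy, n_bins=2):
        print(mean_len, mean_err)
```

With real data one would call this with `n_bins=100`, then plot the per-bin mean error fraction against the transformed mean length and fit a smoother through the points.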
Figure 5. Hexamers enriched near unconserved splice sites are relevant in exon definition.
A. Plotted is the [formula] enrichment of all possible hexamers exonic of either 5′ or 3′ noise splice sites. In light blue are hexamers identified as exonic splicing enhancers by Fairbrother et al., and in dark blue are hexamers that are good matches to the consensus U1 snRNP binding site (we include all hexamers matching five contiguous bases of “AGGTAAG”). B and C. Hexamers from panel A mark the borders of constitutively spliced exons. Each point is the fraction of hexamers starting at that position relative to a constitutively spliced exon (in these cells) which match the hexamers identified as significantly enriched exonic or intronic of the “noise” 5′ or 3′ splice sites.
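The rule for the dark blue hexamers (matching five contiguous bases of “AGGTAAG”) can be made concrete with a short enumeration. This sketch reads the rule as "the hexamer contains any 5-base substring of the consensus at any offset"; that reading is an assumption, and the authors' exact matching rule may differ. The function and constant names are hypothetical.

```python
from itertools import product

U1_CONSENSUS = "AGGTAAG"  # consensus string quoted in the caption


def hexamers_matching(consensus: str = U1_CONSENSUS, k: int = 6, run: int = 5):
    """List all DNA k-mers containing at least one `run`-base substring
    of `consensus` (assumed interpretation of the caption's rule)."""
    cores = {consensus[i:i + run] for i in range(len(consensus) - run + 1)}
    all_kmers = ("".join(bases) for bases in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if any(c in kmer for c in cores))


if __name__ == "__main__":
    matches = hexamers_matching()
    print(len(matches))
    print(matches)
```

Under this reading, the three 5-base cores (AGGTA, GGTAA, GTAAG) each extend to eight hexamers, and two hexamers (AGGTAA and GGTAAG) are shared between cores, giving 22 distinct hexamers.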
