Genome Res. 2011 Dec;21(12):2096-113.
doi: 10.1101/gr.119974.110. Epub 2011 Oct 12.

Evidence of abundant stop codon readthrough in Drosophila and other metazoa

Irwin Jungreis et al. Genome Res. 2011 Dec.

Abstract

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence. We report an expanded set of 283 readthrough candidates, including 16 double-readthrough candidates; these were manually curated to rule out alternatives such as A-to-I editing, alternative splicing, dicistronic translation, and selenocysteine incorporation. We report experimental evidence of translation using GFP tagging and mass spectrometry for several readthrough regions. We find that the set of readthrough candidates differs from other genes in length, composition, conservation, stop codon context, and in some cases, conserved stem-loops, providing clues about readthrough regulation and potential mechanisms. Lastly, we expand our studies beyond Drosophila and find evidence of abundant readthrough in several other insect species and one crustacean, and several readthrough candidates in nematode and human, suggesting that functionally important translational stop codon readthrough is significantly more prevalent in Metazoa than previously recognized.

Figures

Figure 1.
Protein-coding evolutionary signatures for typical, readthrough, and double-readthrough stop codons. Alignments surrounding the annotated stop codons of three genes for 12 Drosophila species and their inferred maximum-parsimony common ancestor. The color coding of substitutions and insertions/deletions (indels) relative to the common ancestor is a simplification for visualization purposes, as the actual PhyloCSF score sums over all possible ancestral sequences and weighs every codon substitution by its probability. Insertions in other species relative to D. melanogaster are not shown. (A) Alignment of a typical gene (bw), shows abundant synonymous and conservative substitutions (green) upstream of the stop codon, and many non-conservative substitutions (red), frameshifting indels (orange), and in-frame stop codons downstream from the stop codon. The stop codon locus shows several substitutions between different stop codons. (B) Alignment of CG17319, one of 283 readthrough candidates. The region between the annotated stop codon and the next in-frame stop codon shows mostly synonymous substitutions and lacks frameshifting indels, while the region downstream from the second stop shows non-conservative substitutions and indels typical of non-coding regions, providing evidence of continued protein-coding selection in the region between the two stop codons, and suggesting likely translational readthrough of the first stop codon. As is typical for readthrough candidates, the first stop codon is perfectly conserved, while the second stop codon shows substitutions between different stop codons. (C) Alignment of a double-readthrough candidate, Glu-RIB (one of 16 cases). Both the second ORF and the third ORF show protein-coding signatures, indicating that both stop codons are likely readthrough. Both readthrough stop codon positions show no substitutions.
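The "region between the annotated stop codon and the next in-frame stop codon" described in this caption can be located with a simple scan. The following is a minimal sketch, not the authors' pipeline: the function name `second_orf`, the 0-based coordinate convention, and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def second_orf(seq, stop_start):
    """Locate the candidate readthrough region of a transcript.

    seq        -- sense-strand DNA sequence (hypothetical input format)
    stop_start -- 0-based index of the first base of the annotated stop codon

    Returns (start, end) such that seq[start:end] is the region between the
    annotated stop codon and the next in-frame stop codon, or None if no
    downstream in-frame stop codon exists.
    """
    seq = seq.upper()
    if seq[stop_start:stop_start + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at stop_start")
    orf_start = stop_start + 3
    # Step codon by codon in the frame of the annotated stop codon.
    for i in range(orf_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return orf_start, i
    return None


# Toy transcript: ATG GCC TGA CAG GTT TAA GGG
toy = "ATGGCCTGACAGGTTTAAGGG"
region = second_orf(toy, 6)
print(region, toy[region[0]:region[1]])
```

Applying the same scan again from the second stop codon would give the third ORF of a double-readthrough candidate.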
Figure 2.
Manual curation distinguishes 283 readthrough candidates. Steps of filtering method used to eliminate transcripts with other plausible explanations for the observed second-ORF protein-coding selection, leading to the final list of 283 unambiguous readthrough candidates.
Figure 3.
Experimental validation of readthrough. (A) GFP insert construct replacing the second stop codon so that GFP is only observed after translation of the 3′ end of the second ORF and subsequent eGFP gene. GOI_F and GOI_R are 50-bp homology arms on the forward and reverse strands specific to each gene of interest (GOI). (B) Expression of GFP in transgenic constructs showing that translation continues through to the second stop codon for four of the readthrough (RT) candidates. Strains shown are z-RT, Sp1-RT, and cnc-RT in embryos, and Abd-B-RT in the central nervous system of a larva. No GFP expression was found in a wild-type strain used as a control (Supplemental Fig. S12). (C) Mass spectrometry evidence of readthrough. Example of readthrough region (gish) supported by a 22-amino-acid peptide match (red rectangle) to mass spectrometry Drosophila PeptideAtlas (one of nine cases). With no ATG codon between the stop codon and the peptide, and no observed alternative splicing events across thousands of RNA-seq reads overlapping this region, readthrough seems the only plausible explanation for translation of this peptide.
Figure 4.
Single-species evidence of readthrough region translation. (A) D. melanogaster sequence composition of readthrough regions as measured by the Z curve statistic (x-axis) suggests they are protein-coding (positive scores). (Top panel) Coding regions before the first stop for both readthrough candidates (crosses) and non-readthrough transcripts (squares) show positive Z curve scores typical of protein-coding regions. (Middle panel) Non-coding regions after the second stop for readthrough candidates (crosses) and after the first stop for typical transcripts (squares) show negative Z curve scores typical of non-coding regions. (Bottom panel) Readthrough regions show positive scores typical of protein-coding regions, providing single-species evidence that most readthrough regions are protein-coding. Evaluated regions in all panels were selected to match the length distribution of readthrough regions. (B) Single nucleotide polymorphisms (SNPs) show a strong bias to result in synonymous codon substitutions in readthrough regions (top right) and coding regions (top left), but no bias is seen in second ORFs downstream from non-readthrough stop codons (top middle), providing evidence that readthrough regions are under protein-coding selection within the D. melanogaster population. For each type of region we show the fraction of SNPs that would be synonymous if translated in each of three frames, with frame 0 matching the translated frame of the coding region of the gene. Error bars show the Standard Error of the Mean (SEM). As most third codon positions result in synonymous substitutions, the exclusion of non-synonymous substitutions is also visible as a periodicity in the fraction of readthrough candidates that have an SNP at each position of the second ORF (bottom panel), with third-codon-position SNPs (red) more prevalent than first or second-codon position SNPs (blue).
This plot also shows an overall decrease in the number of SNPs near the readthrough stop codon, likely due to additional signals involved in regulating readthrough, such as RNA structures, encoded within the protein-coding signal. (C) Periodic base-pairing frequency in readthrough regions (red) matches that of known coding regions (blue) but is different from that of UTRs (green). Fraction of transcripts for which a given nucleotide is paired in predicted RNA secondary structures (y-axis) at each position relative to a stop codon (x-axis). Third codon positions (purple) are paired more frequently than first or second positions, and stop codons (positions −3, −2, and −1) show decreased pairing, as previously observed computationally in humans and experimentally in yeast (top panel). Transition from periodic to non-periodic pairing happens at the second stop codon for readthrough candidates (bottom panel). Signal is averaged over five codon positions (see Methods), with raw data shown in Supplemental Figure S2. Error bars show the Standard Error of the Mean (SEM).
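The per-frame synonymous fraction in panel B can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the function name `synonymous_fraction`, the `(position, alternate base)` SNP representation, and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(seq, snps, frame):
    """Fraction of SNPs that are synonymous if seq is translated in `frame`.

    seq   -- DNA sequence of the region (hypothetical input format)
    snps  -- iterable of (0-based position, alternate base) pairs
    frame -- 0, 1, or 2: offset of the first complete codon in seq
    SNPs that fall in an incomplete codon at either end are skipped.
    """
    seq = seq.upper()
    synonymous = total = 0
    for pos, alt in snps:
        codon_start = pos - ((pos - frame) % 3)
        if codon_start < 0 or codon_start + 3 > len(seq):
            continue
        ref_codon = seq[codon_start:codon_start + 3]
        offset = pos - codon_start
        alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
        total += 1
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            synonymous += 1
    return synonymous / total if total else float("nan")


# Toy region GCT GGT CTG with SNPs at the third position of each frame-0 codon.
toy = "GCTGGTCTG"
toy_snps = [(2, "C"), (5, "A"), (8, "A")]
for frame in (0, 1, 2):
    print(frame, synonymous_fraction(toy, toy_snps, frame))
```

In the toy input every SNP is synonymous in frame 0 and none is synonymous in the other two frames, which is the kind of frame-specific bias the caption attributes to regions under protein-coding selection.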
Figure 5.
Evidence of readthrough mechanism. (A,B) Excess of high-scoring regions in-frame (frame 0) compared to out-of-frame (frame 1, frame 2) suggests readthrough as the likely mechanism and provides an estimate of readthrough count. (A) PhyloCSF score per codon (x-axis) of the regions starting 0, 1, or 2 bases after all D. melanogaster annotated stop codons (red, green, purple, respectively) and continuing until the next stop codon in that frame, excluding regions that overlap another annotated transcript. Frame 0 shows an excess of more than 400 predicted protein-coding regions compared with the other reading frames, suggesting abundant readthrough. In contrast, a similar plot for Caenorhabditis elegans shows no significant excess in frame 0 (Supplemental Fig. S11), suggesting that the abundance of readthrough in Drosophila is not universal. (B) Possible mechanisms associated with protein-coding function downstream from D. melanogaster stop codons (rows) and associated reading frame offsets where corresponding protein-coding function is expected (columns). Random fluctuations would lead to an even distribution among the three frames, as would unannotated alternative splice variants and unannotated IRESs (note that annotated splice variants and IRESs have already been excluded), while frameshift events and recent frameshifting indels would bias away from frame 0. A bias for in-frame protein-coding selection is expected only for stop codon readthrough, recent nonsense mutations, A-to-I editing, and selenocysteine, the latter three together accounting for at most 17 cases. This leaves readthrough as the only plausible explanation for an excess of ∼420 frame 0 regions with positive PhyloCSF scores. (C) Usage of stop codon context (stop codon and subsequent base) provides additional evidence of a readthrough mechanism. 
The 4-base contexts are sorted in order of decreasing frequency among the 14,928 non-readthrough stop codons (blue), with the less frequent contexts (top, e.g., TGA-C) experimentally associated with translational leakage in other species and the most frequent ones associated with efficient termination (bottom, e.g., TAA-A). Context frequencies for readthrough candidates (red) are the opposite of those for non-readthrough transcripts, suggesting a preference for leaky context, with one-third using TGA-C and almost none using TAA-A. (D) Increased stop codon conservation in readthrough candidates. Only ∼1/3 of D. melanogaster non-readthrough stop codons have aligned stops in all 12 species, and only ∼1/3 of those are perfectly conserved (i.e., have the same stop codon in all 12 species). In contrast, 83% of candidate readthrough stop codons have an aligned stop in all 12 species, and 97% of those are perfectly conserved. While all three stop codons are involved in readthrough of different genes, individual readthrough genes rarely show substitutions between different stop codons, suggesting that the three stop codons are not functionally equivalent. Moreover, the only eight substitutions observed are between TAA and TAG, with no substitutions involving TGA, even though it is the most frequent readthrough stop codon, suggesting that TAA and TAG are functionally similar.
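The 4-base stop codon context (stop codon plus the subsequent base) tallied in panel C can be extracted and counted as sketched below. The function names, the input format of `(sequence, stop position)` pairs, and the toy sequences are assumptions for illustration only.

```python
from collections import Counter


def stop_context(seq, stop_start):
    """Return the 4-base context, e.g. 'TGA-C', for a stop codon.

    seq        -- sense-strand DNA sequence (hypothetical input format)
    stop_start -- 0-based index of the first base of the stop codon
    """
    seq = seq.upper()
    return seq[stop_start:stop_start + 3] + "-" + seq[stop_start + 3]


def context_frequencies(transcripts):
    """Map each 4-base context to its relative frequency, most frequent first.

    transcripts -- iterable of (seq, stop_start) pairs
    """
    counts = Counter(stop_context(seq, pos) for seq, pos in transcripts)
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.most_common()}


# Toy input: two transcripts with a TGA-C context and one with TAA-A.
toy = [("ATGTGACAG", 3), ("ATGTAAAGG", 3), ("CCCTGACTT", 3)]
print(context_frequencies(toy))
```

Running the same tally separately on readthrough candidates and on non-readthrough transcripts would give the two frequency profiles compared in the panel.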
Figure 6.
RNA structures associated with readthrough genes. (A) Fly, human, and worm examples of conserved, stable RNA structures predicted in the 100-nt regions downstream from (and including) candidate readthrough stop codons. The stop codon is highlighted in red. Twenty-nine structures were found in D. melanogaster, one in human, and one in C. elegans. The stem–loop in hdc was previously found to trigger readthrough. (B) Across 283 Drosophila readthrough candidates (red bars), 10% (n = 29) showed predicted structures in the 100-nt region downstream from the first stop codon compared with only 1% for non-readthrough transcripts (blue bars). The enrichment is exclusively found downstream from the first stop codon, with only one readthrough candidate showing a predicted structure in the 100 nt upstream of the first stop codon and three in the 100-nt downstream from the second stop codon, suggesting potential interactions with the ribosome during reading of the readthrough stop position. (C) Readthrough stop codon usage among readthrough candidates with and without predicted structures and non-readthrough genes. Although most readthrough candidates use TGA, readthrough candidates with structures show a preference for TAG, suggesting that a leaky stop codon context might not be necessary for readthrough in the presence of RNA structures.
Figure 7.
Examples of readthrough candidates in other species. (A) Alignment across 29 mammals for readthrough region in human gene SACM1L, one of four mammalian candidates. (B) Alignment across five worm species for the readthrough region in C. elegans gene C18B2.6, one of five nematode candidates. The stop codon context in all five is TGA-C and is perfectly conserved among Caenorhabditis species. (C) Alignment across 12 Drosophila and three other insect species, Anopheles gambiae (mosquito), Apis mellifera (honey bee), and Tribolium castaneum (red flour beetle), for the readthrough region of the D. melanogaster slo gene, one of 17 readthrough candidates conserved in mosquitoes, and one of four conserved across all 15 aligned insects. Although PhyloCSF cannot tell us whether the region is protein-coding in a particular subset of species, the large number of synonymous substitutions specifically in the other three insects, lack of non-synonymous substitutions and frameshifting indels, and perfectly conserved “leaky” TGA-C stop codon context suggest that readthrough also occurs in these other insects.
Figure 8.
Estimated abundance of readthrough in insects and other eukaryotic species using single-species evidence. Estimated number of readthrough transcripts in 25 species, calculated using single-species sequence-composition evidence quantified by Z curve scores for downstream ORFs in three frames to detect excess of positive scores in frame 0 associated with abundant readthrough (RT). (A) Distribution of Z curve scores in three frames providing a single-species estimate for D. melanogaster consistent with our PhyloCSF-based estimate (Fig. 5A). Even though the Z curve does not provide sufficient power to detect individual readthrough genes, the excess of 259 positive Z curve scores for frame 0 nonetheless provides a robust single-species estimate of the overall abundance of readthrough in D. melanogaster. Because the histogram excludes second ORFs shorter than 10 codons long and uses a conservative threshold for detecting coding regions, this number should be interpreted as a lower bound. (B) Estimated number of readthrough transcripts with 90% confidence intervals for 25 species. Estimated number of readthrough transcripts is dozens or more for each of the insects tested, and for three insects and one crustacean, even the low end of the confidence interval is more than 100 transcripts, whereas none of the other species tested has more than 100 readthrough transcripts even at the high end of the confidence interval, suggesting that this level of abundant readthrough is specific to insects and crustacea. (C) Contribution of several potential mechanisms to the number of positive-scoring frame 0 transcripts for humans and five species with abundant readthrough. 
Horizontal bars show the number of positive scores in each of the three frames, with the frame 0 bar divided into estimates of the number of transcripts resulting from each of four potential mechanisms: positive scores that could occur in any frame, such as chance or splicing, estimated using the counts for the other two frames (blue); recent nonsense mutations, estimated using comparative information from D. melanogaster (red); sequencing mismatches, estimated using a homology test and simulated sequencing errors (green); and readthrough, obtained by subtracting the others from the total (purple). The error bar shows the 90% confidence interval for the number of readthrough transcripts, measured from the start of the readthrough portion of the bar, with the expected number of readthrough transcripts and lower end of the confidence interval reported in the title.
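The subtraction described for panel C can be written out as simple arithmetic. This is only a sketch: the caption says the any-frame background is "estimated using the counts for the other two frames", and taking their mean is an assumption here, as are the function name and the input numbers, which are invented for illustration and are not results from the paper.

```python
def estimate_readthrough(frame0, frame1, frame2, nonsense=0.0, seq_errors=0.0):
    """Estimate the number of readthrough transcripts from per-frame counts.

    frame0, frame1, frame2 -- numbers of positive-scoring downstream regions
                              in each reading frame
    nonsense               -- estimated recent nonsense mutations in frame 0
    seq_errors             -- estimated sequencing mismatches in frame 0
    """
    # Assumption: effects that can occur in any frame (chance, splicing)
    # contribute to frame 0 at the mean rate of the two out-of-frame counts.
    background = (frame1 + frame2) / 2
    return frame0 - background - nonsense - seq_errors


# Hypothetical counts, not taken from the paper.
print(estimate_readthrough(500, 90, 110, nonsense=10, seq_errors=5))
```

The confidence intervals reported in the figure would require a statistical model of the per-frame counts, which this sketch does not attempt.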

