Mol Biol Evol. 2016 Dec;33(12):3108-3132.
doi: 10.1093/molbev/msw189. Epub 2016 Sep 7.

Evolutionary Dynamics of Abundant Stop Codon Readthrough

Irwin Jungreis et al. Mol Biol Evol. 2016 Dec.

Abstract

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.

Keywords: Anopheles; Drosophila; recoding; stop codon readthrough; termination codon suppression; translational readthrough.

Figures

Fig. 1
Protein-coding evolutionary signatures for non-readthrough, readthrough, triple readthrough, and double-stop readthrough stop codons. Alignments surrounding the annotated stop codons of four genes for 21 Anopheles species, displayed by CodAlignView (Jungreis et al. 2016). The color coding of substitutions and insertions/deletions (indels) relative to A. gambiae is a simplification for visualization purposes, as the actual PhyloCSF score sums over all possible ancestral sequences and weighs every codon substitution by its probability. Insertions in other species relative to A. gambiae are not shown. (A) Alignment of a typical gene (AGAP011673-RA), shows abundant synonymous and conservative substitutions (green) upstream (to the left) of the stop codon, and many radical substitutions (red), frameshifting indels (orange), and poorly conserved in-frame stop codons downstream of the annotated stop codon. The stop codon locus shows a substitution between different stop codons. (B) Alignment of AGAP000058-RA, one of 353 A. gambiae readthrough candidates. The region between the annotated stop codon and the next in-frame stop codon shows mostly synonymous substitutions and lacks frameshifting indels, whereas the region downstream from the second stop shows radical substitutions and indels typical of non-coding regions, providing evidence of continued protein-coding selection in the region between the two stop codons, and suggesting likely translational readthrough of the first stop codon. As is typical for readthrough candidates, the first stop codon is perfectly conserved, whereas the second stop codon shows substitutions between different stop codons. (C) Alignment of triple-readthrough candidate AGAP006474-RA (one of 35 double-readthrough candidates in A. gambiae including five triple-readthrough candidates). (D) Alignment of double-stop readthrough candidate AGAP009063-RA (one of 13 cases). 
The ORF after two adjacent stop codons shows a protein-coding signature, indicating that the ribosome likely reads through both stop codons. To the best of our knowledge, no cases of readthrough of two adjacent stop codons have previously been observed or predicted.
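The region this caption calls the "second ORF" (the stretch between the annotated stop codon and the next in-frame stop codon) can be located as in the following minimal sketch. The function name `second_orf` and the toy sequence are hypothetical, used only for illustration; this is not the authors' pipeline, and it does no scoring of the region.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def second_orf(seq, stop_index):
    """Return (codons, next_stop) for the region after an annotated stop codon.

    seq        -- transcript sequence (DNA alphabet), hypothetical input
    stop_index -- 0-based position of the first base of the annotated stop
    Returns None if no downstream in-frame stop codon is found.
    """
    seq = seq.upper()
    if seq[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")
    codons = []
    # Walk codon by codon in the same frame as the annotated stop.
    for i in range(stop_index + 3, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return codons, codon
        codons.append(codon)
    return None

# Toy transcript: two coding codons, TGA, two downstream codons, TAA.
toy = "ATGGCC" + "TGA" + "CAAGGT" + "TAA" + "CCC"
region = second_orf(toy, 6)
```

In the study, it is the multi-species alignment of such a region that is evaluated for a protein-coding signature; the sketch only extracts the region from a single sequence.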
Fig. 2
New techniques identify 353 A. gambiae and 51 additional D. melanogaster readthrough candidates. (A) Steps used to generate list of readthrough candidates in A. gambiae. Starting with 220 second ORFs having high PhyloCSF-ΨEmp score, we eliminated cases with a more plausible explanation of the protein-coding signature to yield 187 preliminary readthrough candidates. We used these to train PhyloCSF + Stop, and used that, orthology to D. melanogaster, and other evidence to find 166 additional readthrough candidates. (B) PhyloCSF-ΨEmp is an improved method for distinguishing protein-coding regions when extremely high specificity is required. Cross-validated cost curve (Drummond and Holte 2000) shows, for each prior probability that the input region is coding, the probability that the discriminator makes an error at the optimal score threshold for that prior. The performance of PhyloCSF-Ψ and of PhyloCSF-ΨEmp are similar for most values of the prior, but when the prior probability of coding is extremely low, PhyloCSF-ΨEmp makes noticeably fewer errors, for example, 7% fewer errors when the prior probability is 2%. (C) Figure shows the fraction of preliminary readthrough candidate first stop codons and other stop codons for which all aligned stop codons are TAA, TAG, TGA, or a mix. For most preliminary readthrough candidates, the first stop codon is perfectly conserved, usually TGA, whereas the majority of other annotated stop codons are not. We used this to define PhyloCSF + Stop of a second ORF by determining to which of these four categories its first stop codon belongs, and combining that evidence with its PhyloCSF-ΨEmp score. (D) For our comparative analyses, we used 333 D. melanogaster readthrough candidates consisting of 282 that had been reported in our earlier paper and 51 newly reported readthrough candidates found by homology to our A. gambiae candidates or the other D. melanogaster candidates.
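The quantity plotted in panel (B), the error probability at the optimal score threshold for a given prior, can be illustrated with a short sketch. It assumes equal misclassification costs and a rule that calls a region coding when its score is at least the threshold. The function name `min_error_at_prior` and the toy scores are hypothetical, and cross-validation is omitted.

```python
def min_error_at_prior(coding_scores, noncoding_scores, prior):
    """Lowest achievable error probability at a given prior of 'coding'.

    For each candidate threshold t, a region is called coding if score >= t.
    Error = prior * (false negative rate) + (1 - prior) * (false positive rate).
    """
    thresholds = sorted(set(coding_scores) | set(noncoding_scores))
    thresholds.append(float("inf"))  # threshold that calls nothing coding
    best = 1.0
    for t in thresholds:
        fnr = sum(s < t for s in coding_scores) / len(coding_scores)
        fpr = sum(s >= t for s in noncoding_scores) / len(noncoding_scores)
        best = min(best, prior * fnr + (1 - prior) * fpr)
    return best

# Toy, overlapping score distributions (made-up numbers).
coding = [1, 3]
noncoding = [0, 2]
err_even_prior = min_error_at_prior(coding, noncoding, 0.5)
err_low_prior = min_error_at_prior(coding, noncoding, 0.02)
```

Evaluating this function over a grid of priors traces the lower envelope that a cost curve displays. At very low priors the optimal threshold moves upward, which is the regime in which the caption reports the advantage of PhyloCSF-ΨEmp.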
Fig. 3
Mosquito-fly comparison provides insights into readthrough evolutionary dynamics. (A) Phylogenetic tree of 12 Drosophila and 19 Anopheles species. (B) Boxes quantify stop codons in each category used in our cross-clade comparisons. (C) Boxes classify and quantify the common and distinct portions of the readthrough gene repertoires of A. gambiae and D. melanogaster, to determine which differences are associated with gene birth and death (“coterminous”). Bottom group shows differences that might be due to coterminous events, whereas next higher group shows differences that cannot be. In other cases we do not know if the repertoires are different but if they are it is not due to coterminous events. At most 34% of the differences are due to coterminous events. (D) Among readthrough-readthrough pairs, nine have predicted RNA structures in A. gambiae and nine do in D. melanogaster, but only four have structures in both, implying that some structures are ancient whereas others have been gained or lost while readthrough persisted. None of the non-readthrough transcripts orthologous to readthrough candidates have structures, suggesting that the structures were not present for very long before readthrough appeared. (E) Upper figure shows first ORF length of each readthrough candidate orthologous to a non-readthrough transcript versus the first ORF length of the ortholog. Lower figure shows first ORF lengths of readthrough candidates orthologous to non-readthrough transcripts, corresponding lengths of the paired non-readthrough transcripts, and lengths of all non-readthrough transcripts in genes that have orthologs in the other species. There is almost no difference between the first ORF lengths of the readthrough candidates and their non-readthrough orthologs, but they are generally larger than the other non-readthrough transcripts, implying that longer genes are more likely to become readthrough rather than that genes tend to get longer after becoming readthrough. 
(F) The first stop codon is TGA and 3′ base is C in a larger fraction of ancient readthrough candidates than readthrough candidates in our comparison group. Error bars show standard error of mean. (G) Stop codon usage in ancient readthrough pairs. The dearth of pairs having a TGA stop codon in one species and not the other (only 4) implies that the increased prevalence of TGA among ancient readthrough candidates is due to loss of readthrough among TAA and TAG stop codons, rather than conversion of TAA or TAG to TGA. (H) Fraction of readthrough candidates containing most-enriched 8-mer. Error bars show standard error of mean. The 8-mer is highly enriched among readthrough candidates in each species, but significantly more so in D. melanogaster, with the difference concentrated among the readthrough candidates in the comparison group, implying the difference is due to an increased prevalence of the 8-mer in genes that have become readthrough in Drosophila since the lineages diverged. (I) The number of matches when aligning the ten amino acids after the first stop codon with the corresponding region of the orthologous transcript for readthrough-readthrough orthologous pairs is significantly fewer than the number of matches before the stop codon for these pairs or for orthologous pairs of control transcripts, implying that readthrough regions have been under less purifying selection at the amino acid level than other coding regions. (J) Ancient readthrough regions have higher PhyloCSF scores than ones in the comparison group, suggesting that older readthrough regions are under greater purifying selection at the amino acid level.
Fig. 4
Estimating the number of readthrough stop codons. (A) Distribution of PhyloCSF-ΨEmp scores of all regions starting 0, 1, and 2 bases after an annotated A. gambiae stop codon (black, red, green, respectively) and continuing until the next stop codon in that frame, excluding ones that overlap an annotated coding region in any frame or whose alignment has inadequate branch length. Since readthrough second ORFs would have elevated score only in frame 0, whereas regions with high score due to other causes would be distributed among all three frames, the excess of high scoring regions in frame 0 allows us to estimate the number of readthrough stop codons, including ones that we cannot distinguish individually. (B) Graph showing, for each PhyloCSF-ΨEmp score threshold, t, the estimated number of readthrough regions having a score higher than t, in A. gambiae (orange) and D. melanogaster (green), with 95% confidence intervals (dotted curves), and the number of A. gambiae readthrough candidates whose readthrough regions have score higher than t (black curve). Also, 95% confidence lower bound for the total number of functional readthrough stop codons in A. gambiae (red dashed line) and D. melanogaster (blue dashed line). The estimated number of readthrough regions having a score greater than 0 is 406 in A. gambiae and 754 in D. melanogaster, and the difference is unlikely to be due to differential annotation quality. The total numbers of functional readthrough regions of all scores are, with 95% confidence, at least 614 in A. gambiae and 960 in D. melanogaster, which are much larger than the numbers of candidates reported individually. In A. gambiae, the number of readthrough candidates is close to the estimated number of readthrough stop codons for PhyloCSF-ΨEmp > 5.0, indicating that our candidate list includes almost all high-scoring readthrough regions.
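The frame-excess reasoning in panel (A) can be sketched as a simple background subtraction: high scores in frames 1 and 2 estimate how many high-scoring regions arise from causes other than readthrough, and the surplus in frame 0 is attributed to readthrough. The sketch assumes comparable numbers of regions in each frame and uses the mean of frames 1 and 2 as the background. The function name `estimated_readthrough` and the toy scores are hypothetical; the authors' actual estimator and its confidence intervals are not reproduced here.

```python
def estimated_readthrough(scores_by_frame, t):
    """Estimate the number of readthrough regions scoring above threshold t.

    scores_by_frame -- dict mapping frame offset (0, 1, 2) to a list of scores
    Background is the mean count above t in the two off frames (an assumption).
    """
    above = [sum(s > t for s in scores_by_frame[f]) for f in (0, 1, 2)]
    background = (above[1] + above[2]) / 2
    return max(0.0, above[0] - background)

# Toy scores (made-up numbers): frame 0 has a surplus of high-scoring regions.
toy_scores = {0: [6, 7, 8, 1], 1: [6, 0], 2: [-1, -2]}
excess = estimated_readthrough(toy_scores, 5)
```

Because the estimate is a count rather than a list, it can include readthrough regions that cannot be identified individually, which is why the totals in the caption exceed the number of reported candidates.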
Fig. 5
Estimated abundance of readthrough in 52 eukaryotic species. Estimate is calculated using single-species sequence-composition evidence quantified by Z curve scores for downstream ORFs in three frames to detect excess of positive scores in frame 0 associated with abundant readthrough. For each species, gray bar shows the maximum likelihood estimate of the number of functional readthrough transcripts among the subset of transcripts whose second ORFs are at least 10 codons long and have positive Z curve score, which probably includes fewer than one quarter of all functional readthrough transcripts, whereas black bar shows a 95% confidence lower bound. Tree shows phylogenetic relationships, with red branches indicating abundant readthrough, defined by maximum likelihood estimate greater than 50, which roughly corresponds to a 95% confidence lower bound greater than 0. Readthrough is abundant in all of the Anopheles and Drosophila species, most of the other insect species tested, and the crustacean, D. pulex, whereas none of the non-Pancrustacea species appear to have abundant readthrough, suggesting that it evolved in the Pancrustacea after they split from Myriapoda.

References

    1. Andreev DE, O’Connor PBF, Zhdanov AV, Dmitriev RI, Shatsky IN, Papkovsky DB, Baranov PV. 2015. Oxygen and glucose deprivation induces widespread alterations in mRNA translation within 20 minutes. Genome Biol. 16:90.
    2. Arensburger P, Megy K, Waterhouse RM, Abrudan J, Amedeo P, Antelo B, Bartholomay L, Bidwell S, Caler E, Camara F, et al. 2010. Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics. Science 330:86–88.
    3. Artieri CG, Fraser HB. 2014. Evolution at two levels of gene expression in yeast. Genome Res. 24:411–421.
    4. Baudin-Baillieu A, Legendre R, Kuchly C, Hatin I, Demais S, Mestdagh C, Gautheret D, Namy O. 2014. Genome-wide translational changes induced by the prion [PSI+]. Cell Rep. 8:439–448.
    5. Beznosková P, Wagner S, Jansen ME, von der Haar T, Valášek LS. 2015. Translation initiation factor eIF3 promotes programmed stop codon readthrough. Nucleic Acids Res. 43:5099–5111.
