Nature. 2016 Jun 30;534(7609):719-23.

doi: 10.1038/nature18308. Epub 2016 Jun 1.

Translation readthrough mitigation

Joshua A Arribere, Elif S Cenik, Nimit Jain, Gaelen T Hess, Cameron H Lee, Michael C Bassik, Andrew Z Fire

PMID: 27281202
PMCID: PMC5054982
DOI: 10.1038/nature18308

Joshua A Arribere et al. Nature. 2016.

Abstract

A fraction of ribosomes engaged in translation will fail to terminate when reaching a stop codon, yielding nascent proteins inappropriately extended on their C termini. Although such extended proteins can interfere with normal cellular processes, known mechanisms of translational surveillance are insufficient to protect cells from potential dominant consequences. Here, through a combination of transgenics and CRISPR–Cas9 gene editing in Caenorhabditis elegans, we demonstrate a consistent ability of cells to block accumulation of C-terminal-extended proteins that result from failure to terminate at stop codons. Sequences encoded by the 3′ untranslated region (UTR) were sufficient to lower protein levels. Measurements of mRNA levels and translation suggested a co- or post-translational mechanism of action for these sequences in C. elegans. Similar mechanisms evidently operate in human cells, in which we observed a comparable tendency for translated human 3′ UTR sequences to reduce mature protein expression in tissue culture assays, including 3′ UTR sequences from the hypomorphic ‘Constant Spring’ haemoglobin stop codon variant. We suggest that 3′ UTRs may encode peptide sequences that destabilize the attached protein, providing mitigation of unwelcome and varied translation errors.


Figures

**Extended Data 1. Distribution of C-terminal extensions upon stop codon readthrough**
Annotations and genomes were as described in Supplemental Methods. Each 3'UTR was translated starting one codon after the stop codon until the next in-frame stop codon. For metazoans, counting was done in three different ways: including only genes for which exactly one 3'UTR was annotated (blue), counting each annotated 3'UTR separately (green), or counting each gene once and splitting the counts of genes with multiple 3'UTRs equally among the 3'UTR isoforms (red). “Nonstop” indicates 3'UTRs for which no stop codon was encountered before the poly(A) tail. For each species, the distribution of next in-frame stop codons was calculated for 1,000 nucleotide shufflings of the 3'UTR sequences of genes with a single annotated 3'UTR, and the 95% confidence interval is shown (yellow). A similar “randomized” distribution was obtained when 3'UTR sequences were shuffled while preserving dinucleotide frequency. The frequency of stops immediately after the annotated stop codon (amino acid length 0) is highlighted with a blue arrow for each species. The distribution of peptide lengths follows an exponential decay curve, in which the slope is related to the probability of encountering a stop codon at each position. In the simplest model, the probability of encountering a stop codon is constant throughout the 3'UTR, accounting for the roughly linear shape of each plot (as previously noted). Notable exceptions are a tendency towards a second in-frame stop immediately after the annotated one in *E. coli* (blue arrow) and a tendency towards peptides >60 amino acids long in all species. In *E. coli*, the enrichment of longer downstream peptides is at least partially explained by the operonic layout of genes.
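The counting described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline. It assumes each 3'UTR string begins immediately after the annotated stop codon and uses the standard stop codons TAA/TAG/TGA. It implements only the mononucleotide shuffle, not the dinucleotide-preserving one. All function names are hypothetical. Under the constant-probability model, extension length is geometric, P(L = k) = (1 − p)^k · p, so log counts fall on a line of slope log(1 − p).

```python
import random
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def extension_length(utr):
    """Sense codons read in the 3'UTR before the first in-frame stop.

    `utr` is assumed to start at the first nucleotide after the annotated
    stop codon. Returns 0 if a stop immediately follows, or None for a
    "nonstop" 3'UTR with no complete in-frame stop codon.
    """
    utr = utr.upper()
    for i in range(0, len(utr) - 2, 3):
        if utr[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def length_distribution(utrs):
    """Histogram of extension lengths; nonstop 3'UTRs are tallied as 'nonstop'."""
    counts = Counter()
    for utr in utrs:
        n = extension_length(utr)
        counts["nonstop" if n is None else n] += 1
    return counts


def shuffled_interval(utrs, n_shuffles=1000, seed=0):
    """Approximate 95% interval per length bin from mononucleotide shuffles."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_shuffles):
        shuffled = []
        for utr in utrs:
            bases = list(utr)
            rng.shuffle(bases)  # preserves length and base composition
            shuffled.append("".join(bases))
        dists.append(length_distribution(shuffled))
    interval = {}
    for key in set().union(*dists):
        values = sorted(d[key] for d in dists)  # bins absent from a shuffle count as 0
        last = len(values) - 1
        interval[key] = (values[int(0.025 * last)], values[int(0.975 * last)])
    return interval


def stop_probability(utrs):
    """Per-codon stop probability if bases were independent (simplest model)."""
    freq = Counter("".join(utrs).upper())
    total = sum(freq[b] for b in "ACGT")
    f = {b: freq[b] / total for b in "ACGT"}
    return sum(f[c[0]] * f[c[1]] * f[c[2]] for c in STOP_CODONS)


def geometric_pmf(p_stop, k):
    """P(extension length == k) when each codon is a stop with probability p_stop."""
    return (1 - p_stop) ** k * p_stop
```

You can scale `geometric_pmf` by the number of 3'UTRs and plot it on a logarithmic axis next to `length_distribution`. Departures from the constant-probability model, such as an excess at length 0, then become easy to spot.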

**Extended Data 2. Example quantification of the GFP/mCherry fluorescence ratios of images**
Images were taken under a broad excitation/emission filter to allow simultaneous capture of GFP and mCherry fluorescence. The intensity of each pixel in the red and green channels was extracted in Python. Unfiltered pixel intensities are shown as black dots. Pixels were filtered and background-subtracted, and a linear regression was performed (red dots and line; see Methods). For clarity, the green/red intensities of 1,000 random pixels are shown. The GFP/mCherry fluorescence ratio was taken as the slope of the regression line.
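The ratio calculation above can be sketched as follows. The actual filtering criteria are given in the paper's Methods and are not reproduced here. The saturation cutoff, the minimum-signal threshold, the per-channel background values and the name `gfp_mcherry_ratio` are all hypothetical placeholders. The slope is an ordinary least-squares fit of green on red intensity.

```python
def gfp_mcherry_ratio(pixels, bg_green, bg_red, min_signal=0.0, saturation=None):
    """Estimate the GFP/mCherry ratio as the slope of green vs. red intensity.

    pixels: iterable of (green, red) raw intensities for one image.
    bg_green, bg_red: background levels subtracted from each channel.
    min_signal: pixels at or below this value (after subtraction) in either
        channel are treated as background and dropped (hypothetical filter).
    saturation: if given, pixels at or above this raw value in either channel
        are dropped as clipped (hypothetical filter).
    """
    reds, greens = [], []
    for green, red in pixels:
        if saturation is not None and (green >= saturation or red >= saturation):
            continue
        g, r = green - bg_green, red - bg_red
        if g <= min_signal or r <= min_signal:
            continue
        reds.append(r)
        greens.append(g)
    if len(reds) < 2:
        raise ValueError("too few pixels left after filtering")
    mean_r = sum(reds) / len(reds)
    mean_g = sum(greens) / len(greens)
    sxx = sum((r - mean_r) ** 2 for r in reds)
    if sxx == 0:
        raise ValueError("red intensities are constant; slope undefined")
    sxy = sum((r - mean_r) * (g - mean_g) for r, g in zip(reds, greens))
    return sxy / sxx  # least-squares slope of green on red
```

Fitting a slope with a free intercept, rather than dividing summed intensities, means a constant residual offset in either channel is absorbed by the intercept. It therefore does not bias the ratio.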

**Extended Data 3. Readthrough regions confer a loss of superfolder GFP fluorescence**
Each of the indicated TerByP regions was inserted downstream of sfGFP and upstream of the *let-858* 3'UTR. TerByP is the region after the annotated stop codon, up to and including the first in-frame stop codon in the 3'UTR. Quantification was performed as described in Extended Data 2.
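The sketch below extracts a TerByP region and builds the corresponding stop-bypass construct, as in Fig 1b. It assumes the standard genetic code, a CDS string that ends with its annotated stop codon, and a 3'UTR string that starts right after that stop. The default replacement codon AAT mirrors the TAA>AAT change in *unc-54(cc3389)*. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def terbyp_region(utr):
    """3'UTR sequence up to and including the first in-frame stop codon.

    Returns None for a nonstop 3'UTR (no TerByP region defined).
    """
    utr = utr.upper()
    for i in range(0, len(utr) - 2, 3):
        if utr[i:i + 3] in STOP_CODONS:
            return utr[:i + 3]
    return None


def bypass_stop(cds, utr, sense_codon="AAT"):
    """Replace the CDS's terminal stop codon with a sense codon and append the 3'UTR."""
    cds, sense_codon = cds.upper(), sense_codon.upper()
    if len(cds) % 3 or cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS must be a whole number of codons ending in a stop")
    if len(sense_codon) != 3 or sense_codon in STOP_CODONS:
        raise ValueError("replacement must be a single sense codon")
    return cds[:-3] + sense_codon + utr.upper()


def translate_to_stop(seq):
    """Translate an uppercase ACGT sequence until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_to_stop(bypass_stop("ATGGCTTAA", "GGGTTTTGACCC"))` reads through the former stop as Asn and continues into the 3'UTR until the TGA.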

**Extended Data 4. Explanation of “shuffle” sequences**
Trinucleotide codons from each TerByP region are color-coded by gene (top). Codons were extracted and randomly shuffled in Python. Codons were then selected one at a time until a stop codon was encountered, defining *shuffle1*; the process was repeated twice more to define *shuffle2* and *shuffle3*. The resulting *shuffle* peptides therefore combine codons from all three TerByP regions. Lengths and color coding of codons for *shuffle1–3* accurately reflect the sequences they are derived from.
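The codon-shuffling procedure can be sketched as follows. The caption does not say whether codons were drawn with or without replacement. This sketch assumes drawing without replacement from the pooled, shuffled codons. Each *shuffle* sequence therefore ends at one of the pooled stop codons, and any codons left after the last stop are discarded. The function names and seed are illustrative.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split an in-frame nucleotide sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def shuffle_peptides(terbyp_regions, n_peptides=3, seed=None):
    """Pool codons from several TerByP regions, shuffle them, and cut
    stop-terminated 'shuffle' sequences in the order codons are drawn."""
    pool = [codon for region in terbyp_regions for codon in split_codons(region)]
    rng = random.Random(seed)
    rng.shuffle(pool)
    peptides, current = [], []
    for codon in pool:  # draw codons one at a time, without replacement
        current.append(codon)
        if codon in STOP_CODONS:
            peptides.append("".join(current))
            current = []
            if len(peptides) == n_peptides:
                break
    return peptides
```

Three TerByP regions each contribute exactly one stop codon to the pool, so this procedure always yields three stop-terminated sequences.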

**Extended Data 5. Translation into 3'UTRs at endogenous loci tends to yield hypomorphs**
CRISPR/Cas9 editing was used to construct the mutations shown. See Supplementary Table 1 for the precise nucleotide sequences of all strains. *−1/+1 TerByP* indicate the loss or gain of one nucleotide relative to the zero frame. This generates a frameshift over the stop codon, and translation proceeds into the 3'UTR out of frame with the coding sequence. For *unc-22*, “wild type” indicates a lack of twitching, even in 1 mM levamisole. For *unc-54(−1,TerByP)*, “weak Unc” animals were visibly slower than *unc-54(+)* but faster than *unc-54(TerByP)*. All mutant phenotypes were recessive.
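The sketch below shows how a single-nucleotide deletion or insertion just upstream of the stop codon redirects translation into the 3'UTR. The precise edits used are in Supplementary Table 1 and are not reproduced here. The edit position (immediately before the annotated stop codon), the inserted base and the function names are hypothetical, and the standard genetic code is assumed.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate in frame 0; return (protein, reached_stop)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False  # ran off the end: nonstop product


def frameshift_over_stop(cds, utr, shift, insert_base="A"):
    """Apply a -1 deletion or +1 insertion right before the annotated stop codon.

    Returns (protein, reached_stop, extension), where extension is the number
    of residues beyond the original protein length.
    """
    cds, utr = cds.upper(), utr.upper()
    stop = len(cds) - 3  # index of the first nucleotide of the annotated stop
    if shift == -1:
        edited = cds[:stop - 1] + cds[stop:] + utr  # drop the last sense nucleotide
    elif shift == 1:
        edited = cds[:stop] + insert_base.upper() + cds[stop:] + utr
    else:
        raise ValueError("shift must be -1 or +1")
    protein, reached_stop = translate(edited)
    return protein, reached_stop, len(protein) - stop // 3
```

The shifted reading frame can run off the end of the supplied sequence without meeting a stop (`reached_stop` is False). This mirrors the “Nonstop” case in Extended Data 1.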

**Extended Data 6. RNA-seq and Ribo-seq from *unc-54* mutants**
(a) RNA-seq and (b) ribosome footprint profiling (Ribo-seq) mRNA counts per library, with summary counts (c), for the indicated strains and mRNAs. Libraries were prepared from L4 animals as described (Methods). “N2” is wild type (PD1074, VC2010). *unc-54(cc3389)*, i.e. *unc-54(TerByP)*, bears a TAA(Stop)>AAT(Asn) mutation. *unc-54(e1301)* bears GGA(Gly387)>AGA(Arg387), a point mutation that confers a temperature-sensitive Unc phenotype with minimal discernible effect on UNC-54 protein levels. *unc-54(e1301)* was included as a control for the Unc phenotype of *unc-54(cc3389)*, although *e1301* confers a less severe Unc phenotype than *cc3389*. Values for *unc-54* mRNA (blue) are highlighted throughout. For comparison, three additional transcripts known to be expressed at least partly in the body wall muscles are also highlighted: *unc-87* (pink), *unc-15* (green), and *unc-22* (red).

**Extended Data 7. Ribo-seq of *unc-54(cc3389)* shows an unexceptional progression of ribosomes in the readthrough region**
a. Raw Ribo-seq reads for *unc-54(+)* (blue) and *unc-54(cc3389)* (green) animals, plotted as read pile-ups. Mismatched bases are indicated with black bars. The locations of the normal stop codon and the first in-frame stop codon are indicated with “TAA” and dotted lines. The extension in *unc-54(cc3389)* is 30 amino acids. b. Number of Ribo-seq reads in the last 30 codons compared with the preceding 30 codons, for all mRNAs. Linear regression was performed on all points (solid line), and a two-fold difference is shown (dashed lines). c. The distribution of Ribo-seq reads in the last 30 codons (90 nt) of *unc-54(cc3389)* is shown in green, and the 95% confidence interval for all open reading frames is shown as dashed lines. d. The fraction of in-frame Ribo-seq reads in the last 30 codons, plotted as a function of read count in the last 30 codons, with *unc-54(cc3389)* highlighted. e. The distribution of read lengths in the last 30 codons of *unc-54(cc3389)* and of all open reading frames (95% confidence interval, dashed lines). For b–d, reads were restricted to lengths of 28, 29, and 30 nt. For b–e, a 12 nt offset was applied to assign the ribosomal P-site, and read counts were derived solely from the *unc-54(cc3389)* Ribo-seq library. For c and e, a minimum of 15 reads was required to obtain the 95% CI from “all genes”.
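The P-site assignment and window comparison described above can be sketched as follows. Several details are assumptions rather than the authors' code. Reads are given as (0-based 5′ end position on the transcript, read length). The "last 30 codons" are taken as the 90 nt immediately upstream of the stop codon that actually terminates translation (for *unc-54(cc3389)*, the first in-frame 3'UTR stop). The function names are hypothetical. The 12 nt offset and the 28–30 nt length filter come from the caption.

```python
P_SITE_OFFSET = 12
ALLOWED_LENGTHS = frozenset({28, 29, 30})


def p_site_positions(reads, offset=P_SITE_OFFSET, lengths=ALLOWED_LENGTHS):
    """Map (five_prime_pos, read_length) reads to P-site positions."""
    return [pos + offset for pos, length in reads if length in lengths]


def last_codons_summary(reads, cds_start, stop_start, n_codons=30):
    """Compare P-site counts in the last n codons with the preceding n codons.

    cds_start: 0-based transcript position of the first nucleotide of the start codon.
    stop_start: 0-based position of the first nucleotide of the terminating stop.
    """
    sites = p_site_positions(reads)
    last_start = stop_start - 3 * n_codons
    prev_start = stop_start - 6 * n_codons
    last = [s for s in sites if last_start <= s < stop_start]
    previous = [s for s in sites if prev_start <= s < last_start]
    in_frame = sum(1 for s in last if (s - cds_start) % 3 == 0)
    return {
        "last": len(last),
        "previous": len(previous),
        "in_frame_fraction": in_frame / len(last) if last else float("nan"),
    }
```

Panel e uses reads of all lengths. For that panel, pass a wider `lengths` set to `p_site_positions` rather than relying on the default filter.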

**Extended Data 8. Lack of general conservation of coding potential downstream of stop codons in *Caenorhabditis***
A whole-genome alignment of six nematode species with the *C. elegans* genome assembly ce10/WS220 was obtained from the UCSC Genome Browser. For each annotated transcript, the aligned bases from the multiple-species alignment were extracted and compared with the reference (*C. elegans*) genome. The left plot shows summary information for the alignment centered on annotated stop codons; the right plot shows the same centered on the first in-frame stop codon in 3'UTRs. In red is the substitution frequency, i.e. the number of mismatched bases divided by the number of aligned bases at a given position. The enrichment of “wobble” position mutations is apparent as an increase in substitutions at the third position of each codon in the CDS. In green is the synonymous substitution frequency. For codons beginning at a given position, this is the number of mutations that yield a synonymous substitution divided by all mutations at that position (synonymous + non-synonymous). The tendency to conserve amino acids in the CDS is apparent as a green spike at every in-frame codon. The change in substitution frequency and synonymous substitution frequency around the first in-frame stop codon (right plot) is due to a tendency in 3'UTRs, regardless of frame, for NTR codons to be conserved and for AAN/AGN/GAN codons not to be conserved.
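The two per-position metrics can be sketched as follows. Several choices here are assumptions and are not taken from the paper. The reference is a gapless string, and each aligned sequence has the same length with '-' for gaps. Codons containing gaps or ambiguous bases are skipped. A codon that differs from the reference at more than one base counts as a single mutation. The standard genetic code is assumed, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_frequency(ref, aligned, pos):
    """Mismatched bases / aligned (non-gap) bases at one reference position."""
    bases = [seq[pos].upper() for seq in aligned if seq[pos] != "-"]
    if not bases:
        return float("nan")
    mismatches = sum(1 for b in bases if b != ref[pos].upper())
    return mismatches / len(bases)


def synonymous_frequency(ref, aligned, pos):
    """Fraction of codon changes starting at `pos` that preserve the amino acid."""
    ref_codon = ref[pos:pos + 3].upper()
    ref_aa = CODON_TABLE.get(ref_codon)
    if ref_aa is None:
        return float("nan")  # incomplete or ambiguous reference codon
    synonymous = changed = 0
    for seq in aligned:
        codon = seq[pos:pos + 3].upper()
        aa = CODON_TABLE.get(codon)  # None for codons with gaps or ambiguous bases
        if aa is None or codon == ref_codon:
            continue
        changed += 1
        if aa == ref_aa:
            synonymous += 1
    return synonymous / changed if changed else float("nan")
```

Evaluating `synonymous_frequency` at every position, not just every third, corresponds to the caption's "codons beginning at a given position". Spikes at in-frame codons in the CDS then stand out against out-of-frame positions.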

**Extended Data 9. Nucleotide and amino acid composition of readthrough regions (*C. elegans*)**
Coding sequences (CDS) and 3'UTRs were analyzed for various sequence properties. For simplicity, only genes with a single annotated 3'UTR were considered; similar results were obtained for genes with multiple 3'UTRs. a. Nucleotide frequency of CDS, 3'UTR, and TerByP (region between the annotated stop codon and the first in-frame stop codon). b. Frequency of amino acids in all three possible frames for the TerByP region. 3'UTRs were translated from one codon past the stop codon of the CDS until the next in-frame stop codon; nonstop 3'UTRs were ignored. Highlighted are codons with high G content (GGN, Gly) and high T content (TTY, Phe). c. TerByP regions tend to be hydrophobic, regardless of frame. The Kyte-Doolittle score was used as a measure of hydrophobicity. To reduce noise, only TerByP regions at least 10 amino acids long were considered. The P-value is from a Kolmogorov-Smirnov test comparing CDSs and TerByP sequences (each frame has p < 10e-293 for this comparison). Because TerByP sequences are shorter than CDSs on average, the distribution of TerByP hydrophobicity scores will tend to have a higher variance than that of CDSs. To control for this, random portions of CDSs were taken, length-matched to TerByP frame-zero peptide lengths. This was repeated 100 times, and the 95% confidence interval is shown (dashed lines, “CDS rands”). d. Hydrophobicity of the inserts is correlated with a negative effect on GFP fluorescence. The GFP/mCherry fluorescence ratio (Fig 2b) was plotted against the maximum Kyte-Doolittle score in a six-amino-acid window for each insert. (Similar results were obtained using the Kyte-Doolittle score averaged across the entire sequence.) Mean (circle) and standard deviation (bars) are shown. 3'UTR-derived sequences are in blue, and non-3'UTR-derived sequences are in red. Some constructs encoded the same peptide sequence (e.g. *unc-54(TerByP)*, *unc-54(TerByP,syn1)*, and *unc-54(TerByP,syn2)*). To avoid redundancy or skewing of the data, only the first of these was used. e. Hydrophobicity analysis of the TerByP extensions obtained by CRISPR/Cas9 engineering at the *unc-22* and *unc-54* loci. *+1/−1 TerByP* indicates the gain or loss of a nucleotide, generating a late frameshift that allows translation to proceed past the annotated stop codon out of frame with the upstream ORF. In each case, Kyte-Doolittle hydropathy was used to analyze the C-terminal appendage. In bold is the phenotypically least affected strain of the three.
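The hydropathy measures above can be sketched with the standard Kyte-Doolittle scale. The sketch makes several assumptions. The six-residue window score is taken as the window mean; using the window sum instead would not change the ranking of inserts. Nonstop reading frames return None so they can be ignored. The standard genetic code is used, and the function names are hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(utr, frame=0):
    """Translate from offset `frame` (0, 1, or 2) up to the first stop codon.

    Returns None if no in-frame stop is found (nonstop frames are ignored).
    """
    seq = utr.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(protein)
        protein.append(aa)
    return None


def max_window_hydropathy(peptide, window=6):
    """Highest mean Kyte-Doolittle score over any `window`-residue stretch."""
    if len(peptide) < window:
        return None
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide]
    return max(sum(scores[i:i + window]) / window
               for i in range(len(scores) - window + 1))


def mean_hydropathy(peptide):
    """Kyte-Doolittle score averaged over the whole peptide."""
    if not peptide:
        return None
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)
```

Running `translate_frame` with `frame` set to 0, 1 and 2 on the same 3'UTR gives the three reading frames compared in panels b and c.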

**Extended Data 10. Nucleotide and amino acid composition of readthrough regions (*H. sapiens*)**
Hydrophobicity analysis as in Extended Data 9c,d, performed on human sequences.

**Figure 1. Translation into 3’UTRs results in substantial loss of protein expression**
a. Dual fluorescence reporter assay to test expression with different 3'UTRs. Transgenic arrays of each GFP construct were created using *pha-1* selection, with mCherry (pCFJ104) as a coinjection marker. The broad filter detects GFP and mCherry signals simultaneously; deviation from yellow towards red or green indicates more mCherry or more GFP fluorescence, respectively. Three independent transgenic lines were made for each construct (two for *tbb-2(TerByP)*); transgenic lines with similar mCherry expression are shown. 200 ms exposure, 10× objective. b. Dual fluorescence reporter assay to test expression upon readthrough for different 3'UTRs. The stop codon preceding each 3'UTR was mutated, allowing translation to proceed into the 3'UTR (Termination ByPass, TerByP). Images were collected as in a. “GFP (10×)” is a 2 s exposure. The dim yellowish fluorescence in “GFP (10×)” for *unc-54(TerByP)* and *tbb-2(TerByP)* is autofluorescence. c. For each gene, the 3'UTR was fused to mCherry and GFP. GFP expression was tested with the stop codon mutated to a sense codon (*TerByP*). For each of *eef-1A.1*, *rps-17*, *daf-6*, and *hlh-1*, GFP expression was also tested with the normal stop codon in place (*3'UTR*). The ratio of GFP to mCherry fluorescence under a broad fluorescence filter was used as the metric (Extended Data 2, Methods). Each triangle represents an independently generated transgenic line; mean and standard deviation of n lines are shown. P-values are from two-tailed Student's t-tests.
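The per-construct summary and the comparison between constructs can be sketched with Python's standard library. This is an illustration, not the authors' analysis script. Each input list is assumed to hold one GFP/mCherry ratio per independent transgenic line, and the function names are hypothetical. The code computes the classic pooled-variance Student's t statistic and its degrees of freedom. A two-tailed p-value can then be read from a t distribution, for example via SciPy's `scipy.stats.ttest_ind`, whose default `equal_var=True` performs this same test.

```python
import math
from statistics import mean, stdev, variance


def summarize(ratios):
    """Mean and sample standard deviation of per-line GFP/mCherry ratios."""
    return mean(ratios), stdev(ratios)


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two transgenic lines")
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df
```

With only two or three lines per construct, the degrees of freedom are small. The resulting p-values are therefore sensitive to line-to-line variation in array expression.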

**Figure 2. Identification of determinants for product loss upon translation into the 3’UTR**
a. Shortening or non-synonymous mutation of the readthrough region can restore GFP expression. Stop codons and/or mutations were inserted into each GFP::3'UTR fusion as diagrammed, with stop codons (red stop sign) and poly(A) sites (blue arrowhead) indicated. Constructs marked with % are the same as those shown in Fig 1. mCherry (pCFJ104) was used as a coinjection marker. “+X AA” indicates the number of amino acids added relative to the cognate control (“+0 AA”) construct. Constructs and mutated regions are drawn to scale (scale bar at top). Mean and standard deviation of n lines are shown. b. 3'UTR-encoded peptides are sufficient to confer GFP loss. Sequences were inserted upstream of the *let-858* 3'UTR. TerByP is the region between the canonical termination codon and the first in-frame termination codon in the 3'UTR. “syn” are synonymously substituted variants. Shuffle1–3 contain shuffled codons of the *unc-54*, *tbb-2*, and *rpl-14(VLFL>RSCA)* TerByP regions (Extended Data 4). T2A is a “self-cleaving” peptide that releases the upstream nascent chain; T2A* is a non-cleaving variant. Rand1–3(A,C,G,T) are random combinations of A, C, G, and T created *in silico*. Each CDSN-M is an arbitrary fragment of the respective gene's coding DNA sequence (from amino acid N to M).

**Figure 3. Translation into the 3’UTR at an endogenous locus acts to decrease protein levels**
a. Schematic of wild-type and readthrough alleles of *unc-54*, the latter made using CRISPR/Cas9 genome editing. See Extended Data 5 for additional loci and edits. b. Brightfield images of *unc-54* alleles. The arrowhead indicates a “bag of worms”, the shell of an egg-laying-defective mother consumed by her retained progeny. c. RNA-seq from *unc-54(TerByP/+)* heterozygotes showed no differential effect on RNA levels. *unc-54(TerByP/+)* heterozygotes were picked from the progeny of *unc-54(+)* males crossed with *unc-54(TerByP)* homozygotes, and allele-specific reads were identified. The framed inset shows individual allele-specific RNA-seq reads (bars) from *unc-54(TerByP)* (AAT, green) and *unc-54(+)* (TAA, blue). See also Extended Data 6 and 7. d. Quantification of UNC-54 protein levels. Immunoblotting was performed on homozygous populations of the indicated animals. *unc-54(r293)* is a nonsense-mediated decay allele of *unc-54*, producing <5% of normal UNC-54 protein. *unc-54(r259)* contains a >17 kb deletion spanning most of the *unc-54* locus. For the lower blot, the number of animals loaded per lane is indicated. For gel source data, see Supplementary Figure 1.
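The allele-specific read counting in panel c can be sketched as follows. This is a minimal illustration under stated assumptions. Reads are taken to be already aligned to the sense strand of the transcript without indels, and each is given as (0-based start, sequence). `codon_start` is the 0-based position of the first nucleotide of the *unc-54* stop codon. The function name is hypothetical. Reads carrying TAA are assigned to *unc-54(+)* and reads carrying AAT to *unc-54(TerByP)*, as in the inset.

```python
from collections import Counter

# Codon at the unc-54 stop position identifies the allele (per the inset).
ALLELES = {"TAA": "unc-54(+)", "AAT": "unc-54(TerByP)"}


def allele_counts(reads, codon_start, alleles=ALLELES):
    """Count reads by the allele-diagnostic codon they carry at `codon_start`."""
    counts = Counter()
    for start, seq in reads:
        offset = codon_start - start
        if offset < 0 or offset + 3 > len(seq):
            continue  # read does not cover the whole codon
        codon = seq[offset:offset + 3].upper()
        counts[alleles.get(codon, "other")] += 1
    return counts
```

In a heterozygote, similar counts for the two alleles would suggest that the TerByP mutation does not change mRNA abundance. Comparing alleles within one sample avoids between-library normalization.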

**Figure 4. Translation into 3'UTRs results in protein loss for several genes in humans**
a. Lentiviral reporter schematic. A puroR-mCherry fusion was co-translationally cleaved from eGFP-insert by T2A. Constructs in b–d were integrated into K562 cells via lentiviral infection and puromycin selection. b. Validation of the dual fluorescence reporter. Inserts downstream of eGFP were 3xFLAG, 3xHA, and the degrons d4ODC (t½ ≈ 4 h) and d1ODC (t½ ≈ 1 h). c. The sequence between the annotated and first in-frame termination codons (TerByP) of each gene was inserted downstream of eGFP (solid line). For comparison, the nucleotides of each TerByP region were randomized, producing a length- and nucleotide-frequency-matched construct (randTerByP, dashed line). Also shown are cells with eGFP lacking an insert, grown a week apart (top, green solid lines), and the approximate fluorescence ratio of d4ODC (orange line). d. The first 30 amino acids encoded by the *HBA2* 3'UTR were inserted downstream of eGFP (orange). Insertion of a self-cleaving T2A peptide restored expression (blue), whereas an uncleavable mutant (T2A*) did not (light blue).


References

1. Klauer A, van Hoof A. Degradation of mRNAs that lack a stop codon: a decade of nonstop progress. Wiley Interdiscip. Rev. RNA. 2012;3:649–660.
1. Hamby S, Thomas N, Cooper D, Chuzhanova N. A meta-analysis of single base-pair substitutions in translational termination codons ('nonstop' mutations) that cause human inherited disease. Hum. Genomics. 2011;5:241–264.
1. Williams I, Richardson J, Starkey A, Stansfield I. Genome-wide prediction of stop codon readthrough during translation in the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 2004;32:6605–6616.
1. Falini B, et al. Cytoplasmic nucleophosmin in acute myelogenous leukemia with a normal karyotype. N. Engl. J. Med. 2005;352:254–266.
1. Hollingsworth T, Gross A. The severe autosomal dominant retinitis pigmentosa rhodopsin mutant Ter349Glu mislocalizes and induces rapid rod cell death. J. Biol. Chem. 2013;288:29047–29055.

Extended References

1. Brenner S. The genetics of Caenorhabditis elegans. Genetics. 1974;77:71–94.
1. Okkema PG, Harrison SW, Plunger V, Aryana A, Fire A. Sequence Requirements for Myosin Gene Expression and Regulation in Caenorhabditis elegans. Genetics. 1993:385–404.
1. Granato M, Schnabel H, Schnabel R. pha-1, a selectable marker for gene transfer in C. elegans. Nucleic Acids Res. 1994;22:1762–1763.
1. Mello CC, Kramer JM, Stinchcomb D, Ambros V. Efficient gene transfer in C.elegans: extrachromosomal maintenance and integration of transforming sequences. EMBO J. 1991;10:3959–3970.
1. Stinchcomb DT, Shaw JE, Carr SH, Hirsh D. Extrachromosomal DNA transformation of Caenorhabditis elegans. Mol. Cell. Biol. 1985;5:3484–3496.
