EMBO J. 2020 Sep 1;39(17):e104763.
doi: 10.15252/embj.2020104763. Epub 2020 Aug 3.

Translation of small downstream ORFs enhances translation of canonical main open reading frames


Qiushuang Wu et al. EMBO J. 2020.

Abstract

In addition to canonical open reading frames (ORFs), thousands of translated small ORFs (fewer than 100 codons) have been identified in the untranslated regions (UTRs) of mRNAs across eukaryotes. Small ORFs in 5' UTRs (upstream (u)ORFs) often repress translation of the canonical ORF within the same mRNA. However, the function of translated small ORFs in 3' UTRs (downstream (d)ORFs) is unknown. In contrast to uORFs, we find that translation of dORFs enhances translation of their corresponding canonical ORFs. This stimulatory effect depends on the number of dORFs, but not on their length or on the peptides they encode. We propose that dORFs represent a new, strong, and universal translation regulatory mechanism in vertebrates.

Keywords: dORF; ribosome profiling; translation efficiency.


Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure EV1. Identification of translated dORFs using ribosome profiling
  1. Cartoon illustrating how potential dORFs in the 3′ UTR were defined from transcriptome sequence. We searched for start–stop codon pairs encoding 10–100 amino acids in all three reading frames, defined relative to the canonical ORF. The most distal ATG was considered first as a possible start codon, followed by non‐ATG (CTG, GTG, TTG) start codons in non‐overlapping regions. Dark blue indicates ATG, light blue indicates non‐ATG, and gray indicates the stop codon.

  2. Scatter plot of ORFscore and the in‐frame proportion of ribosome footprint coverage for all ORFs: annotated CDS (green), 5′ UTR uORFs (purple), 3′ UTR dORFs (red), and ORFs overlapping the annotated CDS (orange).

  3. Pie chart showing the distribution of all human translated dORFs by start codon.

  4. Metagene plots showing the distribution of ribosome footprint and input RNA reads around the start and stop codons of the canonical ORF and the dORF in zebrafish high‐confidence dORF‐containing genes. Ribosome footprint reads mainly show the characteristic 3‐nucleotide periodicity across the translated ORF, whereas RNA reads are uniformly distributed across the mRNA. Green indicates the canonical ORF; blue indicates the internal UTR (iUTR) between the stop codon of the canonical ORF and the start codon of the dORF; red indicates the dORF. The inset shows ribosome distribution in the dORF region, close to its start and stop codons.

  5. Metagene plots of ribosome footprint reads around the start and stop codons of dORFs in human and zebrafish genes, for ATG dORFs with different confidence levels (high, middle, and low). Blue indicates iUTR; red indicates dORF.

  6. Metagene plots of ribosome footprint reads around the start and stop codons of dORFs in human and zebrafish genes, for non‐ATG dORFs with different confidence levels (high, middle, and low). Blue indicates iUTR; red indicates dORF.
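The dORF search in panel 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input string is assumed to start immediately after the canonical stop codon (so frame 0 is in frame with the canonical ORF), "most distal ATG" is interpreted as the most upstream start codon before each in‐frame stop, and the names `CandidateORF`, `orfs_in_frame` and `find_candidate_dorfs` are hypothetical.

```python
from typing import List, NamedTuple, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}
NON_ATG_STARTS = {"CTG", "GTG", "TTG"}


class CandidateORF(NamedTuple):
    start: int        # 0-based offset of the start codon within the 3' UTR
    end: int          # offset just past the stop codon
    frame: int        # frame relative to the canonical ORF (0, 1 or 2)
    start_codon: str

    @property
    def n_codons(self) -> int:
        # Amino acids encoded, excluding the stop codon.
        return (self.end - self.start) // 3 - 1


def orfs_in_frame(utr: str, frame: int, starts: Set[str],
                  min_aa: int = 10, max_aa: int = 100) -> List[CandidateORF]:
    """Pair each in-frame stop codon with the most upstream allowed start codon
    seen since the previous in-frame stop; keep pairs of min_aa..max_aa codons."""
    found: List[CandidateORF] = []
    first_start = None
    for i in range(frame, len(utr) - 2, 3):
        codon = utr[i:i + 3]
        if codon in STOP_CODONS:
            if first_start is not None and min_aa <= (i - first_start) // 3 <= max_aa:
                found.append(CandidateORF(first_start, i + 3, frame,
                                          utr[first_start:first_start + 3]))
            first_start = None
        elif first_start is None and codon in starts:
            first_start = i
    return found


def find_candidate_dorfs(utr: str, min_aa: int = 10,
                         max_aa: int = 100) -> List[CandidateORF]:
    """ATG-initiated candidates first; NTG candidates are kept only where they
    do not overlap a candidate that has already been accepted."""
    utr = utr.upper().replace("U", "T")
    chosen = [orf for f in range(3)
              for orf in orfs_in_frame(utr, f, {"ATG"}, min_aa, max_aa)]
    for f in range(3):
        for orf in orfs_in_frame(utr, f, NON_ATG_STARTS, min_aa, max_aa):
            if all(orf.end <= c.start or orf.start >= c.end for c in chosen):
                chosen.append(orf)
    return sorted(chosen, key=lambda o: o.start)
```

For the toy sequence `"ATG" + "GCC" * 12 + "TAA" + "A" + "CTG" + "GCC" * 10 + "TGA"`, the sketch returns one ATG candidate of 13 codons in frame 0 and one non‐overlapping CTG candidate of 11 codons in frame 1.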

Figure 1. Translated dORFs are prevalent in vertebrates
  1. Donut plots showing the proportion of transcripts containing high‐, medium‐, and low‐confidence dORFs based on ribosome profiling data from human and zebrafish.

  2. Metagene plots showing the distribution of ribosome footprint and input RNA reads around the start and stop codons of the canonical ORF and the dORF in human high‐confidence dORF‐containing genes. Ribosome footprint reads mainly show the characteristic 3‐nucleotide periodicity across the translated ORF, whereas RNA reads are uniform across the transcript. Green indicates the canonical ORF; blue indicates the internal UTR (iUTR), the region between the stop codon of the canonical ORF and the start codon of the dORF; red indicates the dORF. The inset shows ribosome distribution in the dORF region, close to its start and stop codons.

  3. Boxplots showing the lengths of dORFs and iUTR for ATG start dORFs (ATG, red) and non‐ATG start dORFs (non‐ATG, yellow) with translation evidence, as well as the lengths of random (light gray) and all (dark gray) dORFs for which there was no evidence of translation. The box defines the first and third quartiles, with the median indicated with a thick black line, and vertical lines indicate the variability outside the upper and lower quartiles.
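The 3‐nucleotide periodicity in these metagene plots, and the ORFscore and in‐frame proportion used in Figure EV1, can be illustrated with a short sketch. It assumes footprints have already been reduced to P‐site positions and applies the ORFscore definition of Bazzini et al. (2014) to raw per‐frame counts; the function names are hypothetical, and the original implementation may normalize counts differently.

```python
import math
from typing import Iterable, List


def frame_counts(psite_positions: Iterable[int], orf_start: int,
                 orf_end: int) -> List[int]:
    """Count footprint P-sites in frames 1, 2 and 3 of the ORF [orf_start, orf_end)."""
    counts = [0, 0, 0]
    for p in psite_positions:
        if orf_start <= p < orf_end:
            counts[(p - orf_start) % 3] += 1
    return counts


def orf_score(counts: List[int]) -> float:
    """log2(sum_i (F_i - mean)^2 / mean + 1), negated when frame 1 has fewer
    reads than frame 2 or frame 3 (assumed definition on raw counts)."""
    mean = sum(counts) / 3
    if mean == 0:
        return 0.0
    score = math.log2(sum((f - mean) ** 2 / mean for f in counts) + 1)
    f1, f2, f3 = counts
    return -score if (f1 < f2 or f1 < f3) else score


def in_frame_proportion(counts: List[int]) -> float:
    """Fraction of P-sites that fall in the annotated (first) frame."""
    total = sum(counts)
    return counts[0] / total if total else 0.0
```

A strongly periodic ORF such as frame counts of 90, 5 and 5 gives an ORFscore of about 7.2, while uniform counts give 0 and a dominant second or third frame gives a negative score.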

Figure EV2. All groups of mRNAs, based on the dORF identification confidence level and/or translation start codon, display higher translation efficiency
  1. Boxplots showing the lengths of dORFs and iUTRs for ATG and non‐ATG dORFs with translation evidence, as well as random (light gray) and all (dark gray) dORFs with no evidence of translation, in zebrafish.

  2. Boxplots of RNA coverage differences between each of the indicated mRNA regions and the last 100 nt of the canonical ORF of each gene, for human and zebrafish high‐confidence dORFs. The medians for both groups of mRNAs, those containing translated dORFs (red) and those containing untranslated dORFs (gray), are close to 0, indicating uniform RNA read distribution across the mRNA and supporting the idea that dORFs are not encoded by alternative isoforms.

  3. Histogram of bootstrapped numbers of orthologous genes. For each species, genes were drawn to match the number of dORF‐containing genes with an ortholog in the other species (862 human and 610 zebrafish). Blue lines indicate the 95% confidence interval (64–98); the red line indicates the observed number of orthologous genes (123).

  4. Cumulative distributions of mRNA level and translation efficiency of zebrafish genes. All genes are shown in black; mRNAs containing uORFs (purple) or dORFs (red) are compared with controls resampled to share similar mRNA levels (light purple for uORF controls and orange for dORF controls). P‐values are indicated (Wilcoxon rank‐sum test). Only high‐confidence ATG dORFs were used in this analysis.

  5. Cumulative plots of RNA level and translation efficiency of human genes containing ATG dORFs, grouped by the confidence level of dORF translation (high, middle, and low). For each group, controls were resampled to similar mRNA levels to compare translation efficiency. P‐values are indicated (Wilcoxon rank‐sum test). Red indicates dORF genes, and gray indicates control genes.

  6. Cumulative plots of RNA level and translation efficiency of human genes containing non‐ATG dORFs, grouped by the confidence level of dORF translation (high, middle, and low). For each group, controls were resampled to similar mRNA levels to compare translation efficiency. P‐values are indicated (Wilcoxon rank‐sum test). Red indicates dORF genes, and gray indicates control genes.
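The chance expectation in panel 3 can be approximated by repeated random draws. The sketch below is an assumption about the procedure, not the published code: `ortholog_pairs` is a hypothetical list of one‐to‐one human–zebrafish ortholog pairs, each draw samples genes without replacement from the genes that have an ortholog, and the 95% interval is taken from empirical percentiles.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ortholog_overlap(ortholog_pairs: Sequence[Tuple[str, str]],
                               n_human: int, n_zebrafish: int,
                               n_iter: int = 1000, seed: int = 0):
    """Draw n_human human and n_zebrafish zebrafish genes at random from the
    genes that have an ortholog; count ortholog pairs drawn on both sides."""
    rng = random.Random(seed)
    human_pool = sorted({h for h, _ in ortholog_pairs})
    fish_pool = sorted({z for _, z in ortholog_pairs})
    overlaps: List[int] = []
    for _ in range(n_iter):
        human = set(rng.sample(human_pool, n_human))
        fish = set(rng.sample(fish_pool, n_zebrafish))
        overlaps.append(sum(1 for h, z in ortholog_pairs if h in human and z in fish))
    overlaps.sort()
    # Empirical 2.5th and 97.5th percentiles of the null overlap counts.
    ci = (overlaps[int(0.025 * (n_iter - 1))], overlaps[int(0.975 * (n_iter - 1))])
    return overlaps, ci


def empirical_p(overlaps: Sequence[int], observed: int) -> float:
    """One-sided P-value: how often the null overlap reaches the observed value."""
    return (1 + sum(o >= observed for o in overlaps)) / (1 + len(overlaps))
```

On real data one would pass 862 and 610 as the sample sizes and compare the observed overlap (123) with the returned interval and with `empirical_p`; the interval itself depends entirely on the ortholog table supplied.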

Figure 2. The dORF‐encoded peptides are often not conserved
  1. Donut plot showing the distribution of human dORFs encoding conserved, weakly conserved, and non‐conserved peptides. Conservation was calculated from 7‐way multiple alignments; dORFs were considered conserved if their score was > 50 and weakly conserved if it was > 0 (but ≤ 50).

  2. Venn diagram representing the orthologous genes of human and zebrafish in which translated dORFs were identified. The number of dORFs in orthologous genes expected by chance is indicated in italic.

  3. Cartoon of the orthologous RRM1 gene in human and zebrafish: both contain a translated dORF, but the dORF sequences differ, as indicated by the different colors.

Figure 3. mRNAs containing translated dORF are efficiently translated in human cell lines and zebrafish embryos
  1. Cumulative plots of mRNA level and translation efficiency of human genes. All genes are shown in black; mRNAs containing uORFs (purple) or dORFs (red) are compared with controls resampled to share similar mRNA levels (light purple for uORF controls and orange for dORF controls). P‐values are indicated (Wilcoxon rank‐sum test). Only high‐confidence ATG dORFs were used in this analysis. The cartoon illustrates that, whereas uORFs decrease translation efficiency of the canonical ORF, dORFs increase it. Data are from ribosome profiling of human HeLa cells in S phase.

  2. Boxplots showing the lengths of the 5′ UTR, CDS (canonical ORF), and 3′ UTR for human genes with translated dORFs (red), all genes without translated dORFs (dark gray), and resampled controls without translated dORFs matched for the length of the indicated mRNA feature and for RNA level (light gray). P‐values are indicated (Wilcoxon rank‐sum test).

  3. Cumulative plots of RNA level and translation efficiency of human genes containing translated dORFs. Controls are genes without translated dORFs, resampled for similar mRNA level and for similar length of either the 5′ UTR (left), CDS (middle), or 3′ UTR (right), to compare translation efficiency. P‐values are indicated (Wilcoxon rank‐sum test).

  4. Scatter plots showing the median log2 fold change in RNA level (left panel) and translation efficiency (right panel) of mRNAs containing high‐confidence dORFs (ATG + non‐ATG) or high‐confidence uORFs (ATG), each compared with its respective resampled control mRNAs (containing neither uORFs nor dORFs, matched for RNA level), across multiple studies. Different samples from the same study share the same color. References and sample conditions are indicated.
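Translation efficiency and the mRNA‐level‐matched controls used throughout Figure 3 can be sketched as below. The binning scheme (bins of width 0.5 on the log2 mRNA scale, one control drawn with replacement per target gene), the pseudocount, and all function names are assumptions for illustration; the study may match controls differently.

```python
import math
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def log2_te(ribo: float, rna: float, pseudo: float = 1.0) -> float:
    """log2 translation efficiency: footprint abundance relative to mRNA abundance."""
    return math.log2((ribo + pseudo) / (rna + pseudo))


def matched_controls(targets: Sequence[str], candidates: Sequence[str],
                     rna: Dict[str, float], bin_width: float = 0.5,
                     seed: int = 0) -> List[str]:
    """For each target gene, draw (with replacement) one candidate gene whose
    log2 mRNA level falls in the same bin of width bin_width."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for g in candidates:
        bins[math.floor(math.log2(rna[g]) / bin_width)].append(g)
    controls = []
    for g in targets:
        pool = bins.get(math.floor(math.log2(rna[g]) / bin_width))
        if pool:  # targets with no candidate in their bin are skipped
            controls.append(rng.choice(pool))
    return controls


def median_log2_fc(target_values: Sequence[float],
                   control_values: Sequence[float]) -> float:
    """Difference of medians of log2-scale values (a median log2 fold change)."""
    return statistics.median(target_values) - statistics.median(control_values)
```

A successful match should give a median log2 fold change near 0 for RNA level (left panel of panel 4), so that any shift in translation efficiency (right panel) is not explained by expression level.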

Figure EV3. mRNAs with translated dORFs are efficiently translated independently of the 5′ UTR, CDS or 3′ UTR length
  1. Cumulative distribution of mRNA level and translation efficiency of mRNA with human high‐confidence dORFs with different translation start codons. Controls for each group were resampled to share similar mRNA level. P‐value indicated, Wilcoxon rank‐sum test. Light orange is mRNAs with ATG dORF, orange is mRNAs with CTG dORF, dark red is mRNAs with GTG dORF, and red is mRNAs with TTG dORF.

  2. Cumulative plots of RNA decay rate or half‐life of genes containing translated dORFs and of resampled controls with similar RNA levels, in HeLa cells and zebrafish embryos. P‐values are indicated (Wilcoxon rank‐sum test). Red indicates dORF genes, and gray indicates control genes without translated dORFs.

  3. Boxplot of median poly(A) tail length for human genes with translated dORFs and for resampled controls with similar RNA levels, in HeLa cells. P‐values are indicated (Wilcoxon rank‐sum test). Red indicates dORF genes, and gray indicates control genes.

  4. Boxplots showing the lengths of the 5′ UTR, CDS (canonical ORF), and 3′ UTR for zebrafish genes with translated dORFs (red), all genes without translated dORFs (dark gray), and resampled controls without translated dORFs matched for length and RNA level (light gray). P‐values are indicated (Wilcoxon rank‐sum test).

  5. Cumulative plots of RNA level and translation efficiency of zebrafish genes containing translated dORFs. Controls are genes without translated dORFs, resampled for similar mRNA level and for similar length of either the 5′ UTR (left), CDS (middle), or 3′ UTR (right), to compare translation efficiency. P‐values are indicated (Wilcoxon rank‐sum test).
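Most comparisons in these legends use the Wilcoxon rank‐sum test. A dependency‐free sketch is shown below, using average ranks for ties and the tie‐corrected normal approximation without continuity correction; for small samples an exact test from a statistics library would be preferable. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.
    Returns (U for x, P-value) from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        # Extend j over a block of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
```

For two completely separated groups of five values each, U is 0 and the approximate two‐sided P‐value is about 0.009.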

Figure 4. Translation of the dORF is required for enhanced translation of the canonical ORF
  1. Scheme of paired reporters in which endogenous iUTRs and dORFs from four genes (human CCDC167 and CYR61; zebrafish rrm1 and prkcsh) were cloned downstream of mCherry. For each dORF reporter, mutations of one to three nucleotides were introduced to create a premature stop codon immediately after the translation start site (dMUT). The bar plot shows the ratio of mCherry fluorescence intensity to that of the GFP transfection control for each reporter after DNA transfection; dMUT expression levels were normalized to 1.

  2. Diagram of dORF reporters with endogenous iUTRs and an artificial dORF sequence. A single G (highlighted in green) was inserted at the beginning of the dORF to shift the reading frame between dORF1 and dORF2. dORF1 and dORF2 have almost the same nucleotide composition but encode different amino acid sequences. A paired dMUT for each frame was generated by point mutation to introduce a premature stop codon. The bar plot shows fluorescence intensity of dORF and dMUT reporters normalized to the GFP transfection control; dMUT expression levels were normalized to 1. After DNA transfection, all reporters containing artificial dORFs together with iUTRs from translated dORFs (CCDC167, CYR61, rrm1, and prkcsh) show higher fluorescence intensity than their dMUT counterparts, regardless of the reading frame or the encoded peptide. The iUTRs from dORFs with no evidence of translation (human TROAP and PRMT5) show no fluorescence difference relative to their dMUT counterparts.

  3. In vitro transcribed mRNAs of dORF and dMUT reporters were transfected into human cells. The iUTRs are from CCDC167 and rrm1, combined with the artificial dORF1 shown in Fig 4B. The bar plot shows fluorescence intensity of dORF and dMUT reporters normalized to the GFP transfection control; dMUT expression levels were normalized to 1. All reporters containing the artificial dORF show higher fluorescence intensity than their dMUT counterparts.

  4. Illustration of dORFs with alternative start codons. Besides ATG, NTG codons (CTG, GTG, TTG) were used as the dORF start codon (in green); a paired dMUT with a premature stop codon was also generated for each NTG start codon. As a negative control, the dORF start codon was destroyed by replacing it with AAG (in yellow). The rrm1 iUTR and the artificial dORF1 shown in Fig 4B were used. The bar plot shows fluorescence intensity of each reporter normalized to its dMUT reporter: all ATG and non‐ATG dORFs displayed higher fluorescence intensity than their dMUT controls, whereas no difference was observed for the reporter pair with the AAG codon.
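The reporter manipulations in this figure (a premature stop right after the start codon, a single inserted G that shifts the frame, and start‐codon swaps) can be mimicked in silico. In the sketch below, the rule of choosing the stop codon that needs the fewest substitutions and the placement of the inserted G right after the start codon are assumptions, the standard genetic code is assumed, and all names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOPS = ("TAA", "TAG", "TGA")


def translate(seq: str, start: int = 0) -> str:
    """Translate from `start` until the first in-frame stop codon."""
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def make_dmut(seq: str, start: int) -> str:
    """Turn the codon right after the start codon into the stop codon needing
    the fewest substitutions (ties broken in TAA, TAG, TGA order)."""
    i = start + 3
    codon = seq[i:i + 3]
    best = min(STOPS, key=lambda s: sum(a != b for a, b in zip(codon, s)))
    return seq[:i] + best + seq[i + 3:]


def insert_base(seq: str, pos: int, base: str = "G") -> str:
    """Insert one nucleotide, shifting the downstream reading frame by +1."""
    return seq[:pos] + base + seq[pos:]
```

For example, `make_dmut("ATGTGGAAATAA", 0)` changes TGG to TAG with a single substitution, and inserting a G after the ATG of `ATGAAATAA` changes the encoded peptide from MK to MEI, mirroring how dORF1 and dORF2 share nucleotides but not amino acids.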

Data information: For Fig 4, an unpaired t‐test was used; ***P < 0.005. For cytometry, two biological replicates with two technical replicates each were performed; error bars show SD.
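The normalization described in these legends (mCherry divided by the GFP transfection control, then scaled so that the paired dMUT equals 1) and the unpaired t statistic can be written as follows. This is a sketch assuming one mCherry and one GFP value per replicate and an equal‐variance Student's t‐test; converting the statistic to a P‐value requires the t distribution from a statistics library. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def normalized_ratios(mcherry: Sequence[float], gfp: Sequence[float],
                      ref_mcherry: Sequence[float],
                      ref_gfp: Sequence[float]) -> List[float]:
    """mCherry/GFP per replicate, divided by the mean ratio of the paired dMUT
    reporter so that dMUT expression is 1."""
    ref = statistics.mean(m / g for m, g in zip(ref_mcherry, ref_gfp))
    return [m / g / ref for m, g in zip(mcherry, gfp)]


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Unpaired two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
```

The error bars in the bar plots would then correspond to `statistics.stdev` of the normalized ratios across the replicates.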
Figure EV4. Characterization of dORF and iUTR sequences
  1. Bar plot showing that dORF and dMUT reporters have similar RNA levels after DNA transfection of reporters carrying an endogenous iUTR with the artificial dORF1 and its paired mutation (dMUT1), as shown in Fig 4B.

  2. Bar plot of reporters carrying the endogenous iUTR of the RPL41 gene together with either the endogenous dORF or an artificial dORF (dORF1 as shown in Fig 4B). The reporter containing the endogenous dORF shows higher fluorescence intensity than its dMUT counterpart, whereas the reporter with the artificial dORF shows fluorescence intensity similar to its counterpart.

  3. Scatter plot showing the relationship between 3′ UTR length and the number of dORFs in human genes; translated dORFs are shown in red, and all possible dORFs (translated and untranslated) in gray. Pearson's r and P‐values are indicated.

  4. Bar plot for dORF frame distributions in human and zebrafish. The frames are defined by the canonical ORF, translated dORF is indicated in red, and all possible dORFs are indicated in gray.

  5. The sequence near the dORF start codon shows a significant bias compared with the nucleotide composition of 3′ UTRs. Numbers show the proportion of each nucleotide at each position. The four translation start codons (NTG) in human and zebrafish were analyzed separately. Red asterisks indicate positions with significant nucleotide bias (P < 0.05, chi‐squared test). The ATG panel is reproduced from Fig 6E.

Data information: For Fig EV4, an unpaired t‐test was used; *P < 0.05, ***P < 0.005. For transfection followed by cytometry, two biological replicates with two technical replicates each were performed; for transfection followed by qPCR, two biological replicates with three technical replicates each were performed. Error bars show SD.
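The positional test in panel 5 can be sketched as a chi‐squared goodness‐of‐fit test per position, comparing nucleotide counts around aligned start codons with a background 3′ UTR composition. Treating the background frequencies as known (3 degrees of freedom) is an assumption, and the published analysis may instead have used a contingency‐table test; the closed form used here for the 3‐d.f. upper tail is exact. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def positional_bias(contexts: Sequence[str], background: Dict[str, float]
                    ) -> List[Tuple[Dict[str, float], float, float]]:
    """For each aligned position, return (observed nucleotide proportions,
    chi-squared statistic vs background composition, P-value)."""
    results = []
    for pos in range(len(contexts[0])):
        counts = Counter(seq[pos] for seq in contexts)
        n = sum(counts[b] for b in BASES)
        stat = sum((counts[b] - n * background[b]) ** 2 / (n * background[b])
                   for b in BASES)
        results.append(({b: counts[b] / n for b in BASES}, stat, chi2_sf_df3(stat)))
    return results
```

Positions whose P‐value falls below 0.05 would correspond to the red asterisks; at a statistic of about 7.81 the 3‐d.f. upper tail is 0.05.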
Figure 5. The number of dORFs, but not dORF length, affects canonical ORF translation
  1. Cumulative distributions of RNA level (top panel) and translation efficiency (bottom panel) of human genes grouped by the number of dORFs they contain. For each group, controls were resampled to similar mRNA levels to compare translation efficiency. The number of mRNAs and P‐values are indicated (Wilcoxon rank‐sum test).

  2. Scheme of reporters with different numbers of dORFs. The 3′ UTR of human E1F1, which natively contains two translated dORFs based on ribosome profiling, was cloned downstream of mCherry; premature stop codons were introduced into either or both dORFs by point mutation to change the number of dORFs. Additionally, the original stop codon of the first dORF was destroyed by deleting a T (indicated in gray) to generate a single long dORF. The bar plot shows the relative fluorescence intensity of each reporter normalized to the GFP transfection control; dMUT expression is normalized to 1. Reporters with more dORFs show a stronger enhancement of canonical gene expression. An unpaired t‐test was used; ***P < 0.005. For cytometry, two biological replicates with two technical replicates each were performed; error bars show SD.

Figure 6. dORFs might be translated by new ribosome recruitment
  1. Scheme illustrating the dORF translation hypothesis: dORFs may be translated by new ribosome recruitment or by ribosome readthrough after canonical ORF stop codon.

  2. Scheme of the bi‐cistronic reporter with an iUTR in the middle. The first ORF (mCherry) is driven by the cap, whereas translation of the second ORF (GFP) might be driven by the iUTR. A 42‐nt stem–loop was inserted in the 5′ UTR upstream of mCherry (SL 5′) or between mCherry and the iUTR in the 3′ UTR (SL 3′) to inhibit translation.

  3. Bar plots showing fluorescence intensity of mCherry and GFP from bi‐cistronic reporters with the CYR61 and CCDC167 iUTRs. Insertion of a stem–loop in the 5′ UTR (SL 5′) decreases mCherry fluorescence, while GFP is not affected. Insertion of a stem–loop after the mCherry stop codon (SL 3′) does not decrease expression of mCherry or GFP. For cytometry, two biological replicates with two technical replicates each were performed; error bars show SD.

  4. Northern blots of the bi‐cistronic reporters showing no alternative splicing or transcript isoforms. Biotinylated DNA oligonucleotides against GFP and mCherry were used as probes.

  5. The sequence near the dORF start codon (ATG) shows a significant bias compared with the nucleotide composition of human 3′ UTRs. Numbers show the proportion of each nucleotide at each position. Red asterisks indicate positions with significant nucleotide bias (P < 0.05, chi‐squared test).

  6. Volcano plot showing enrichment or depletion of 4‐mers (the three nucleotides upstream of and the first nucleotide downstream of the dORF start codon) between translated and untranslated human ATG dORFs: log2 fold change (x‐axis) and P‐value (y‐axis), binomial test.

  7. PCA of human and zebrafish dORFs with different start codons, based on 4‐mer enrichment. A similar analysis was done for the canonical ORF (referred to as Kozak in the figure).
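The 4‐mer analysis in panels 6 and 7 can be sketched as follows: extract positions -3 to -1 and +4 around each start codon, compare 4‐mer frequencies between translated and untranslated dORFs as a log2 fold change, and test each count with an exact two‐sided binomial test. Using the pseudocounted untranslated frequency as the null probability is an assumption, as are the function names.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def start_context_4mer(seq: str, start: int) -> Optional[str]:
    """Nucleotides -3..-1 plus +4 around a start codon at 0-based offset `start`."""
    if start < 3 or start + 4 > len(seq):
        return None
    return seq[start - 3:start] + seq[start + 3]


def binom_pmf(k: int, n: int, p: float) -> float:
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes that are
    no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        pr = binom_pmf(i, n, p)
        if pr <= observed * (1 + 1e-7):  # small tolerance for rounding
            total += pr
    return min(1.0, total)


def kmer_enrichment(translated: Sequence[str], untranslated: Sequence[str],
                    pseudo: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Per 4-mer: (log2 fold change translated vs untranslated, binomial P)."""
    ct, cu = Counter(translated), Counter(untranslated)
    kmers = sorted(set(ct) | set(cu))
    nt, nu = len(translated), len(untranslated)
    out = {}
    for km in kmers:
        ft = (ct[km] + pseudo) / (nt + pseudo * len(kmers))
        fu = (cu[km] + pseudo) / (nu + pseudo * len(kmers))
        out[km] = (math.log2(ft / fu), binom_test_two_sided(ct[km], nt, fu))
    return out
```

The per‐start‐codon enrichment profiles produced this way could then serve as the feature vectors for a PCA like the one in panel 7.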

Figure EV5. Sequence bias nearby the dORF translation start codon
  1. Volcano plots showing the fold change and P‐value of 4‐mers (the three nucleotides upstream of and the first nucleotide downstream of the dORF start codon) between translated and untranslated human dORFs, for each start codon (binomial test). Dots with annotated sequences are significantly biased in translated dORFs; the red dashed line marks P = 0.05.

  2. Volcano plot showing enrichment/depletion of the 4‐mer (three nucleotides upstream and the first downstream of the dORF start codon) between translated and untranslated dORFs (log2 fold change, x‐axis), and P‐value (y‐axis), for zebrafish dORF for each of the translation start site codon (NTG), binomial test. Dots with annotated sequences are significantly biased in translated dORF; red dashed line is P‐value 0.05.

  3. PCA for human and zebrafish dORF with different start codons based on the nucleotide bias of sequence nearby dORF start codon; the Kozak sequence indicates sequence nearby the canonical ORF translation start codon.

Figure 7. Model for dORF mechanism
iUTRs might recruit translation factors and/or ribosomes for dORF translation. Because the mRNA forms a closed loop through 5′–3′ interactions (UTR crosstalk), the iUTR and dORF in the 3′ UTR might be physically close to the start of the canonical ORF. Thus, factors or ribosomes recruited by the iUTR for dORF translation might also enhance translation of the canonical ORF, so that a higher number of translated dORFs would strengthen this regulation.
