Nucleic Acids Res. 2018 Jan 25;46(2):985-994.
doi: 10.1093/nar/gkx1114.

Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons

Alexander J Diaz de Arce et al. Nucleic Acids Res. 2018.

Abstract

The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics.
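The size of the motif space described in the abstract can be illustrated with a short enumeration. This is a minimal sketch, not the authors' code: it assumes that "non-AUG start codons" means every codon differing from AUG at exactly one base, and that each TIS is the 8-nt window made of the -4 to -1 bases, the start codon, and the +4 base (as in the reference sequence CACCAUGG). The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"  # nucleotide order used in the Figure 1C labels


def near_cognate_codons(reference="AUG"):
    """Return every codon that differs from `reference` at exactly one base."""
    codons = []
    for i, ref_base in enumerate(reference):
        for base in BASES:
            if base != ref_base:
                codons.append(reference[:i] + base + reference[i + 1:])
    return codons


def tis_library(codons):
    """Yield 8-nt TIS motifs: 4 upstream bases, the start codon, 1 downstream base."""
    for codon in codons:
        for flank in product(BASES, repeat=5):
            upstream = "".join(flank[:4])   # positions -4 to -1
            downstream = flank[4]           # position +4
            yield upstream + codon + downstream


non_aug = near_cognate_codons()
library = list(tis_library(non_aug))
print(non_aug)
print(len(non_aug), "codons,", len(library), "motifs")
```

Under these assumptions the enumeration gives 9 single-base variants of AUG and 4^5 = 1024 flanking contexts per codon. These counts follow from the enumeration itself and are not figures quoted from the paper.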

Figures

Figure 1.
High-throughput analysis of TIS motifs utilizing non-AUG start codons. A) The TIS reporter used to measure translation initiation efficiency from every AUG and non-AUG start codon. The -4 to -1 and +4 positions were varied to create a library of all possible sequences at those positions (N = A, C, G, or U). RFP was expressed from the same transcript using an internal ribosome entry site (IRES) and served to normalize GFP expression. F2A is a peptide that allows multiple proteins to be expressed from a single open reading frame. PuroR is the puromycin resistance gene, which enabled selection of stably transduced cells. B) Summary of the FACS-seq method. A population of stably transduced cells is sorted into 20 equally populated gates based on TIS efficiency (GFP/RFP). The TIS sequences are then PCR-amplified and barcoded before being pooled and sequenced. FACS-seq histograms were then created for each TIS sequence based on the number of reads for each TIS in each gated population. The median efficiency values for each TIS sequence were then fit with a generalized linear model that accounted for important dinucleotide interactions. C) Heat map of the TIS efficiencies measured via FACS-seq. The labels of the nucleotides at the -2 and -1 positions follow the same pattern as the other positions: U, C, A, G. A TIS efficiency of 100 corresponds to the TIS sequence CACCAUGG.
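The histogram step in panel B can be sketched as follows. This is an illustrative reading of the caption, not the published pipeline: it assumes each TIS sequence has one read count per sorting gate (20 gates, ordered from lowest to highest GFP/RFP) and takes the gate holding the median read as that sequence's summary. The read counts below are hypothetical, and the conversion from gate index to an efficiency value, as well as the generalized linear model, are not covered.

```python
def median_gate(read_counts):
    """Return the 0-based index of the gate that contains the median read.

    `read_counts[i]` is the number of sequencing reads for one TIS
    sequence recovered from gate i (gates ordered by increasing GFP/RFP).
    """
    total = sum(read_counts)
    if total == 0:
        raise ValueError("no reads for this TIS sequence")
    cumulative = 0
    for gate, count in enumerate(read_counts):
        cumulative += count
        # The median read lies in the first gate at which the running
        # total reaches half of all reads.
        if 2 * cumulative >= total:
            return gate


# Hypothetical histogram for one TIS sequence across 20 gates.
counts = [0] * 20
counts[11], counts[12], counts[13] = 30, 50, 20
print(median_gate(counts))
```

A real analysis would probably also correct for differences in sequencing depth between gates before taking the median; that normalization is omitted here.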
Figure 2.
A subset of non-AUG codons can be as efficient as AUG start codons. A) Box and whisker plot of TIS efficiencies for each start codon. The edges of the box represent the interquartile range, the bar within the box represents the median, and the whiskers extend to the maximum and minimum of the range of efficiencies for each codon. The shaded area represents the range of TIS efficiencies where AUG and non-AUG codons overlap. B) TIS efficiencies determined from individually deployed TIS reporter constructs via flow cytometry. Start codons are underlined. Error bars represent the standard deviation between experimental replicates (N = 3).
Figure 3.
Non-AUG start codons are more dependent on their surrounding nucleotide context than AUG start codons. The average TIS efficiency of all TIS sequences with each codon and nucleotide in a specified position. A) -4 position. B) -3 position. C) -2 position. D) -1 position. E) +4 position.
Figure 4.
Efficiency of TISs utilizing non-AUG start codons varies linearly between proteins. The efficiencies of TIS sequences were measured individually by flow cytometry with several genetic reporters and were then compared to the efficiency predicted by FACS-seq. A) GFP reporter (R² = 0.79). B) BFP reporter (R² = 0.83). C) Myc-GFP reporter (the N-terminus of c-Myc fused to GFP, R² = 0.97). Reporter constructs were the same as in Figure 1A, with GFP replaced by the other reporter genes. All measured efficiencies were normalized to RFP expression. The TIS sequence CACCAUGG has a TIS efficiency of 100. Error bars represent the standard deviation between experimental replicates (N ≥ 3).
Figure 5.
Ribosomal profiling confirms a predictive role for TIS efficiency in start codon utilization in the genome. The frequency of start codon utilization (the percentage of potential TIS motifs that were observed to act as translation initiation sites) in the 5′ leader sequences, from three independent ribosomal profiling experiments. Each study used a different small-molecule drug to stall initiating ribosomes. A) Lactimidomycin - Lee et al., 2011, B) Harringtonine - Ingolia et al., 2011, C) Puromycin - Fritsch et al., 2012. P-values were calculated using Fisher's exact test. * denotes a P-value < 0.05; ** denotes a P-value < 0.001.
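For readers unfamiliar with the test named in this caption, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table, written with the standard library only. How the authors grouped motifs and which sidedness they used is not stated here, so the grouping and the counts in the usage example are hypothetical placeholders rather than data from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical counts: motifs observed vs. not observed as initiation
# sites, split into two made-up efficiency groups.
p = fisher_exact_two_sided(30, 170, 10, 190)
print(f"P = {p:.4g}")
```

Each row of the table is one group of motifs and each column records whether initiation was observed, so a small P-value indicates that utilization frequency differs between the groups.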
Figure 6.
Single-base mutations in the TIS can alter protein expression of c-Myc. A) The wild type (WT) c-Myc transcript of Homo sapiens (NM_002467, not to scale). The CUG and AUG codons are the two native TISs and are separated by 45 nucleotides. B) The c-Myc-GFP reporter construct used to measure expression. The downstream AUG start codon was mutated to AGG, disabling expression of the truncated isoform. The 5′ leader sequence and the first 144 nucleotides of c-Myc were inserted into GFP-IRES-RFP reporter (Figure 1A). C, D) Expression of c-Myc-GFP measured by flow cytometry relative to expression from the WT TIS sequence with either CUG (C) or AUG (D) as the start codon. The observed relative expression is compared with the results predicted from the FACS-seq data presented in Figure 1C. Start codons are underlined and the bases that differ from the WT sequence are in bold. † following the TIS sequence denotes mutations that were previously documented in tumor samples. Error bars represent the standard deviation between experimental replicates (N = 3).

References

    1. Tikole S., Sankararamakrishnan R. A survey of mRNA sequences with a non-AUG start codon in RefSeq database. J. Biomol. Struct. Dyn. 2006; 24:33–41.
    2. Peabody D.S. Translation initiation at non-AUG triplets in mammalian cells. J. Biol. Chem. 1989; 264:5031–5035.
    3. Kozak M. Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol. Cell. Biol. 1989; 9:5073–5080.
    4. Ingolia N.T., Lareau L.F., Weissman J.S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011; 147:789–802.
    5. Lee S., Liu B., Lee S., Huang S.-X., Shen B., Qian S.-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 2012; 109:14728–14729.