Cell Syst. 2016 Dec 21;3(6):563-571.e6.
doi: 10.1016/j.cels.2016.11.004.

RNA Structural Determinants of Optimal Codons Revealed by MAGE-Seq

Eric D Kelsic et al. Cell Syst. 2016.

Abstract

Synonymous codon choices at the beginning of genes optimize 5' RNA structures for enhanced translation initiation, but less is known about mechanisms that drive codon optimization downstream within the gene. To understand what determines codon choices across a gene, we generated 12,726 in situ codon mutants in the Escherichia coli essential gene infA and measured their fitness by combining multiplex automated genome engineering mutagenesis with amplicon deep sequencing (MAGE-seq). Correlating predicted 5' RNA structure with fitness revealed that codons even far from the start of the gene are deleterious if they disrupt the native 5' RNA conformation. These long-range structural interactions generate context-dependent rules that constrain codon choices beyond intrinsic codon preferences. Genome-wide RNA folding predictions confirm that natural codon choices far from the start codon are optimized in part to prevent disruption of native structures near the 5' UTR. Our results shed light on natural codon distributions and should improve engineering of gene expression for synthetic biology applications.

Keywords: RNA structure; codon; codon optimization; codon usage; computational biology; molecular biology; synthetic biology; systems biology.

Figures

Figure 1. Systematically generating and measuring fitness of all single-codon substitutions across infA using MAGE-seq
(A) MAGE oligos for creating all single-codon mutants scanning along infA on the E. coli chromosome. (B) Mutants were pooled and competed in continuous exponential growth. Samples were taken at every population doubling and mutant frequencies were measured using deep sequencing of PCR amplicons. (C) Mutant fitness is calculated from the slope of best-fit lines tracking mutant abundance relative to the wild-type allele over time. Dotted line shows the dilution rate, which is the expected slope for non-growing cells. Error bars are 2 s.d.
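
The slope calculation described in panel C can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that abundance is compared on a log2 scale, that time is measured in population doublings, and that an ordinary least-squares line is used. The function name `log_ratio_slope` and the example counts are hypothetical.

```python
import math

def log_ratio_slope(mutant_counts, wt_counts, generations):
    """Least-squares slope of log2(mutant / WT) against time in doublings.

    Assumption: read counts are non-zero; real data would need a
    pseudocount or filtering before taking logarithms.
    """
    if not (len(mutant_counts) == len(wt_counts) == len(generations)):
        raise ValueError("all inputs must have the same length")
    if len(generations) < 2:
        raise ValueError("need at least two time points to fit a slope")

    # Log ratio of mutant to wild-type abundance at each time point.
    y = [math.log2(m / w) for m, w in zip(mutant_counts, wt_counts)]
    x = list(generations)

    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical counts: a mutant that halves relative to WT every doubling.
slope = log_ratio_slope([800, 400, 200, 100], [1000, 1000, 1000, 1000], [0, 1, 2, 3])
```

Under these assumptions a neutral mutant gives a slope near zero, and a mutant that loses one log2 unit per doubling is not keeping up with dilution at all. How the slope is rescaled into the reported fitness values is not stated in the caption, so that step is left out.
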
Figure 2. Context-dependent codon preferences of infA are strongest at the beginning of the gene
(A) Fitness of all single-codon mutants of infA in minimal media (Figure S3 shows rich media). The optimal codon for each amino acid varies with position, indicating context dependence. Circles indicate WT codons; horizontal lines separate synonymous codons. Background color indicates average effect of each amino acid substitution (faa) while colored X’s indicate synonymous fitness deviations (fsyn). For clarity we remove X’s for start and stop codons and set fsyn to zero in the later gene region for the most deleterious mutants (positions 10–71, faa < 0.75), which have higher measurement error (fstd). Larger X’s indicate more significant Z-scores (fsyn/fstd). (B) Intrinsic codon preferences averaged over later gene regions (f*syn, positions 10–71); error bars are 2 s.e.m. Lower panel shows correlation between intrinsic codon preferences and the tRNA Adaptation Index (tAI). (C) Comparison of standard deviation of fitness for amino acid substitutions (σaa) and synonymous deviations (σsyn, mutants with faa < 0.75 not included in calculation). Synonymous codon preferences are strongest at the beginning of the gene (positions 1–9).
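
The quantities in panel A can be illustrated with a short sketch. It assumes that faa is the plain mean fitness of the synonymous codons measured for one amino acid at one position, and that fsyn is each codon's fitness minus that mean; the caption gives only the Z-score ratio fsyn/fstd, so both definitions are assumptions. The codon table is deliberately partial and the input values are hypothetical.

```python
from collections import defaultdict

# Partial codon table for illustration only (some leucine and serine codons).
CODON_TO_AA = {
    "CTG": "Leu", "CTA": "Leu", "TTA": "Leu",
    "AGC": "Ser", "TCT": "Ser",
}

def synonymous_deviations(codon_fitness, codon_std):
    """Return {codon: (f_syn, z)} for the mutants at one gene position.

    Assumptions: f_aa is the unweighted mean over synonymous codons,
    f_syn = f - f_aa, and z = f_syn / f_std (None when f_std is zero).
    """
    by_aa = defaultdict(list)
    for codon, f in codon_fitness.items():
        by_aa[CODON_TO_AA[codon]].append(f)
    f_aa = {aa: sum(vals) / len(vals) for aa, vals in by_aa.items()}

    result = {}
    for codon, f in codon_fitness.items():
        f_syn = f - f_aa[CODON_TO_AA[codon]]
        std = codon_std[codon]
        z = f_syn / std if std > 0 else None
        result[codon] = (f_syn, z)
    return result

# Hypothetical measurements at a single position.
devs = synonymous_deviations(
    {"CTG": 1.0, "CTA": 0.5, "AGC": 0.75},
    {"CTG": 0.125, "CTA": 0.125, "AGC": 0.0},
)
```
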
Figure 3. Analysis of codon-pair interactions near the start of the gene reveals deleterious effects of frame-shifted start codons
(A) MAGE oligos for creating all codon-pair mutants at positions 1–2 and 2–3. (B) Average fitness of codon-pair mutants with in-frame ATG start codons (grey bars) versus frame-shifted start codons (colored bars), relative to mean library fitness (fo). Error bars are 2 s.e.m.
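
As a rough illustration of what a frame-shifted start codon means here, the sketch below scans a mutated sequence for ATG triplets that do not begin on a codon boundary. It assumes the input string starts in frame at index 0 and treats only ATG as a start codon; the paper's actual classification may use additional context, and the function name is hypothetical.

```python
def out_of_frame_atg_positions(sequence):
    """Return 0-based offsets of ATG triplets that are not in frame.

    Assumption: sequence[0] is the first base of a codon, so offsets
    divisible by 3 are in frame and all others are frame-shifted.
    """
    seq = sequence.upper()
    return [
        i
        for i in range(len(seq) - 2)
        if seq[i:i + 3] == "ATG" and i % 3 != 0
    ]

# The codon pair AAT-GAA contains an ATG shifted by one base.
shifted = out_of_frame_atg_positions("AATGAA")
```

A codon pair such as ATG-GCC yields an empty list, since its only ATG sits on a codon boundary.
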
Figure 4. Correlations between RNA base-pairing and fitness reveal beneficial and deleterious RNA hairpin structures
(A) Correlation matrix Rij. Each base-pairing location is colored by the correlation between predicted base-pair binding probabilities (Pij) and codon-pair mutant fitness (f); white indicates base-pairings that do not form for any mutants, red indicates positive correlation with fitness and blue indicates negative correlation with fitness. Insets show examples of base-pairings with positive and negative correlations with fitness. Dotted gray box surrounds the approximate location of beneficial base-pairings, while the solid gray box surrounds base-pairings near the stem loop as shown in panel C. (B) Average of the base-pairing correlations within RNA hairpins (vertical averages of panel A). The preferred RNA configurations are hairpins centered upstream from the start of the gene (near position –18nt). (C) Analysis of deleterious and compensatory mutations within the presumed beneficial RNA hairpin, showing mutated positions on top of the predicted minimum free energy RNA structure of the WT allele. Deleterious mutations near the stem loop of the hairpin can be compensated by mutations on the opposite side of the hairpin that restore base-pairing. Gray numbers 1–4 indicate positions of introduced mutations. Circles along the diagonal mark locations with perfect base-pairing, triangles mark locations with 1 mismatch and a black dot marks the WT 5′ UTR sequence. Insets show average fitness for 0–2 mismatches. Error bars are 2 s.e.m.
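
The correlation matrix in panel A can be sketched as one correlation per base-pairing location, computed across mutants. The example below assumes a Pearson correlation (the caption does not name the coefficient) and takes the predicted probabilities Pij as given; in practice they would come from an RNA folding tool. All names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation, or None when either input has no variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def base_pair_fitness_correlations(pair_probs, fitness):
    """Compute R_ij for every base pair (i, j) seen in any mutant.

    pair_probs: one dict per mutant mapping (i, j) -> P_ij; a missing
    key is treated as probability 0 (assumption).
    fitness: one fitness value per mutant, in the same order.
    """
    pairs = set().union(*[d.keys() for d in pair_probs])
    return {
        pair: pearson([d.get(pair, 0.0) for d in pair_probs], fitness)
        for pair in sorted(pairs)
    }

# Three hypothetical mutants and three candidate base pairs.
r = base_pair_fitness_correlations(
    [
        {(1, 10): 0.0, (2, 9): 1.0, (3, 8): 0.5},
        {(1, 10): 0.5, (2, 9): 0.5, (3, 8): 0.5},
        {(1, 10): 1.0, (3, 8): 0.5},
    ],
    [0.5, 0.75, 1.0],
)
```

In this toy input, pair (1, 10) tracks fitness and would be drawn in red, pair (2, 9) opposes it and would be drawn in blue, and pair (3, 8) never varies, so it gets no correlation value.
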
Figure 5. An RNA Configuration Score explains fitness better than RNA folding energy and other metrics
(A) Scatter plot of mutant minimum free energy (mfe) and RNA Configuration Score (RCS) for codon-pair mutants. Each point is colored by mutant fitness; contour lines are a best-fit regression; orange circles mark example mutants with comparable minimum free energy but bad versus good RCS, with example mutant RNA structures shown above: a green line surrounds the start codon; each RNA base-pair is colored by the correlation of its base-pairing probability with fitness as in Figure 4A. The dashed box surrounds the beneficial hairpin region mutated in Figure 4C. (B) Fitness variance explained by frame-shifted start and stop codons, Shine-Dalgarno-like sequences (SD-like), minimum free energy, RCS and linear combinations of these metrics.
Figure 6. Preservation of RNA structure at the beginning of the gene determines context-dependent codon preferences throughout the gene
(A) Correlation of synonymous fitness deviations with tAI, f*syn and synonymous RCS deviations, for the full gene and for later regions (positions 10–71). (B) RCS correlations as in panel A, calculated for a sliding window of 10 codons centered at each position. P-values show probability of measuring R > Robs based on a null model of shuffling synonymous codons within amino acids. (C) RCS for single-codon mutants throughout the gene. Black dots show non-synonymous single-codon mutants, pink dots show synonymous single-codon mutants, and a red line connects the wild-type codons. Wild-type codons are near optimal with respect to RCS up until codon 66 (dashed line). (D) Fraction of E. coli genes for which WT codons preserve the 5′ UTR configuration better than the median null allele, with 10 codons on either side of the indicated position being synonymously randomized (for position 10 we use only the WT 5′ UTR for the earlier region of the RNA, see STAR Methods). Fractions greater than 0.5 indicate genome-wide enrichment for WT codons that preserve these upstream RNA structures. Error bars are 2 s.e.m.
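
The null model in panel B can be read as a permutation test. The sketch below is one possible reading, not the authors' procedure: it shuffles score values only among mutants that share an amino-acid label, recomputes the correlation each time, and reports the fraction of shuffles with R > Robs. The grouping scheme, the Pearson coefficient, the function names and the example data are all assumptions.

```python
import math
import random
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def synonymous_shuffle_pvalue(groups, fitness_dev, score_dev,
                              n_shuffles=1000, seed=0):
    """Return (R_obs, fraction of shuffles with R > R_obs).

    groups: amino-acid label per mutant; scores are only exchanged
    between mutants carrying the same label.
    """
    rng = random.Random(seed)
    r_obs = pearson(fitness_dev, score_dev)

    # Indices of the mutants belonging to each synonymous group.
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)

    exceed = 0
    for _ in range(n_shuffles):
        shuffled = list(score_dev)
        for idxs in members.values():
            vals = [score_dev[i] for i in idxs]
            rng.shuffle(vals)
            for i, v in zip(idxs, vals):
                shuffled[i] = v
        if pearson(fitness_dev, shuffled) > r_obs:
            exceed += 1
    return r_obs, exceed / n_shuffles

# Hypothetical example: scores perfectly anti-correlated with fitness.
r_obs, p_value = synonymous_shuffle_pvalue(
    ["Leu", "Leu", "Leu"], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], n_shuffles=200
)
```

A small fraction means that shuffled synonymous assignments rarely correlate with fitness as well as the observed one does. In the toy input the observed correlation is the worst possible, so most shuffles exceed it.
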

Comment in

  • Codon Clarity or Conundrum?
    Aalberts DP, Boël G, Hunt JF. Cell Syst. 2017 Jan 25;4(1):16-19. doi: 10.1016/j.cels.2017.01.004. PMID: 28125789. Free PMC article.
