Adv Sci (Weinh). 2023 Aug;10(23):e2205445.
doi: 10.1002/advs.202205445. Epub 2023 Jun 2.

Optimization and Deoptimization of Codons in SARS-CoV-2 and Related Implications for Vaccine Development


Xinkai Wu et al. Adv Sci (Weinh). 2023 Aug.

Abstract

The spread of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has progressed into a global pandemic. To date, thousands of genetic variants have been identified among SARS-CoV-2 isolates collected from patients. Sequence analysis reveals that the codon adaptation index (CAI) values of viral sequences have decreased over time, with occasional fluctuations. Evolutionary modeling suggests that this phenomenon may result from the virus's mutation preference during transmission. Dual-luciferase assays further show that the deoptimization of codons in the viral sequence may weaken protein expression during virus evolution, indicating that codon usage may play an important role in virus fitness. Finally, given the importance of codon usage in protein expression, particularly for mRNA vaccines, several codon-optimized Omicron BA.2.12.1, BA.4/5, and XBB.1.5 spike mRNA vaccine candidates were designed and their high levels of expression were experimentally validated. This study highlights the importance of codon usage in virus evolution and provides guidelines for codon optimization in mRNA and DNA vaccine development.
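
For readers unfamiliar with the CAI, the following is a minimal sketch of the standard definition: the geometric mean, over the codons of a coding sequence, of each codon's relative adaptiveness w (its usage in a reference gene set divided by the usage of the most-used synonymous codon). The reference counts below are a hypothetical toy table, not real human codon usage, and the function names are illustrative assumptions rather than the authors' pipeline.

```python
import math

# Hypothetical toy codon counts for a reference gene set, grouped by amino acid.
# These numbers are illustrative only and are not real human codon usage data.
TOY_REFERENCE_COUNTS = {
    "Phe": {"UUC": 60, "UUU": 40},
    "Leu": {"CUG": 50, "CUC": 25, "CUU": 25},
}


def relative_adaptiveness(counts_by_amino_acid):
    """Return w for each codon: its count divided by the count of the
    most-used synonymous codon for the same amino acid."""
    weights = {}
    for codon_counts in counts_by_amino_acid.values():
        top = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / top
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons of a coding sequence.
    Codons missing from `weights` (e.g. Met, Trp, stop) are skipped."""
    rna = sequence.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(TOY_REFERENCE_COUNTS)
optimal = cai("UUCCUG", WEIGHTS)      # both codons are the preferred ones
deoptimized = cai("UUUCUG", WEIGHTS)  # UUC>UUU, a synonymous C>U change
```

In this toy table, the synonymous C>U change lowers the CAI of the two-codon sequence, which is the kind of deoptimization the abstract describes. A real analysis would need a full reference table and a rule for codons with zero reference counts, which this sketch does not handle.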

Keywords: SARS-CoV-2; codon optimization; codon usage bias; spike protein; synonymous mutations; vaccine design.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The evolutionary trends in codon adaptation index (CAI), major viral lineages, and synonymous mutation types during SARS‐CoV‐2 evolution. a) The CAI changes in SARS‐CoV‐2 over time caused by both synonymous and nonsynonymous mutations (the solid red line and shadow) or by synonymous mutations only (the solid blue line and shadow). The red and blue solid lines indicate the median CAI, and the red and blue shadows indicate the 95% interval of CAI in a 14‐day sliding window with a one‐day step. The black dashed lines indicate when the World Health Organization defined the Alpha, Delta, and Omicron lineages as variants of concern (VOCs). b) Prevalence of VOCs and variants of interest (VOIs) over time, shown as the proportion of each SARS‐CoV‐2 variant in a 14‐day sliding window with a one‐day step. c) The proportions of synonymous mutations of different substitutional types in SARS‐CoV‐2 in a 14‐day sliding window with a one‐day step.
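
The sliding‐window summary described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the window is taken as half‐open and forward‐looking from each start date, the 95% interval is taken as the 2.5th to 97.5th percentiles with linear interpolation, and the sample dates and CAI values are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 1]) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def sliding_window_summary(samples, window_days=14, step_days=1):
    """samples: list of (collection_date, cai_value).
    Returns (window_start, median, low, high) for each non-empty window,
    where low/high bound the central 95% of values in that window."""
    samples = sorted(samples)
    first, last = samples[0][0], samples[-1][0]
    rows = []
    start = first
    while start <= last:
        end = start + timedelta(days=window_days)  # half-open window [start, end)
        values = sorted(v for d, v in samples if start <= d < end)
        if values:
            rows.append((start, median(values),
                         percentile(values, 0.025), percentile(values, 0.975)))
        start += timedelta(days=step_days)
    return rows


# Hypothetical data: three isolates with made-up CAI values.
samples = [
    (date(2020, 1, 1), 0.70),
    (date(2020, 1, 8), 0.68),
    (date(2020, 1, 20), 0.66),
]
rows = sliding_window_summary(samples)
```

Each row corresponds to one position of the window. Plotting the median against the window start, with the low and high values as a shaded band, would give a curve of the kind described for panel a.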
Figure 2
Deoptimization of codons by C>U substitutional bias in SARS‐CoV‐2. a) Types of the top 30 most frequently observed codon substitutions. Red denotes an increase in the sequence's CAI because of the substitution, whereas blue denotes a decrease. b) The number of pre‐ or post‐outbreak synonymous codon changes. The x‐axis indicates the number of pre‐outbreak codon changes from the ancestor to SARS‐CoV‐2 and RaTG13. The y‐axis indicates the number of codon changes that occurred during the pandemic. A blue dot indicates that a nonpreferred codon was replaced by a preferred codon (optimization). A red dot indicates that a preferred codon was replaced by a nonpreferred codon (deoptimization). The red x shapes indicate the codon deoptimization caused by C>U mutations. The ten synonymous codon changes that occurred more frequently during the pandemic than in the pre‐outbreak history (Fisher's exact test, P‐adj < 0.05) are labeled, and the associated dots or x shapes are larger.
Figure 3
Simulation of viral sequences under biased mutation and different selections. a) The C>U synonymous mutations had significantly higher derived allele frequencies (DAFs) than the other types of synonymous mutations. b) The C>U synonymous changes had a significantly higher maximum DAF than the other types of synonymous changes. c) The DAF ratio of C>U mutations compared to non‐C>U mutations in the viral population under neutral (grey), positive selection (purple), and purifying selection (black) conditions during the simulation process. For each simulation replicate, the mean DAF values for C>U and non‐C>U mutations were computed independently at each generation. Subsequently, the ratio of the DAF value for C>U mutations relative to non‐C>U mutations at each generation was determined. The median and the 95% quantiles of the ratio were calculated based on 100 repetitions. d–f) The allele frequency spectrum of C>U and other types of mutations in the 150th generation under neutral or selective conditions. Mutations were grouped according to their observed number in the population, and the number of mutations in each group was tallied. The mean number of mutations in each group across 100 independent simulations was calculated and graphed, with error bars indicating standard deviation.
Figure 4
Effects of synonymous viral mutations on protein synthesis rate. a) The design of the dual‐luciferase reporter assays. By inserting a 240‐nt or 243‐nt viral CDS (centered on the mutation) after the Renilla start codon, two reporter plasmids (WT and MUT) were created. The function of the mutation was inferred by comparing the protein expression levels of MUT and WT. b) The correlations between protein expression changes and codon usage changes caused by synonymous mutations. The y‐axis is the fold change (log2) of protein production of the mutant allele relative to that of the wild‐type allele. The x‐axis represents the change in w after codon substitution for a total of 70 pairs of WT and MUT. The mean and standard errors of the change in protein expression level are presented for each mutation. Points are marked in red or blue when the relative intensity of the mutant allele is significantly higher or lower than that of the wild‐type allele (one‐sided t‐test, p < 0.05) and in gray if there is no significant difference. c) The relationship between changes in mRNA levels following a mutation and alterations in codon usage. The fold change (log2) of mRNA levels post‐mutation relative to pre‐mutation is plotted on the y‐axis. d) Translation efficiency changes (log2), calculated from the changes in protein and mRNA levels after a mutation. The top 12 mutations with the greatest increase in fluorescence level after the mutation are denoted by magenta dots, while the top 12 mutations with the greatest decrease in fluorescence level are denoted by green dots. Standard error bars are included, and the black solid line represents the fitted mean for each mutation.
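
The caption does not give the exact formula used for panel d. One common convention, shown here purely as an assumption, is to take the change in translation efficiency as the log2 fold change in protein level minus the log2 fold change in mRNA level, so that a protein change fully explained by an mRNA change scores zero. The function and argument names are hypothetical, and the inputs are assumed to be already normalized reporter readings.

```python
import math


def log2_fold_change(mut, wt):
    """log2 of the mutant-to-wild-type ratio of a measured level."""
    return math.log2(mut / wt)


def translation_efficiency_change(protein_mut, protein_wt, mrna_mut, mrna_wt):
    """Assumed convention: protein log2 fold change minus mRNA log2 fold change."""
    return (log2_fold_change(protein_mut, protein_wt)
            - log2_fold_change(mrna_mut, mrna_wt))


# Protein output halves while mRNA is unchanged: efficiency drops by one log2 unit.
halved = translation_efficiency_change(0.5, 1.0, 1.0, 1.0)
# Protein and mRNA both double: no change in efficiency under this convention.
unchanged = translation_efficiency_change(2.0, 1.0, 2.0, 1.0)
```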
Figure 5
Experimental validation of codon optimization of the S gene. a) The distribution of CAI values of human genes and the partially or fully optimized codons for the S gene of SARS‐CoV‐2. For the S gene of the BA.2.12.1, BA.4/5, and XBB.1.5 variants in the Omicron lineage, s1 represents the native viral CDS sequence; s2, s3, s4, and BA.4/5‐s5 represent sequences with different CAI optimization levels; and BA.2.12.1‐s5, BA.4/5‐s6, and XBB.1.5‐s5 represent the fully optimized CDS sequences, which have a CAI value of 1. b) Western blotting analysis of S protein expression in HEK293T cells. The bands of full‐length and cleaved S protein are labeled. All the samples were probed using polyclonal rabbit anti‐SARS‐CoV‐2 S antibody (40591‐T62, Sino Biological) at a dilution of 1:2000. c) Correlation analysis between CAI value and S protein expression level. Experiments were repeated three times; error bars indicate the standard error of the mean.
