Sci Rep. 2020 Sep 24;10(1):15643.
doi: 10.1038/s41598-020-72533-2.

Sequence analysis of SARS-CoV-2 genome reveals features important for vaccine design

Jacob Kames et al. Sci Rep. 2020.

Abstract

As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it requires only limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses, for which we may not have extensive data. We performed comprehensive in silico analyses of several features of the SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the Coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which can be generalized to other viruses.
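The sequence features named in the abstract (codon usage, codon pair usage, dinucleotide and junction dinucleotide usage) are all count-based statistics over a coding sequence. The following is a minimal Python sketch of how such counts could be tallied. The function names and the toy sequence are hypothetical and are not taken from the authors' pipeline, and the input is assumed to be a single in-frame coding sequence (CDS).

```python
from collections import Counter


def normalize(seq):
    """Upper-case the sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def split_codons(cds):
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    seq = normalize(cds)
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def per_thousand(counts):
    """Rescale raw counts to frequencies per 1,000 (assumes non-empty counts)."""
    total = sum(counts.values())
    return {key: 1000.0 * n / total for key, n in counts.items()}


def codon_counts(cds):
    """Count each codon in the CDS."""
    return Counter(split_codons(cds))


def codon_pair_counts(cds):
    """Count each pair of adjacent codons."""
    codons = split_codons(cds)
    return Counter(zip(codons, codons[1:]))


def dinucleotide_counts(cds):
    """Count all overlapping dinucleotides, regardless of reading frame."""
    seq = normalize(cds)
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def junction_dinucleotide_counts(cds):
    """Count dinucleotides spanning a codon boundary (3rd base + next 1st base)."""
    codons = split_codons(cds)
    return Counter(a[2] + b[0] for a, b in zip(codons, codons[1:]))


# Toy sequence for illustration only; not a real viral gene.
toy = "ATGGCTGCTTAA"
print(per_thousand(codon_counts(toy)))
print(codon_pair_counts(toy))
print(junction_dinucleotide_counts(toy))
```

In practice the same counters would be accumulated over every CDS in a genome or transcriptome before rescaling, so that viral and host tables can be compared on the same per-1,000 scale.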

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Codon frequencies per 1,000 for SARS-CoV-2 (Red), Homo sapiens Genomic (Black) and Homo sapiens Lung (Yellow). Codons are grouped by the amino acid they encode (alternating light blue columns, Met (M) and Trp (W) represented as single letter).
Figure 2
Heat maps of log transformed codon pair frequencies per 1 M for Homo sapiens Genomic (A), SARS-CoV-2 (B) and the absolute value of difference between the two (C). Codon pairs increase in frequency from dark to light.
Figure 3
Dinucleotide (A) and junction dinucleotide (B) frequencies per 1,000 for SARS-CoV-2 (Red), Homo sapiens Genomic (Black) and Homo sapiens Lung (Yellow).
Figure 4
(A) The predicted minimum free energy (MFE) secondary structure of the novel coronavirus RNA in the 75 nts following the frameshift. All MFE structures displayed are those predicted by LandscapeFold; results discussed were found to be insensitive to prediction algorithm by comparison to NuPack. (B,C) Known coronaviruses with a high degree of sequence and structure similarity to the novel coronavirus. (D–H) Known coronaviruses with a high degree of structure similarity to the novel coronavirus, but less sequence similarity. See main text for further discussion. (I) In addition to examining the predicted MFE structures, we considered the full free-energy landscapes. The probability of each coronavirus to form a pseudoknot in the 75 nts following the frameshift (orange), and the probability of the first stem to be part of a 3-stem pseudoknot (blue), are histogrammed.
Figure 5
Scatterplots of RSCU bias [average ln(RSCU)] (A) and codon pair bias (CPB) (B) by CDS length of human and viral genes. Human genes appear as grey dots and viral genes appear with different colored markers.
Figure 6
Seven codon sliding window average of ln(RSCU) (A) and codon pair score (CPS) (B) of structural SARS-CoV-2 genes. Genes are shown in the order they appear in the viral genome, but gaps between open reading frames have been removed. Genes alternate in colors black and blue for clarity, with the gene name in the corresponding color appearing above or below the window. RSCU and CPS are calculated based on Homo sapiens genomic codon and codon pair usage.
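The captions for Figures 5 and 6 refer to ln(RSCU), codon pair score (CPS) and a seven-codon sliding window. The sketch below shows one way these quantities could be computed in Python. It assumes the standard genetic code, the usual RSCU definition (a codon's count divided by the mean count of its synonymous codons) and the commonly used log-ratio form of CPS (observed pair count over the count expected from codon and amino acid frequencies). Whether the paper uses exactly these forms is an assumption, and all function names and toy reference counts are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu_table(ref_counts):
    """RSCU per codon from reference codon counts (e.g. a host usage table)."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        synonyms[aa].append(codon)
    rscu = {}
    for codons in synonyms.values():
        total = sum(ref_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the reference
        for c in codons:
            rscu[c] = ref_counts.get(c, 0) * len(codons) / total
    return rscu


def sliding_mean_ln_rscu(codons, rscu, window=7):
    """Sliding-window mean of ln(RSCU); assumes every codon has RSCU > 0."""
    values = [math.log(rscu[c]) for c in codons]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def codon_pair_scores(ref_genes):
    """CPS = ln(observed / expected) per codon pair, from lists of codons."""
    codon_n, aa_n = Counter(), Counter()
    pair_n, aa_pair_n = Counter(), Counter()
    for codons in ref_genes:
        for c in codons:
            codon_n[c] += 1
            aa_n[CODON_TO_AA[c]] += 1
        for a, b in zip(codons, codons[1:]):
            pair_n[(a, b)] += 1
            aa_pair_n[(CODON_TO_AA[a], CODON_TO_AA[b])] += 1
    scores = {}
    for (a, b), observed in pair_n.items():
        x, y = CODON_TO_AA[a], CODON_TO_AA[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(observed / expected)
    return scores


# Toy reference data for illustration only; not real human usage values.
toy_ref = {codon: 1 for codon in CODON_TO_AA}
toy_ref["GCC"] = 5
toy_rscu = rscu_table(toy_ref)
toy_profile = sliding_mean_ln_rscu(["ATG", "GCC", "GCT"], toy_rscu, window=2)
toy_cps = codon_pair_scores([["GCT", "GCT", "GCC", "GCC"]])
```

Averaging the CPS values over all codon pairs in a gene gives a per-gene codon pair bias of the kind plotted against CDS length in Figure 5; a sliding mean over CPS values can be formed the same way as for ln(RSCU).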
