Comparative Study
J Virol Methods. 2021 Mar:289:114032.
doi: 10.1016/j.jviromet.2020.114032. Epub 2020 Dec 5.

Genomic and evolutionary comparison between SARS-CoV-2 and other human coronaviruses

Zigui Chen et al. J Virol Methods. 2021 Mar.

Abstract

Three highly pathogenic human coronaviruses can cause severe acute respiratory syndrome (SARS-CoV, SARS-CoV-2 and MERS-CoV). Although phylogenetic analyses have indicated an ancient origin of human coronaviruses from animal relatives, their evolutionary history remains to be established. Using phylogenetics and "high order genomic structures" including trimer spectrums, codon usage and dinucleotide suppression, we observed distinct clustering of all human coronaviruses that formed phylogenetic clades with their closest animal relatives, indicating that they have undergone long evolutionary histories within specific ecological niches before jumping the species barrier to infect humans. The close relationship between SARS-CoV and SARS-CoV-2 implies a similar evolutionary origin. However, a lower Effective Codon Number (ENC) pattern and CpG dinucleotide suppression in SARS-CoV-2 genomes compared to SARS-CoV and MERS-CoV may imply better host fitness, and thus its success in sustaining a pandemic. Characterization of coronavirus heterogeneity via complementary approaches enriches our understanding of the evolution and virus-host interaction of these emerging human pathogens, while the underlying mechanistic basis of pathogenicity warrants further investigation.

Keywords: COVID-19; Codon usage; Dinucleotide suppression; MERS-CoV; Phylogeny; SARS-CoV; SARS-CoV-2.

Conflict of interest statement

The authors declare that they have no competing interests. PC is not involved in the review of this manuscript.

Figures

Fig. 1
Phylogeny of the family Coronaviridae. A maximum likelihood (ML) tree was constructed using RAxML MPI v8.2.12, inferred from the concatenated nucleotide sequence alignments of 6 open reading frames (1a-1b-S-E-M-N) of 55 reference genomes. The dot size on the nodes is proportional to the bootstrap support values. The HCoV clusters associated with severe acute respiratory syndrome (SARS-CoV, SARS-CoV-2 and MERS-CoV) and the common cold (HCoV-OC43, HCoV-HKU1, HCoV-229E and HCoV-NL63) are highlighted in red and orange, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Phylogeny of the subgenus Sarbecovirus in the genus Betacoronavirus. (A) A maximum likelihood (ML) tree was constructed using RAxML MPI v8.2.12, inferred from the concatenated nucleotide sequence alignments of 12 open reading frames (1a-1b-S-3a-E-M-6-7a-7b-8-N-10) of 114 genomes. The percent nucleotide differences are shown in the panel to the right of the phylogeny. Values for each comparison of a given isolate are connected by lines, and the comparison to self is indicated by the 0.0 % difference point. Coloured lines are used to distinguish the SARS-CoV-1 and SARS-CoV-2 clusters. (B) Tanglegram comparing the tree topologies from hierarchical clustering of the trimer spectrum and from maximum likelihood for 114 Sarbecovirus genomes, inferred from the concatenated nucleotide sequences of 12 ORFs/genes. The bar to the side of each panel indicates the subgenus assignment, coloured according to the key in the figure.
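As a rough illustration of the trimer spectrum named in panel (B), the sketch below counts overlapping 3-mers in a nucleotide sequence, normalises them to frequencies, and compares two sequences by Euclidean distance. The function names, the skipping of windows with ambiguous bases, and the choice of Euclidean distance are assumptions for illustration; the caption does not state how the authors built or clustered the spectra.

```python
import math
from itertools import product

# All 64 possible trimers over the DNA alphabet, in a fixed order.
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def trimer_spectrum(seq):
    """Return the 64 overlapping-trimer frequencies of a sequence.

    Hypothetical helper: windows containing anything other than
    A, C, G or T are skipped (an assumption, not the paper's rule).
    """
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(TRIMERS, 0)
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIMERS]


def spectrum_distance(seq_a, seq_b):
    """Euclidean distance between two trimer spectra (assumed metric)."""
    return math.dist(trimer_spectrum(seq_a), trimer_spectrum(seq_b))
```

A matrix of such pairwise distances could then be passed to any hierarchical clustering routine to produce a dendrogram comparable with the ML tree.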
Fig. 3
Synonymous codon usage of coronavirus genomes based on concatenated nucleotide sequences of 6 ORFs (ORF1a-1b-S-E-M-N). (A) Boxplot of Effective Number of Codons (ENC) between HCoV clusters. The ENC values range from 20, when a gene is effectively using only a single codon for each amino acid (strongest bias), to 61, when a gene tends to use all codons with equal frequency (no bias). (B) Plot of ENC and the synonymous third codon position (GC3s) content. The red curve indicates the expected ENC* if codon usage pattern is only affected by GC3s. (C) Boxplot of differences between the observed and expected ENC values among HCoV clusters. (D) Mean values of Relative Synonymous Codon Usage (RSCU) for 59 codons (except for Met, Trp, and stop codons) amongst HCoV clusters. The preferred and suppressed codon usages were defined as RSCU values > 1.6 or < 0.6, respectively. (E) Scatter biplot of RSCU of HCoV clusters. The clustering was performed using redundancy analysis (RDA), with colours assigned to different human betacoronavirus clusters. The x-axis and the y-axis represent the first two principal coordinate component (PCoA) axes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
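The expected-ENC curve in panel (B) and the observed-minus-expected difference in panel (C) can be sketched as follows. This assumes the curve follows Wright's widely used approximation, ENC* = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s; the caption does not name the formula, and the function names and the sign convention of the difference are illustrative assumptions.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage depends only on GC3s.

    Assumes Wright's approximation; gc3s is a fraction in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(observed_enc, gc3s):
    """Observed minus expected ENC (sign convention is an assumption)."""
    return observed_enc - expected_enc(gc3s)
```

Under this convention, a negative deviation would mean the genome uses fewer effective codons than its GC3s content alone predicts, which points to additional sources of codon bias.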
Fig. 4
Scatter biplot of Relative Synonymous Codon Usage (RSCU) of HCoV clusters inferred from distinct ORFs/genes. The clustering was performed based on RSCU patterns for individual genes using redundancy analysis (RDA), with colours assigned to different coronavirus clusters. The x-axis and the y-axis represent the first two principal coordinate component (PCoA) axes.
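The RSCU values behind Figs. 3 and 4 can be illustrated with a minimal sketch. It assumes the standard genetic code and the usual definition of RSCU (a codon's observed count divided by the mean count of its synonymous codons), and applies the > 1.6 and < 0.6 thresholds quoted in the Fig. 3 legend. The function names are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the 59 codons that have synonyms (Met, Trp and stops excluded).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)
SYNONYMOUS_CODONS = [c for codons in FAMILIES.values() for c in codons]


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so skip it
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def classify(values, high=1.6, low=0.6):
    """Label codons using the thresholds from the Fig. 3 legend."""
    return {
        c: "preferred" if v > high else "suppressed" if v < low else "neutral"
        for c, v in values.items()
    }
```

For example, `classify(rscu("GCTGCTGCTGCC"))` labels GCT as preferred and the unused alanine codons GCA and GCG as suppressed.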
Fig. 5
Dinucleotide suppression of HCoV genomes inferred from the concatenated nucleotide sequences of 6 ORFs (ORF1a-1b-S-E-M-N). (A) Boxplot of dinucleotide observed/expected (O/E) ratio. The ρXY dinucleotide exhibits suppression if the O/E ratio is less than 1. (B) Scatter biplot of relative abundance of dinucleotides of HCoV genomes. The clustering was performed using redundancy analysis (RDA), with colours assigned to different clusters. The x-axis and the y-axis represent the first two principal coordinate component (PCoA) axes. (C) Boxplot of the O/E ratios of each dinucleotide amongst HCoV clusters.
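The observed/expected (O/E) ratio in panel (A) can be sketched as below. This assumes the common odds-ratio definition, ρXY = f(XY) / (f(X) · f(Y)), where f denotes a frequency in the sequence; the legend does not spell out the formula, so treat both the definition and the function name as assumptions. The input is assumed to contain only A, C, G and T (or U).

```python
from collections import Counter


def dinucleotide_oe(seq):
    """Return {dinucleotide: observed/expected ratio} for a sequence.

    Assumes rho_XY = f_XY / (f_X * f_Y) and unambiguous bases only.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_freq = {b: seq.count(b) / n for b in "ACGT"}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = base_freq[x] * base_freq[y]
            if expected > 0:  # skip pairs whose bases never occur
                observed = pair_counts[x + y] / (n - 1)
                ratios[x + y] = observed / expected
    return ratios
```

A value such as `dinucleotide_oe(genome)["CG"]` below 1 would correspond to the CpG suppression discussed in the abstract.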
