J Mol Evol. 2021 Jul;89(6):341-356.
doi: 10.1007/s00239-021-10008-2. Epub 2021 May 15.

The Evolution of Severe Acute Respiratory Syndrome Coronavirus-2 during Pandemic and Adaptation to the Host

Snawar Hussain et al. J Mol Evol. 2021 Jul.

Abstract

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is a zoonotic virus with a possible origin in bats and potential transmission to humans through an intermediate host. When zoonotic viruses jump to a new host, they are subject to both mutational pressure and natural selection, which produce the non-synonymous and synonymous adaptive changes necessary for efficient replication and rapid spread of disease in the new host species. The nucleotide composition and codon usage pattern of SARS-CoV-2 indicate the presence of a highly conserved, gene-specific codon usage bias. The codon usage pattern of SARS-CoV-2 is mostly antagonistic to human and bat codon usage. SARS-CoV-2 codon usage bias is shaped mainly by natural selection, while mutational pressure plays a minor role. Time-series analysis of the SARS-CoV-2 genome indicates that the virus is evolving slowly. Virus isolates from later stages of the outbreak have more biased codon usage and nucleotide composition than isolates from early stages of the outbreak.

Keywords: COVID-19; Codon usage bias; Natural selection; Severe Acute Respiratory Syndrome Coronavirus-2; Virus evolution.

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Fig. 1
Whole genome comparative nucleotide composition analysis of SARS-CoV-2, SARS-CoV (SARS-CoV-Tor2), bat coronavirus (Bat-CoV-RaTG13) and bat SARS-like coronaviruses (Bat SLCoV-ZXC21 and Bat SLCoV-ZXC21). a Boxplot diagram depicting the GC/AT(U) contents in coding and non-coding (NC) regions of the SARS-CoV-2 genome. b, c Boxplots depicting the A/T(U) and G/C contents of Bat-CoV, Bat_SLCoV, SARS-CoV and SARS-CoV-2 genomes. d GC/AT(U) contents (mean ± SD) of SARS-CoV-2 genes
Fig. 2
Heat map of the relative synonymous codon usage (RSCU). Each row represents a species, and each column represents a codon. The higher the RSCU value, the more abundant the codon is in the sequence. Colors from yellow (lowest) to red (highest) indicate the magnitude of RSCU values
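As an illustration of the quantity shown in the heat map, the following is a minimal sketch of an RSCU calculation using the common definition (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally). It is not the authors' pipeline; the function name rscu and the use of the standard genetic code are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1) in canonical TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper).

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). Stop codons are skipped; amino acids that do not
    occur in the sequence are omitted from the result.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Four alanine codons: GCT used three times, GCC once.
    print(rscu("GCTGCTGCTGCC"))
```

Under this definition, a value of 1.0 means a codon is used as often as expected under equal synonymous usage, values above 1.0 indicate preferred codons, and the values within one amino acid family sum to the family size.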
Fig. 3
The principal component analysis of RSCU of SARS-CoV-2. a The scree plot of the eigenvalues of the first 40 PCs and cumulative variance plot from principal component analysis of the SARS-CoV-2 RSCU values. b A plot of the values of Axis 1 (13.072%) and Axis 2 (9.11%) of all SARS-CoV-2 strains in principal component analysis
Fig. 4
Codon usage bias (ENC) and codon adaptation index (CAI) analyses of SARS-CoV-2. a CAI values of SARS-CoV-2 genes relative to Homo sapiens house-keeping genes and Rhinolophus affinis. b The ENC and GC3 (%) values of SARS-CoV-2 genes
Fig. 5
The effect of mutational bias and natural selection on SARS-CoV-2 synonymous codon usage. a Relationship between GC3 and the effective number of codons (ENC). The ENC values of SARS-CoV-2 strains (concatenated coding sequence) and mean ENC values of individual SARS-CoV-2 genes (upper-right inset) were plotted against the corresponding GC3s. The standard curve (dotted line) indicates the expected codon usage if GC compositional constraints alone account for codon usage bias. b The neutrality plot (GC12 vs. GC3). Neutrality plot analysis of the average GC content in the first and second codon positions (GC12) and the GC content at the third position (GC3)
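The standard curve in an ENC-GC3 plot is conventionally drawn from Wright's (1990) approximation, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 fraction. The sketch below assumes that this is the curve used here, which the caption does not state explicitly. The helper names are hypothetical, and gc3_fraction is a simplification that counts every third codon position rather than only synonymous sites (GC3s).

```python
def gc3_fraction(cds):
    """Fraction of G or C at third codon positions (simplified GC3).

    Assumption: all complete codons are counted, including Met, Trp and
    stop codons, which a strict GC3s calculation would exclude.
    """
    seq = cds.upper()
    thirds = [seq[i + 2] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_enc(gc3):
    """Expected ENC under GC3 compositional constraint alone (Wright, 1990)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"GC3 = {s:.2f}  expected ENC = {expected_enc(s):.2f}")
```

Points that fall well below this curve are usually read as evidence that factors other than GC3 composition, such as selection, contribute to codon usage bias.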
Fig. 6
Relative dinucleotide abundance in the SARS-CoV-2 genome. The line graph represents the mean observed/expected (O/E) frequency ratio of the 16 dinucleotides. The mean ± standard deviation of dinucleotide O/E ratios for the SARS-CoV-2 coding sequence is 1.0 ± 0.235. The dotted box represents the interval of mean ± 1 SD (i.e., O/E ratio 0.765–1.235). Dinucleotides outside the dotted box were classified as under- or over-represented in the SARS-CoV-2 genome
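A minimal sketch of the O/E calculation follows, assuming the usual odds-ratio definition f(XY) / (f(X) f(Y)) and the 0.765–1.235 thresholds given in the caption. The function names are hypothetical, and the sketch assumes an unambiguous A/C/G/U(T) sequence; it is not the authors' code.

```python
from collections import Counter
from itertools import product


def dinucleotide_oe(seq):
    """Return {dinucleotide: observed/expected ratio} for one sequence.

    Observed = dinucleotide frequency over overlapping pairs;
    expected = product of the two mononucleotide frequencies.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = Counter(s)
    pairs = Counter(s[i:i + 2] for i in range(n - 1))

    ratios = {}
    for x, y in product("ACGU", repeat=2):
        expected = (mono[x] / n) * (mono[y] / n)
        if expected > 0:
            ratios[x + y] = (pairs[x + y] / (n - 1)) / expected
    return ratios


def classify(ratio, low=0.765, high=1.235):
    """Label a ratio against the mean ± 1 SD band quoted in the caption."""
    if ratio < low:
        return "under-represented"
    if ratio > high:
        return "over-represented"
    return "within range"


if __name__ == "__main__":
    for dinuc, ratio in sorted(dinucleotide_oe("ACGUACGUUUAACG").items()):
        print(dinuc, round(ratio, 3), classify(ratio))
```

The same function can be applied to each isolate and averaged per week to follow a single dinucleotide, such as CpG or UpA, over time.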
Fig. 7
Time-series change in SARS-CoV-2 ENC. a Averaged ENC values for the strains isolated in each week were plotted according to elapsed weeks from December 21, 2019. Trend lines were generated using linear regression analysis to facilitate visualization of correlations. b Boxplots of the effective number of codons (ENC) vs. month of isolation. Asterisks (***) indicate a significant difference (P < 0.001) between the two groups
Fig. 8
Time-series change in mono-nucleotide compositions (%) for SARS-CoV-2. a–d Averaged mono-nucleotide compositions (%) for the strains isolated in each week were plotted according to elapsed weeks from December 21, 2019. The trend lines were generated using linear regression analysis to facilitate visualization of correlations. e, f Boxplots of %U and %C vs. month of isolation. g Frequency of mono-nucleotide substitutions in the SARS-CoV-2 genome
Fig. 9
Time-series changes in the codon adaptation indices and dinucleotide compositions (O/E ratio) for SARS-CoV-2. a Averaged CAI and b averaged RCDI values for the strains isolated in each week. c, d Averaged O/E ratios of CpG and UpA dinucleotides for the strains isolated in each week were plotted according to elapsed weeks from December 21, 2019. The trend lines were generated using linear regression analysis to facilitate visualization of correlations
