Biomedicines. 2021 Apr 11;9(4):412.
doi: 10.3390/biomedicines9040412.

Genetic Diversity of SARS-CoV-2 over a One-Year Period of the COVID-19 Pandemic: A Global Perspective

Miao Miao et al. Biomedicines. 2021.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the global pandemic of coronavirus disease 2019 (COVID-19). Genome surveillance is a key method for tracking the spread of SARS-CoV-2 variants. We analyzed the genetic diversity and evolution of SARS-CoV-2 based on 260,673 whole-genome sequences sampled from 62 countries between 24 December 2019 and 12 January 2021. Amino acid (AA) substitutions were observed in all SARS-CoV-2 proteins, and the six proteins with the highest substitution rates were ORF10, nucleocapsid, ORF3a, spike glycoprotein, RNA-dependent RNA polymerase, and ORF8. Among 25,629 amino acid substitutions at 8484 polymorphic sites across the coding region of the SARS-CoV-2 genome, the D614G variant in spike (93.88%) and the P323L variant in RNA-dependent RNA polymerase (93.74%) were the dominant variants on six continents. As of January 2021, the genomic sequences of SARS-CoV-2 could be divided into at least 12 different clades, whose distributions varied over time and by geography across six continents. Overall, this large-scale analysis provides a detailed mapping of SARS-CoV-2 variants in different geographic areas at different time points, highlighting the importance of evaluating highly prevalent variants in the development of SARS-CoV-2 antiviral drugs and vaccines.

Keywords: COVID-19; SARS-CoV-2; genetic diversity; genetic variant; global pandemic.

Conflict of interest statement

None of Erik De Clercq’s drugs [70,71] were designed or approved for coronavirus treatment. The authors declare no conflicts of interest.

Figures

Figure 1
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome organization and the distribution of the SARS-CoV-2 genome sequence used in this study. (A) Genomic architecture of SARS-CoV-2 based on the reference sequence (Wuhan-Hu-1, NCBI accession NC_045512). (B) The temporal and geographic distribution of all sequences. All temporal analyses were based on the date of sequence collection.
Figure 2
Temporal distributions of the substitution rates of the six SARS-CoV-2 proteins. The proteins were the top six proteins with the highest substitution rates. (A) The global substitution rate curves of SARS-CoV-2 proteins based on a time sliding window. (B) The substitution rate curves of SARS-CoV-2 proteins based on a time sliding window on different continents. The vertical axis represents the moving-window substitution rate, calculated by dividing the total number of polymorphic sites of a protein contained in the sequences, 15 days before and after a specific date, by the total number of all positions of the protein in the period.
Figure 3
Distribution of variants at positions of SARS-CoV-2 proteins. For each site, the reference index is shown at the top, followed by variants with a frequency >1%. Variants highlighted with green superscripts were those with frequencies >5%.
Figure 4
Distribution of frequent variants across the SARS-CoV-2 proteins in different geographic areas. (A) Frequencies of the top 30 variants, with the highest variant frequencies in different continents. (B) Temporal and geographic dynamics of frequent SARS-CoV-2 variants. The vertical axis represents the moving-window variant frequency calculated by dividing the number of sequences containing a specific variant, 15 days before and after a specific date, by the total number of sequences in the period (31 days).
Figure 5
Variants in the SARS-CoV-2 proteins. (A) Site 222 (magenta) and Site 614 (red) of spike. Almost all sequences showed a variant (D614G). Site 614 is located at the interface between two subunits. (B) Site 323 (deep salmon) of the NSP12 protein. Many sequences showed a variant (P323L). (C) Site 203 (orange), Site 204 (hot pink), and Site 220 (lime green) of the nucleocapsid protein. The frequencies of variants R203K, G204R, and A220V were high. (D) Site 85 (light pink) of the NSP2 protein. Many sequences showed a variant (T85I). The structures of spike, NSP12, the nucleocapsid protein, and NSP2 were obtained from https://zhanglab.ccmb.med.umich.edu/COVID-19/. The protein structural figures were generated with PyMOL (http://www.pymol.org/, accessed 16 January 2021).
Figure 6
The substitution matrix at the nucleic acid level. Nucleotide substitutions shown in this figure are the main substitutions characterizing the SARS-CoV-2 clades.
Figure 7
Temporal distributions of the 12 clades based on 260,673 complete SARS-CoV-2 nucleotide sequences. (A) The global distribution of the 12 clades over time. (B) The distribution of the 12 clades over time on six continents. The vertical axis represents the moving-window proportion, calculated by dividing the number of sequences belonging to a specific clade, 15 days before and after a specific date, by the total number of sequences in the period (31 days).
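
The moving-window proportion described in the captions of Figures 4 and 7 (sequences with a given variant or clade, divided by all sequences collected within 15 days before and after a date) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `moving_window_proportion`, the `(collection_date, has_feature)` record layout, and the toy records are all assumptions made for this example and are not data from the study.

```python
from datetime import date, timedelta


def moving_window_proportion(records, center, half_width_days=15):
    """Proportion of sequences carrying a feature within a centered window.

    records: iterable of (collection_date, has_feature) pairs, where
             has_feature is True if the sequence carries the variant or
             belongs to the clade of interest (hypothetical layout).
    center:  the date the window is centered on.
    With half_width_days=15 the window spans 31 days, both ends inclusive.
    Returns None when no sequence falls inside the window.
    """
    start = center - timedelta(days=half_width_days)
    end = center + timedelta(days=half_width_days)
    total = 0
    hits = 0
    for collected, has_feature in records:
        if start <= collected <= end:
            total += 1
            if has_feature:
                hits += 1
    if total == 0:
        return None  # avoid dividing by zero for empty windows
    return hits / total


# Toy records, invented purely for illustration.
toy_records = [
    (date(2020, 3, 1), False),
    (date(2020, 3, 10), True),
    (date(2020, 3, 20), True),
    (date(2020, 4, 5), True),
    (date(2020, 5, 1), True),
]

proportion = moving_window_proportion(toy_records, date(2020, 3, 16))
print(proportion)
```

Evaluating this function for each date in the sampling period would give one curve per variant or clade. Returning `None` for empty windows is a design choice of this sketch; the captions do not say how dates without any sequences were handled.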

References

    1. Shang J., Han N., Chen Z., Peng Y., Li L., Zhou H., Ji C., Meng J., Jiang T., Wu A. Compositional diversity and evolutionary pattern of coronavirus accessory proteins. Briefings Bioinform. 2021;22:1267–1278. doi: 10.1093/bib/bbaa262.
    2. Li G., De Clercq E. Therapeutic options for the 2019 novel coronavirus (2019-nCoV). Nat. Rev. Drug Discov. 2020;19:149–150. doi: 10.1038/d41573-020-00016-0.
    3. Koyama T., Weeraratne D., Snowdon J.L., Parida L. Emergence of Drift Variants That May Affect COVID-19 Vaccine Development and Antibody Treatment. Pathogens. 2020;9:324. doi: 10.3390/pathogens9050324.
    4. Andersen K.G., Rambaut A., Lipkin W.I., Holmes E.C., Garry R.F. The proximal origin of SARS-CoV-2. Nat. Med. 2020;26:450–452. doi: 10.1038/s41591-020-0820-9.
    5. Zhou P., Yang X.-L., Wang X.-G., Hu B., Zhang L., Zhang W., Si H.-R., Zhu Y., Li B., Huang C.-L., et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–273. doi: 10.1038/s41586-020-2012-7.