Sci Rep. 2020 Aug 19;10(1):14004.
doi: 10.1038/s41598-020-70812-6.

Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity

M Rafiul Islam et al. Sci Rep. 2020.

Erratum in

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel, evolutionarily divergent RNA virus, is responsible for the present devastating COVID-19 pandemic. To explore its genomic signatures, we comprehensively analyzed 2,492 complete and/or near-complete genome sequences of SARS-CoV-2 strains reported from across the globe to the GISAID database up to 30 March 2020. Genome-wide annotations revealed 1,516 nucleotide-level variations at different positions throughout the SARS-CoV-2 genome. Moreover, nucleotide (nt) deletion analysis found twelve deletion sites in addition to the previously reported deletions in the coding sequences of the ORF8 (open reading frame 8), spike and ORF7a proteins: in polyprotein ORF1ab (n = 9), ORF10 (n = 1) and the 3′-UTR (n = 2). Systematic gene-level mutational and protein profile analyses revealed a large number of amino acid (aa) substitutions (n = 744), demonstrating that the viral proteins are heterogeneous. Notably, the receptor-binding domain (RBD) residues that make crucial interactions with angiotensin-converting enzyme 2 (ACE2) and with a cross-reacting neutralizing antibody were conserved among the analyzed strains, except for a lysine-to-arginine replacement at position 378 of the cryptic epitope in a Shanghai isolate, hCoV-19/Shanghai/SH0007/2020 (EPI_ISL_416320). Furthermore, our preliminary analysis of epidemiological data on SARS-CoV-2 infections showed that the frequency of aa mutations was relatively higher in genome sequences from Europe (43.07%), followed by Asia (38.09%) and North America (29.64%), while case fatality rates remained higher in European temperate countries such as Italy, Spain, the Netherlands, France, England and Belgium.
Thus, the present method of genome annotation employed at this early pandemic stage could be a promising tool for monitoring and tracking the continuously evolving pandemic situation, the associated genetic variants, and their implications for the development of effective control and prophylaxis strategies.
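Annotating nucleotide variations and deletions of this kind starts from genomes aligned to a reference. As a minimal sketch, assuming each query genome has already been aligned to a reference (for example with MAFFT, the aligner named in the Figure 1 caption) and written with '-' for gaps, substitutions and deletions can be read off column by column. The `Variant` record and `call_variants` function are hypothetical illustrations, not the authors' pipeline; insertions relative to the reference and ambiguous 'N' bases are deliberately not reported, and terminal gaps are treated as missing sequence rather than deletions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    kind: str  # "SNV" (substitution) or "DEL" (deletion)
    pos: int   # 1-based start position in the ungapped reference
    ref: str   # reference base(s)
    alt: str   # query base; empty string for deletions


def call_variants(ref_aln: str, qry_aln: str) -> list[Variant]:
    """Call substitutions and deletions from a pairwise alignment (hypothetical helper).

    Both strings must have equal length and use '-' for gaps. Insertions
    (gaps in the reference) and 'N' bases in the query are not reported.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, qry_aln = ref_aln.upper(), qry_aln.upper()
    # Leading/trailing query gaps usually mean unsequenced genome ends in
    # near-complete genomes, so columns outside [first, last) are skipped.
    first = len(qry_aln) - len(qry_aln.lstrip("-"))
    last = len(qry_aln.rstrip("-"))
    variants: list[Variant] = []
    ref_pos = 0          # reference bases consumed so far
    del_start = None     # reference position where an open deletion began
    del_bases: list[str] = []
    for i, (r, q) in enumerate(zip(ref_aln, qry_aln)):
        if r == "-":
            continue  # insertion relative to the reference: skipped
        ref_pos += 1
        if i < first or i >= last:
            continue
        if q == "-":
            if del_start is None:
                del_start = ref_pos
            del_bases.append(r)
            continue
        if del_start is not None:  # a deletion run just ended
            variants.append(Variant("DEL", del_start, "".join(del_bases), ""))
            del_start, del_bases = None, []
        if q != r and q != "N":
            variants.append(Variant("SNV", ref_pos, r, q))
    if del_start is not None:  # close a deletion still open at the end
        variants.append(Variant("DEL", del_start, "".join(del_bases), ""))
    return variants
```

For example, `call_variants("ATGCGTACGT", "ATGC---CGA")` reports a 3-nt deletion starting at reference position 5 and a T-to-A substitution at position 10.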

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Genomic deletion analysis of SARS-CoV-2. Genomic deletion analysis of SARS-CoV-2 strains identified (a) a 24-nucleotide (nt) deletion in NSP1 of a Japanese strain; (b) 15-nt deletions in NSP1 of strains from the USA, Japan and the Netherlands, alongside 3-nt deletions in strains from the USA and the Netherlands; (c) a 3-nt deletion in NSP1 of US strains and, immediately adjacent to it, a 9-nt deletion in strains from the USA, England, Canada and Iceland; (d) 3-nt deletions in NSP2, observed in 99 strains from the Netherlands, England, Portugal, Slovakia, Iceland, Wales, France and New Zealand (representatives from each country are shown); (e) a 3-nt deletion in NSP8 of Dutch strains; (f) a 3-nt deletion in NSP15 of a US strain; (g-g1) a 35-nt deletion in Spanish strains that includes the ORF10 start codon, so that a start codon in the spacer region is used for ORF10 coding instead, resulting in (g-g2) a five-residue deletion (positions 1 to 5) in those strains. Deletions in the 3′-UTR of (h) 29 nt, reported from Wuhan, and (i) 10 nt, in strains from Australia. Nucleotide positions are counted from the start of each ORF; for the NSPs, positions relative to ORF1ab are used. Alignment was performed with the MAFFT online tool, and Unipro-UGENE was used for visualization.
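Counting positions from the start of each ORF amounts to a fixed offset from genome coordinates. The sketch below assumes the Wuhan-Hu-1 reference (GenBank NC_045512.2), in which the spike ORF starts at genome position 21563; that reference and the helper name `to_orf_coords` are assumptions, since the caption does not state which reference coordinates were used. It also assumes a single contiguous reading frame, so it does not apply as-is to ORF1ab positions downstream of the -1 ribosomal frameshift.

```python
def to_orf_coords(genome_pos: int, orf_start: int) -> tuple[int, int, int]:
    """Convert a 1-based genome position to ORF-relative coordinates (hypothetical helper).

    Returns (nt position in ORF, codon number, position within codon),
    all 1-based. Assumes one contiguous reading frame starting at orf_start.
    """
    if genome_pos < orf_start:
        raise ValueError("position lies upstream of the ORF start")
    offset = genome_pos - orf_start  # 0-based distance from the start codon
    return offset + 1, offset // 3 + 1, offset % 3 + 1


# Assumed spike (S) start in the Wuhan-Hu-1 reference, GenBank NC_045512.2.
SPIKE_START = 21563

print(to_orf_coords(23403, SPIKE_START))  # (1841, 614, 2)
```

Under these assumptions, the widely reported genome change A23403G falls at the second base of spike codon 614, consistent with the D614G substitution listed in Figure 3.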
Figure 2
Amino-acid (aa) residue heterogeneity in the (a) S, (b) E and (c) M proteins of SARS-CoV-2. Fingerprint protein analysis showed that aa residues in the S, M and E proteins varied at specific positions as a result of residue changes and/or substitutions.
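Per-position residue variability of this kind can be tabulated from a protein alignment. A minimal sketch, assuming the S, E or M sequences are already aligned to the same length (so each column corresponds to one residue position) with '-' for gaps and 'X' for unknown residues; `variable_positions` is a hypothetical helper and not a reimplementation of the fingerprint analysis used in the paper.

```python
from collections import Counter


def variable_positions(
    aligned: list[str], gap: str = "-", unknown: str = "X"
) -> dict[int, Counter]:
    """Return {1-based column: residue counts} for columns with more than one residue type.

    Gaps and unknown residues are ignored when counting.
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    result: dict[int, Counter] = {}
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned if seq[col] not in (gap, unknown)
        )
        if len(counts) > 1:  # more than one residue observed: heterogeneous
            result[col + 1] = counts
    return result


# Toy alignment (not data from the paper): columns 2 and 4 vary.
print(variable_positions(["MDKL", "MGKL", "MDKF"]))
```

Counting residues per column, rather than only flagging differences from a reference, keeps the relative frequency of each variant residue, which is what a heterogeneity plot needs.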
Figure 3
The frequency spectra of amino-acid mutations in 2,492 SARS-CoV-2 complete genome sequences and their impact on mortality rates. Amino-acid (aa) mutations found in the open reading frames (ORFs) of SARS-CoV-2 genomes according to (a) geographic area and (b) climate zone. We found six core shared mutations (R203K, G204R, G251V, L3606F, P4714L and D614G) across Asia, Europe, Africa, Australia, North America and South America, and four core shared mutations (Q57H, D614G, L3606F and P4714L) across continental, diverse, dry, tropical and temperate climates. In both (a) and (b), yellow circles show the frequency of aa substitutions shared by all categories, green circles show the frequency of aa substitutions shared by at least two continents or climate zones, and pink ribbons indicate aa mutations unique to an individual region or climate zone. (c) Estimated case fatality rates of SARS-CoV-2 infections in 45 countries, grouped by climate. Numbers in parentheses indicate total confirmed cases/total deaths according to the World Health Organization (WHO) as of 30 March 2020. Data were visualized using a custom Venn diagram online tool (https://bioinformatics.psb.ugent.be/webtools/Venn/) and Adobe Illustrator CC 2019 (trial version).
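The Venn categories in panels (a) and (b) correspond to simple set operations over per-group mutation lists, and panel (c) to a deaths-to-cases ratio. A sketch under stated assumptions: each group (continent or climate zone) is represented by the set of aa mutations observed in it, at least two groups are compared, and the case fatality rate is taken as the crude ratio of total deaths to total confirmed cases, in percent. The function names and the toy labels in the usage note are hypothetical, not data from the paper.

```python
from collections import Counter


def partition_mutations(
    by_group: dict[str, set[str]],
) -> tuple[set[str], set[str], dict[str, set[str]]]:
    """Split mutations into Venn-style categories (hypothetical helper).

    Returns (core, shared, unique):
      core   - mutations present in every group
      shared - mutations present in at least two groups, but not all
      unique - per group, mutations present in that group only
    """
    if len(by_group) < 2:
        raise ValueError("need at least two groups to compare")
    n_groups = len(by_group)
    # For each mutation, count how many groups contain it (set() avoids double counting).
    presence = Counter(m for muts in by_group.values() for m in set(muts))
    core = {m for m, c in presence.items() if c == n_groups}
    shared = {m for m, c in presence.items() if 2 <= c < n_groups}
    unique = {g: {m for m in muts if presence[m] == 1} for g, muts in by_group.items()}
    return core, shared, unique


def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Crude case fatality rate in percent (assumed definition: deaths / confirmed * 100)."""
    if confirmed <= 0:
        raise ValueError("confirmed cases must be positive")
    return 100.0 * deaths / confirmed
```

For three toy groups where m1 occurs in all of them, m2 in two and m3 only in group A, `partition_mutations` returns core {m1}, shared {m2} and unique {A: {m3}} with empty sets for the other groups.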
