Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome

Jun-Sub Kim¹, Jun-Hyeong Jang¹, Jeong-Min Kim¹, Yoon-Seok Chung¹, Cheon-Kwon Yoo², Myung-Guk Han¹

Affiliations

¹ Division of Viral Diseases, Center for Laboratory Control of Infectious Diseases, Korea Centers for Disease Control and Prevention, Cheongju, Korea.
² Center for Laboratory Control of Infectious Diseases, Korea Centers for Disease Control and Prevention, Cheongju, Korea.

PMID: 32528815
PMCID: PMC7282418
DOI: 10.24171/j.phrp.2020.11.3.05

Jun-Sub Kim et al. Osong Public Health Res Perspect. 2020 Jun;11(3):101-111.

Abstract

Objectives: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China, in December 2019 and has been spreading rapidly worldwide. Although the causal relationship between mutations and features of SARS-CoV-2 such as rapid transmission, pathogenicity, and tropism remains unclear, our analysis of genomic mutations in SARS-CoV-2 may help to interpret the relationship between the genomic characteristics of SARS-CoV-2 and its infectivity in the host.

Methods: A total of 4,254 genomic sequences of SARS-CoV-2 were collected from the Global Initiative on Sharing All Influenza Data (GISAID). Multiple sequence alignment for phylogenetic analysis was conducted using Molecular Evolutionary Genetics Analysis (MEGA), and a comparative genomic approach for mutation analysis was carried out using an in-house program written in Perl.

Results: Phylogenetic analysis of SARS-CoV-2 strains indicated 3 major clades (S, V, and G) and 2 subclades (G.1 and G.2). There were 767 types of synonymous mutation and 1,352 types of non-synonymous mutation. Mutations were detected at high frequency in the ORF1a, ORF1b, S, and N genes, and at low frequency in the ORF7b and E genes. In the receptor-binding domain (RBD) of the S gene, 11 non-synonymous mutations were observed in the region adjacent to the angiotensin-converting enzyme 2 (ACE2) binding site.

Conclusion: It has been reported that the rapid infectivity and transmission of SARS-CoV-2, which are associated with host receptor affinity, derive from several mutations in its genes. Without these genetic mutations to enhance evolutionary adaptation, species recognition, host receptor affinity, and pathogenicity, the virus would not survive. We expect that our results provide an important clue for understanding the genomic characteristics of SARS-CoV-2.

Keywords: SARS-CoV-2; evolutions; mutation.
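The Methods describe a comparative approach in which each strain is compared against a reference and point mutations are sorted into synonymous and non-synonymous types. The following is a minimal Python sketch of that classification step, not the authors' Perl program. It assumes two aligned, gap-free coding sequences of equal length that start in frame; the function name `classify_point_mutations` and the toy sequences in the usage note are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutations(ref_cds, sample_cds):
    """Compare two aligned coding sequences codon by codon.

    Returns a list of (label, kind) tuples, where label uses the
    reference-position-sample notation (for example "D614G") and kind is
    "synonymous" or "non-synonymous". Codons containing ambiguous bases
    are skipped (an assumption of this sketch).
    """
    if len(ref_cds) != len(sample_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")

    results = []
    for start in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[start:start + 3].upper()
        alt_codon = sample_cds[start:start + 3].upper()
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon)
        alt_aa = CODON_TABLE.get(alt_codon)
        if ref_aa is None or alt_aa is None:
            continue  # ambiguous base such as N; not classified here
        codon_number = start // 3 + 1
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        results.append((f"{ref_aa}{codon_number}{alt_aa}", kind))
    return results
```

For example, `classify_point_mutations("ATGGATTTT", "ATGGGTTTC")` reports a non-synonymous change at codon 2 (GAT to GGT, aspartate to glycine) and a synonymous change at codon 3 (TTT to TTC, both phenylalanine). Counting distinct labels across many strains would give the "types" of mutation, while counting every occurrence would give their frequency.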

Conflict of interest statement

The authors have no conflicts of interest to declare.

Figures

**Figure 1**
Phylogenetic tree of SARS-CoV-2. This tree was constructed using MEGA with the following parameters: neighbor-joining method, 1,000 bootstrap replications for the phylogeny test, Kimura 2-parameter substitution model, and pairwise deletion for gap/missing data treatment. The S and V clades are determined by L84S (ORF8) and G251V (ORF3a), respectively. The reference strain (hCoV-19/Wuhan-Hu-1/2019) is indicated by bold letters on a light gray background. In the amino acid substitution notation used here, the amino acid of the reference strain is on the left and the different amino acid of the corresponding strain is on the right. MEGA = Molecular Evolutionary Genetics Analysis.
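The Kimura 2-parameter model named in this caption estimates the distance between two aligned sequences from the proportion of transitions P and transversions Q as d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below illustrates that formula in Python with pairwise deletion of gaps and ambiguous sites; it is an illustration of the model, not a reproduction of MEGA's implementation, and the function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    ignored for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps / missing data
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError when the sequences are too
    # divergent for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the input that a neighbor-joining method then turns into a tree.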

**Figure 2**
Phylogenetic tree of the G clade and its subclades. The G clade, determined by D614G (S), was classified into the G.1 clade, the G.2 clade, and other strains. G.1 and G.2 clades share Q57H (ORF3a) and G204R (N), R203K (N), together with P214L (ORF1b), respectively. The reference strain (hCoV-19/Wuhan-Hu-1/2019) is indicated by bold letters on a light gray background. In the amino acid substitution notation used here, the amino acid of the reference strain is on the left and the different amino acid of the corresponding strain is on the right.
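The marker substitutions in the captions of Figures 1 and 2 can be read as a simple decision rule for labelling a strain. The Python sketch below is one hedged interpretation: the function name `assign_clade`, the order in which markers are checked, and the "other" label are assumptions, not the authors' procedure. The ORF8 marker is written as L84S, the form used in GISAID clade nomenclature, and the ORF1b marker is left out because the caption does not make clear which subclade it belongs to.

```python
def assign_clade(substitutions):
    """Assign a clade label from a strain's amino acid substitutions.

    `substitutions` is an iterable of (gene, change) tuples relative to the
    reference strain, for example ("S", "D614G").
    """
    subs = set(substitutions)

    # G clade and its subclades: all carry D614G in the S gene.
    if ("S", "D614G") in subs:
        if ("ORF3a", "Q57H") in subs:
            return "G.1"
        if ("N", "R203K") in subs and ("N", "G204R") in subs:
            return "G.2"
        return "G"

    # S and V clades (checked after G; this ordering is an assumption).
    if ("ORF8", "L84S") in subs:
        return "S"
    if ("ORF3a", "G251V") in subs:
        return "V"

    return "other"
```

For instance, a strain carrying `("S", "D614G")` and `("ORF3a", "Q57H")` would be labelled G.1 under these assumptions, while a strain with none of the markers is labelled "other".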

**Figure 3**
Distribution of mutation types in 12 coding sequences. This figure summarizes the distribution of point mutations across 12 coding sequences (ORF1a, ORF1b, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF10). Red bars indicate non-synonymous mutations and blue bars indicate synonymous mutations. The numbers above and below the bars give the number of mutation types.

**Figure 4**
Frequency of mutations in coding sequences. (A) Frequency of 5,940 synonymous mutations (767 types of synonymous mutation) in 47,176 coding sequences from 4,254 strains. (B) Frequency of 13,537 non-synonymous mutations (1,352 types of non-synonymous mutation) in the same coding sequences.

**Figure 5**
Distribution of mutations in the SARS-CoV-2 whole genome. The SARS-CoV-2 genome is 29,903 bases long. It is composed of the 5′ UTR, ORF1a, ORF1b, S, E, M, N, accessory proteins (ORF3a, ORF6, ORF7, ORF8, and ORF10), and the 3′ UTR. ORF1a and ORF1b encode the nsp1 to nsp16 proteins.

**Figure 6**
Distribution of mutations and primer-template mismatches in ORF1a and ORF1b. The genomic positions (13,442–16,236) of RNA-dependent RNA polymerase (RdRp) were based on NCBI ID NC_045512.2. The primer- and probe-template regions were derived from the lists published by the Centers for Disease Control and Prevention (CDC).

**Figure 7**
Distribution of mutations in the receptor-binding domain and polybasic cleavage site within the S gene. The genomic positions (22,478–23,191) of the receptor-binding domain (RBD) and the amino acid positions (682–685) of the polybasic cleavage site (PBCS) were based on reference [4] and reference [14], respectively.

**Figure 8**
Distribution of mutations and primer-template mismatches in ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF10. The genomic positions (13,442–16,236) of RNA-dependent RNA polymerase (RdRp) were based on NCBI ID NC_045512.2. The primer- and probe-template regions were retrieved from the lists published by the Centers for Disease Control and Prevention (CDC).

References

1. Wrapp D, Wang N, Corbett KS, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367(6483):1260–3. doi: 10.1126/science.abb2507.
2. Rehman SU, Shafique L, Ihsan A, et al. Evolutionary trajectory for the emergence of novel coronavirus SARS-CoV-2. Pathogens. 2020;9(3):E240. doi: 10.3390/pathogens9030240.
3. The Johns Hopkins Center for Health Security. nCoV Genetics [Internet] [cited 2020 Feb 3]. Available from: http://www.centerforhealthsecurity.org/resources/COVID-19/COVID-19-fact-....
4. Chan JF, Kok KH, Zhu Z, et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect. 2020;9(1):221–236. doi: 10.1080/22221751.2020.1719902.
5. Ou X, Guan H, Qin B, et al. Crystal structure of the receptor-binding domain of the spike glycoprotein of human betacoronavirus HKU1. Nat Commun. 2017;8:15216. doi: 10.1038/ncomms15216.
