Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Jun;11(3):101-111.
doi: 10.24171/j.phrp.2020.11.3.05.

Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome

Affiliations

Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome

Jun-Sub Kim et al. Osong Public Health Res Perspect. 2020 Jun.

Abstract

Objectives: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China, in December 2019 and has been rapidly spreading worldwide. Although the causal relationship among mutations and the features of SARS-CoV-2 such as rapid transmission, pathogenicity, and tropism, remains unclear, our results of genomic mutations in SARS-CoV-2 may help to interpret the interaction between genomic characterization in SARS-CoV-2 and infectivity with the host.

Methods: A total of 4,254 genomic sequences of SARS-CoV-2 were collected from the Global Initiative on Sharing all Influenza Data (GISAID). Multiple sequence alignment for phylogenetic analysis and comparative genomic approach for mutation analysis were conducted using Molecular Evolutionary Genetics Analysis (MEGA), and an in-house program based on Perl language, respectively.

Results: Phylogenetic analysis of SARS-CoV-2 strains indicated that there were 3 major clades including S, V, and G, and 2 subclades (G.1 and G.2). There were 767 types of synonymous and 1,352 types of non-synonymous mutation. ORF1a, ORF1b, S, and N genes were detected at high frequency, whereas ORF7b and E genes exhibited low frequency. In the receptor-binding domain (RBD) of the S gene, 11 non-synonymous mutations were observed in the region adjacent to the angiotensin converting enzyme 2 (ACE2) binding site.

Conclusion: It has been reported that the rapid infectivity and transmission of SARS-CoV-2 associated with host receptor affinity are derived from several mutations in its genes. Without these genetic mutations to enhance evolutionary adaptation, species recognition, host receptor affinity, and pathogenicity, it would not survive. It is expected that our results could provide an important clue in understanding the genomic characteristics of SARS-CoV-2.

Keywords: SARS-CoV-2; evolutions; mutation.

PubMed Disclaimer

Conflict of interest statement

Conflicts of Interest The authors have no conflicts of interest to declare.

Figures

Figure 1
Figure 1
Phylogenetic tree of SARS-CoV-2. This tree was performed by using the MEGA with parameters such as neighbor joining method, bootstrap 1,000 replications for the phylogeny test, Kimura 2-parameter for substitution model, and pairwise deletion for gap/missing data treatment. S and V clades determined by L84G (ORF8) and G251V (ORF3a), respectively. The reference strain (hCoV-19/Wuhan-Hu-1/2019) is indicated by bold letters on a light gray background. The notation of amino acid substitutions used here means replacements from amino acid of the reference strain on left to a difference amino acid of the corresponding strain on right. MEGA = molecular evolutionary genetics analysis.
Figure 2
Figure 2
Phylogenetic tree of G clade and its subclades. G Clade determined by D614G (S) was classified into G.1 clade, G.2 clade, and other strains. G.1 and G.2 clades share Q57H (ORF3a) and G204R (N), R203K (N), together with P214L (ORF1b), respectively. The reference strain (hCoV-19/Wuhan-Hu-1/2019) is indicated by bold letters on a light gray background. The notation of amino acid substitutions used here means replacements from amino acid of the reference strain on left to a difference amino acid of the corresponding strain on right.
Figure 3
Figure 3
Types of mutation distribution in 12 coding sequences. This figure summarizes the distribution of point mutations on 12 coding sequences (ORF1a, ORF1b, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF10). Bars filled with red color indicates non-synonymous mutations and bars filled with blue color indicates synonymous mutation. Number above and below bars show the type of mutation.
Figure 4
Figure 4
Frequency of mutations in coding sequences. (A) indicates frequency of 5,940 synonymous mutations (767 types of synonymous mutation) in 47,176 coding sequences from 4,254 strains. (B) indicates frequency of 13,537 non-synonymous mutations (1,352 types of non-synonymous mutation) in them.
Figure 5
Figure 5
Distribution of mutations in SARS-CoV-2 whole genome. The length of SARS-CoV-2 genome is 29,903 bases. SARS-CoV-2 genome is composed of 5′ UTR, ORF1a, ORF1b, S, E, M, N, accessary proteins (ORF3a, ORF6, ORF7, ORF8, and ORF10), and 3′ UTR. The ORF1a and ORF1b encodes from nsp1 to nsp16 proteins.
Figure 6
Figure 6
Distribution of mutations and primer-template mismatches in ORF1a and ORF1b. The genomic positions (13,442–16,236) of RNA-dependent RNA polymerase (RdRp) was based on NCBI ID NC_045512.2. The primer- and probe-template regions were derived from the lists published on the Centers for Disease Control and Prevention (CDC).
Figure 7
Figure 7
Distribution of mutations in receptor-binding domain and polybasic cleavage site within S gene. The genomic positions (22,478–23,191) of receptor-binding domain (RBD) and the amino acid position (682–685) of polybasic cleavage site (PBCS) were based on the reference [4], and the reference [14], respectivity.
Figure 8
Figure 8
Distribution of mutations and primer-template mismatches in ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF10. The genomic positions (13,442–16,236) of RNA-dependent RNA polymerase (RdRp) was based on NCBI ID NC_045512.2. The primer- and probe-template regions were retrieved from the lists published on the Centers for Disease Control and Prevention (CDC).

References

    1. Wrapp D, Wang N, Corbett KS, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367(6483):1260–3. doi: 10.1126/science.abb2507. - DOI - PMC - PubMed
    1. Rehman SU, Shafique L, Ihsan A, et al. Evolutionary trajectory for the emergence of novel coronavirus SARS-CoV-2. Pathogens. 2020;9(3):E240. doi: 10.3390/pathogens9030240. - DOI - PMC - PubMed
    1. The Johns Hopkins Center for Health Security. nCoV Genetics [Internet] [cited 2020 Feb 3]. Available from: http://www.centerforhealthsecurity.org/resources/COVID-19/COVID-19-fact-....
    1. Chan JF, Kok KH, Zhu Z, et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect. 2020;9(1):221–236. doi: 10.1080/22221751.2020.1719902. - DOI - PMC - PubMed
    1. Ou X, Guan H, Qin B, et al. Crystal structure of the receptor-binding domain of the spike glycoprotein of human betacoronavirus HKU1. Nat Commun. 2017;8:15216. doi: 10.1038/ncomms15216. - DOI - PMC - PubMed