Viruses. 2021 Feb 4;13(2):243.
doi: 10.3390/v13020243.

Evolution of SARS-CoV-2 Envelope, Membrane, Nucleocapsid, and Spike Structural Proteins from the Beginning of the Pandemic to September 2020: A Global and Regional Approach by Epidemiological Week

Paloma Troyano-Hernáez et al. Viruses.

Abstract

Monitoring severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genetic diversity and emerging mutations in this ongoing pandemic is crucial for understanding its evolution and assuring the performance of diagnostic tests, vaccines, and therapies against coronavirus disease (COVID-19). This study reports on the amino acid (aa) conservation degree and the global and regional temporal evolution by epidemiological week for each residue of the following four structural SARS-CoV-2 proteins: spike, envelope, membrane, and nucleocapsid. All 105,276 worldwide SARS-CoV-2 complete and partial sequences from 117 countries available in the Global Initiative on Sharing All Influenza Data (GISAID) from 29 December 2019 to 12 September 2020 were downloaded and processed using an in-house bioinformatics tool. Despite the extremely high conservation of SARS-CoV-2 structural proteins (>99%), all of them presented aa changes: 142 aa changes in 65 of the 75 envelope aa, 291 aa changes in 165 of the 222 membrane aa, 890 aa changes in 359 of the 419 nucleocapsid aa, and 2671 aa changes in 1132 of the 1273 spike aa. The evolution of mutations differed across geographic regions and epidemiological weeks (epiweeks). The most prevalent aa changes were D614G (81.5%) in the spike protein, followed by the R203K and G204R combination (37%) in the nucleocapsid protein. The presented data provide insight into the genetic variability of SARS-CoV-2 structural proteins during the pandemic and highlight local and worldwide emerging aa changes of interest for further SARS-CoV-2 structural and functional analysis.
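The position counts in the abstract can be turned into the share of residues per protein that showed at least one aa change. The sketch below only redoes that arithmetic from the published counts; the function and variable names are illustrative and not part of the authors' in-house tool. A position counts as changed as soon as any single sequence differs there, so a large share of variable positions is compatible with the high overall conservation reported.

```python
# Counts copied from the abstract:
# (positions with at least one aa change, protein length in aa).
CHANGED_POSITIONS = {
    "envelope": (65, 75),
    "membrane": (165, 222),
    "nucleocapsid": (359, 419),
    "spike": (1132, 1273),
}


def share_of_variable_positions(changed, length):
    """Percentage of residues with at least one reported aa change."""
    return 100.0 * changed / length


# Hypothetical summary table, rounded to one decimal place.
variable_share = {
    protein: round(share_of_variable_positions(changed, length), 1)
    for protein, (changed, length) in CHANGED_POSITIONS.items()
}

for protein, percent in variable_share.items():
    print(f"{protein}: {percent}% of positions with at least one change")
```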

Keywords: D614G; G204R; R203K; SARS-CoV-2; envelope; genetic variability; membrane; nucleocapsid; spike; structural proteins.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Percentage of global sequences with amino acid changes across the four SARS-CoV-2 structural proteins and their location in the protein domains. (A) Spike protein (1273 aa, 101,100 total sequences). With a brown star, D614G, change from aspartic acid to glycine in residue 614 of the spike protein, present in 81.5% of the global sequences and located in the S1 domain, before the S1/S2 furin cleavage site. With an orange star, S477N, change from serine to asparagine in residue 477, present in 4.1% of the global sequences and located in the receptor binding motif. In light blue, receptor binding domain (RBD). In dark blue, within the RBD, receptor binding motif (RBM); (B) Envelope protein (75 aa, 101,376 total sequences). With an orange star, S68F, change from serine to phenylalanine in residue 68 of the envelope protein, present in 0.2% of the global sequences. In light blue, PDZ-binding motif (PBM); (C) Membrane protein (222 aa, 103,419 total sequences). With an orange star, D3G, change from aspartic acid to glycine in residue 3, and T175M, change from threonine to methionine in residue 175. D3G and T175M were present in 0.7% and 1% of the global sequences, respectively. In light blue, transmembrane domains (TM); (D) Nucleocapsid protein (419 aa, 99,657 total sequences). With a brown star, R203K, change from arginine to lysine in residue 203, and G204R, change from glycine to arginine in residue 204. R203K and G204R were present in 37.3% and 37% of the global sequences, respectively, and located in the serine/arginine-rich linker (SR-linker). In light blue, SR-linker. Color code as follows: white, 0% of sequences with aa changes; yellow, >0 to 0.1% of sequences with aa changes; light orange, >0.1 to 1% of sequences with aa changes; dark orange, >1 to 30% of sequences with aa changes; brown, >30% of sequences with aa changes.
SS, signal peptide; NTD, N-terminal domain; CTD, C-terminal domain; RBD, receptor binding domain; RBM, receptor binding motif; FP, fusion peptide; HR, heptad repeat; TM, transmembrane domain; PBM, PDZ-binding motif; SR-linker, serine/arginine-rich linker. Annotation according to UniProtKB (https://www.uniprot.org) and RCSB Protein Data Bank (https://www.rcsb.org).
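The color code in the Figure 1 caption is a set of half-open percentage bins. A minimal sketch of that binning follows, assuming each upper bound is inclusive as the ">0 to 0.1%" wording suggests; `colour_bin` is a hypothetical helper name, not a function from the paper.

```python
def colour_bin(percent_changed):
    """Map the percentage of sequences with an aa change at a residue
    to the Figure 1 color category (upper bounds assumed inclusive)."""
    if percent_changed < 0 or percent_changed > 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_changed == 0:
        return "white"
    if percent_changed <= 0.1:
        return "yellow"
    if percent_changed <= 1:
        return "light orange"
    if percent_changed <= 30:
        return "dark orange"
    return "brown"


# Frequencies quoted in the caption for the starred changes.
for change, percent in [("D614G", 81.5), ("S477N", 4.1), ("S68F", 0.2)]:
    print(change, colour_bin(percent))
```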
Figure 2
Global and regional frequency of the D614G change in the spike protein over time. (A) Global and regional D614G frequency distribution in epidemiological weeks with at least 10 available sequences. The x-axis represents epidemiological week and the y-axis represents percentage of mutated sequences; (B) Global and regional number of sequences harboring aspartic acid or glycine at spike residue 614 in epidemiological weeks with at least 10 available sequences. The x-axis represents epidemiological week and the y-axis represents the number of sequences harboring an aa other than G (grey) and G (blue).
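The per-epiweek frequency described in this caption, including the cutoff of at least 10 available sequences, could be computed along these lines. This is a sketch under stated assumptions: the input is a hypothetical list of `(epiweek, residue_at_614)` pairs, and the function name is illustrative; the authors' in-house tool is not described in this record.

```python
from collections import defaultdict

MIN_SEQUENCES = 10  # threshold stated in the caption


def d614g_frequency_by_epiweek(records, min_sequences=MIN_SEQUENCES):
    """Return {epiweek: percentage of sequences with G at residue 614},
    keeping only epiweeks with at least `min_sequences` sequences.

    `records` is an iterable of (epiweek, residue) pairs (assumed format).
    """
    totals = defaultdict(int)
    mutated = defaultdict(int)
    for epiweek, residue in records:
        totals[epiweek] += 1
        if residue == "G":
            mutated[epiweek] += 1
    return {
        week: 100.0 * mutated[week] / totals[week]
        for week in sorted(totals)
        if totals[week] >= min_sequences
    }


# Synthetic toy data: epiweek 10 has 10 sequences, epiweek 11 only 3.
toy_records = [(10, "D")] * 8 + [(10, "G")] * 2 + [(11, "G")] * 3
toy_frequencies = d614g_frequency_by_epiweek(toy_records)
```

In the toy data, epiweek 11 is dropped because it falls below the threshold, which keeps a handful of sequences from producing a misleading 100% frequency.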
Figure 3
Global and regional frequency of the R203K and G204R combination in the SARS-CoV-2 nucleocapsid protein over time. The figure only includes data from epiweeks with at least 10 nucleocapsid sequences. The x-axis represents epidemiological week and the y-axis represents percentage of mutated sequences. Color code: red, R203K; green, G204R. R, arginine; K, lysine; G, glycine.
Figure 4
Exponential linear regression for aa combinations in positions 203 and 204 of the nucleocapsid protein. (A) Exponential linear regression for the RG combination in positions 203 + 204 of the nucleocapsid protein. The exponential curve shows an overall decrease of the RG combination over time. b = −0.02, Y = 109.7 × (e^(−0.0264 × epiweek)) or ln(Y) = ln(109.7) + (−0.0264 × epiweek), R² = 88.7%; (B) Exponential linear regression for the KR combination in positions 203 + 204 of the nucleocapsid protein. The exponential curve shows an overall increase of the KR combination over time. b = 0.03, Y = 19 × (e^(0.0343 × epiweek)) or ln(Y) = ln(19) + (0.0343 × epiweek), R² = 73.2%. The x-axis represents epidemiological weeks and the y-axis represents frequency percentage of observed combinations. Green dot, observed aa combinations in positions 203 and 204 of the nucleocapsid protein; red line, exponential curve; b, slope; R², relative predictive power.
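The two forms given in the caption, Y = a × e^(b × epiweek) and ln(Y) = ln(a) + b × epiweek, are equivalent, so the exponential curve can be fitted as an ordinary straight line on log-transformed frequencies. The sketch below shows that general technique with the standard library only; it is not the authors' code, the function names are illustrative, and the check uses synthetic points generated from the published KR coefficients rather than the study data.

```python
import math


def fit_exponential(epiweeks, frequencies):
    """Least-squares fit of ln(Y) = ln(a) + b * epiweek.

    Returns (a, b). All frequencies must be > 0 for the log transform.
    """
    logs = [math.log(y) for y in frequencies]
    n = len(epiweeks)
    mean_x = sum(epiweeks) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in epiweeks)
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(epiweeks, logs))
    b = sxy / sxx                        # slope on the log scale
    a = math.exp(mean_y - b * mean_x)    # back-transform the intercept
    return a, b


def predict(a, b, epiweek):
    """Evaluate Y = a * e^(b * epiweek)."""
    return a * math.exp(b * epiweek)


# Synthetic, noise-free points from the published KR curve (a = 19, b = 0.0343).
weeks = list(range(10, 38))
synthetic = [predict(19, 0.0343, w) for w in weeks]
a_hat, b_hat = fit_exponential(weeks, synthetic)
```

With noise-free input the fit recovers the coefficients it was generated from; on real data the residual scatter is what the reported R² summarizes.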
