Infect Genet Evol. 2020 Oct;84:104389.
doi: 10.1016/j.meegid.2020.104389. Epub 2020 Jun 2.

Exploring the genomic and proteomic variations of SARS-CoV-2 spike glycoprotein: A computational biology approach

Syed Mohammad Lokman et al. Infect Genet Evol. 2020 Oct.

Abstract

The newly identified SARS-CoV-2 has now been reported from around 185 countries, with more than a million confirmed human cases, including more than 120,000 deaths. The genomes of SARS-CoV-2 strains isolated from different parts of the world are now available, and the unique features of their constituent genes and proteins need to be explored to understand the biology of the virus. The spike glycoprotein is one of the major targets to be explored because of its role in the entry of coronaviruses into host cells. We analyzed 320 whole-genome sequences and 320 spike protein sequences of SARS-CoV-2 using multiple sequence alignment. We identified 483 unique variations among the SARS-CoV-2 genomes, including 25 nonsynonymous mutations and one deletion in the spike (S) protein. Among the 26 variations detected in S, 12 were located in the N-terminal domain (NTD) and 6 in the receptor-binding domain (RBD), which might alter the interaction of the S protein with the host receptor angiotensin-converting enzyme 2 (ACE2). In addition, 22 amino acid insertions were identified in the spike protein of SARS-CoV-2 in comparison with that of SARS-CoV. Phylogenetic analyses of the spike protein revealed that bat coronaviruses have a close evolutionary relationship with circulating SARS-CoV-2. The genetic variation data presented in this study can support a better understanding of SARS-CoV-2 pathogenesis. Based on the results reported herein, potential inhibitors against the S protein can be designed by considering these variations and their impact on protein structure.

Keywords: COVID-19; Genomic variants; SARS-CoV-2; Sequence analysis; Spike protein.
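
Variant labels such as D614G (reference residue, position, variant residue) can be derived from an alignment by comparing each sequence with the reference, column by column. The following is a minimal Python sketch of that idea, not the authors' pipeline. The function name `call_variants` and the toy sequences are hypothetical, and the input is assumed to be two already aligned, equal-length strings that use `-` for gaps.

```python
def call_variants(ref_aln: str, qry_aln: str) -> list[str]:
    """List substitutions (e.g. 'D4G') and deletions (e.g. 'V3del').

    Positions are 1-based and counted along the ungapped reference.
    Insertions relative to the reference are skipped in this sketch.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    variants = []
    pos = 0  # position in the ungapped reference
    for ref_res, qry_res in zip(ref_aln, qry_aln):
        if ref_res == "-":
            # Gap in the reference: the query carries an insertion here.
            continue
        pos += 1
        if qry_res == ref_res or qry_res == "X":
            # Identical residue, or an ambiguous call that is not reported.
            continue
        if qry_res == "-":
            variants.append(f"{ref_res}{pos}del")
        else:
            variants.append(f"{ref_res}{pos}{qry_res}")
    return variants


# Toy example (made-up sequences, not data from the study).
print(call_variants("MFVD-L", "MFVGAL"))  # ['D4G']
print(call_variants("MFVDL", "MF-DL"))    # ['V3del']
```

Counting positions along the ungapped reference keeps the labels comparable across sequences, which is why gap columns in the reference do not advance the position counter.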

Conflict of interest statement

Declaration of Competing Interest: The authors declare that there are no known competing financial interests or personal relationships that could affect the work reported in this paper.

Figures

Fig. 1
Nucleotide sequence variation among 320 SARS-CoV-2 whole genomes. A. Positional organization of the major structural protein-encoding genes, in orange (S = Spike protein, E = Envelope protein, M = Membrane protein, N = Nucleocapsid protein), and the accessory protein ORFs, in blue. B. Variability within the 320 SARS-CoV-2 genomic sequences, represented by the entropy (H(x)) value across genomic locations. The two highest frequencies of alteration were found at position 8785 of ORF1a and position 28,144 of ORF8. C. Alignment view of each highly variable region. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Sequence phylogeny of SARS-CoV-2 spike glycoprotein variants and other coronavirus spike proteins, based on amino acid sequences retrieved from the NCBI database. The tree was built with the neighbor-joining method in ClustalW and validated with 1000 bootstrap replicates. The branch length is indicated by the scale bar. The accession number YP_009724390 represents the identical sequences among the SARS-CoV-2 spike proteins.
Fig. 3
Overall architecture of the SARS-CoV-2 S glycoprotein. A. Schematic diagram of the SARS-CoV S glycoprotein showing domain organization (reconstructed from Y. Yuan et al., 2017 and M. Gui et al., 2017). B. Schematic domain organization of the SARS-CoV-2 S glycoprotein, constructed by alignment with the SARS-CoV S protein domains. C. Homology model of the SARS-CoV-2 S protein reference sequence YP_009724390, using PDB: 6VSB as the template. S protein trimer with two protomers shown as shaded surfaces (left). Ribbon diagram of SARS-CoV-2 S glycoprotein monomer from B. Here, NTD: N-terminal domain; RBD: receptor-binding domain; SD: subdomain; CR: connecting region; HR: heptad repeat; CH: central helix; BH: β-hairpin; FP: fusion peptide; TM: transmembrane domain; CT: cytoplasmic tail.
Fig. 4
Variability within 320 SARS-CoV-2 S protein sequences. A. Schematic representation of mutations across the spike protein domain organization. Blue, red, and black indicate positively charged, negatively charged, and neutral amino acid residues, respectively. B–P. Superposed structures of SARS-CoV-2 spike protein variants and the cryo-EM structure of the SARS-CoV-2 spike protein (PDB: 6VSB). Template residues are shown in green and variant residues in red. Here, B: Y28N, C: T29I, D: H49Y, E: L54F, F: D111N, G: S221W, H: A348T, I: R408I, J: H519Q, K: A520S, L: D614G, M: F797C, N: A930V, O: D936Y, and P: A1078V. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 5
The spatial distribution of variations found in the S glycoprotein over time. The surface plot illustrates the frequency distribution of each variation over time. The geographic location of each sample is indicated by a flag, and the frequency of each variation (if more than one from a single country) is shown below the respective flag.
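
The per-position entropy H(x) plotted in Fig. 1B can be illustrated with a short calculation of Shannon entropy for each alignment column. This is a sketch under stated assumptions rather than the tool used in the study: the function names are hypothetical, the natural logarithm is assumed, gap characters are counted as an ordinary symbol, and the toy alignment is made up.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy H = sum(p * ln(1/p)) over the symbols in one column."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * math.log(total / n) for n in counts.values())


def alignment_entropy(seqs: list[str]) -> list[float]:
    """Entropy of every column in a list of aligned, equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy("".join(col)) for col in zip(*seqs)]


# Toy alignment (not data from the study): only column 2 varies.
toy = ["ACGT", "ACGT", "ATGT", "ATGT"]
profile = alignment_entropy(toy)
# Column 2 is split 50/50 between C and T, so its entropy is ln 2 (about 0.693);
# the fully conserved columns have entropy 0.
print([round(h, 3) for h in profile])
```

A conserved column scores 0 and the score rises as the column becomes more mixed, so peaks in such a profile mark the most variable positions.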
