Heliyon. 2021 Mar;7(3):e06564.
doi: 10.1016/j.heliyon.2021.e06564. Epub 2021 Mar 19.

Pan-India novel coronavirus SARS-CoV-2 genomics and global diversity analysis in spike protein

Shweta Alai et al. Heliyon. 2021 Mar.

Abstract

Mortality rates due to COVID-19 have been found to be disproportionate across the globe and are currently being researched. India, with a population of 1.3 billion people, has a relatively low mortality rate compared with other countries with high infection rates. The genetic composition of circulating isolates continues to be a key determinant of virulence and pathogenesis. This study aimed to analyse the extent of divergence between the genomes of Indian isolates (n = 2525) and the reference Wuhan-1 strain, as well as isolates from countries showing higher fatality rates, including France, Italy, Belgium, and the USA. The study also analyses the impact of key mutations on interactions with angiotensin converting enzyme 2 (ACE2) and a panel of neutralizing monoclonal antibodies. The global prevalence of mutations in the spike protein was assessed using 1,44,605 spike protein sequences. The study suggests that SARS-CoV-2 genomes from India follow global trends, with D614G as the most prevalent mutational event (81.66% among 2525 Indian isolates). The N439K mutation in the receptor binding motif (RBM) was not prevalent among Indian isolates, whereas it was found in 0.54% of global isolates. Computational docking and molecular dynamics simulation of the N439K mutation, with respect to ACE2 binding and reactivity with the RBM-targeted antibodies B38, BD23, CB6, P2B-F26 and EY6A, suggest that the variant has a relatively higher affinity for the ACE2 receptor, which may support higher infectivity. The study warrants large-scale monitoring of Indian isolates, as the SARS-CoV-2 virus is expected to evolve and mutations may appear in unpredictable ways.

Keywords: COVID-19; Clades; Comparative genomics; Fatality rate; Neutralizing antibodies; Pandemic; Receptor binding domain; SARS-CoV-2.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Lollipop plots showing the distribution and frequency of mutations in Indian SARS-CoV-2 genome sequences. The frequency of mutations is shown on the X axis and the presence of a mutation is shown on the Y axis (lollipop), which correlates with the height of the vertical line representing each lollipop.
Figure 2
Linear diagrams representing the mutations observed in the genome and their distribution across genes in the Indian SARS-CoV-2 genome sequences. Diagrams in red and violet represent the protein subunits of ORF1ab and S, respectively. The presence of a mutation is shown in front of the gene under each line; the most frequent variants in the RBD are annotated with a star marking the amino acid change at that site, and the most frequent mutations in the spike protein and among all mutations are shown as circles.
Figure 3
State-wise mutation prevalence of SARS-CoV-2 genomes in India. A) Graph showing the combined state-wise mutation marker prevalence of SARS-CoV-2 sequences collected from states within India. B) Pie charts showing the mutation marker prevalence of SARS-CoV-2 sequences in each individual state.
Figure 4
Regional clade distribution of SARS-CoV-2 genomes. A) Graph showing the country-wise clade distribution of sequences collected from Belgium, Italy, France, India, and the USA. The chart shows the clades grouped on the X axis and the countries in different colours, with the relative frequencies of clades for each country on the Y axis. The colour code for the countries is shown in squares below the chart. B) Graph showing the state-wise clade distribution of sequences collected from India. The chart shows the relative frequencies of clades for each state on the X axis and the states with their different clades on the Y axis.
Figure 5
Clade distribution timeline in India. The graph represents the distribution of clades in India from January to July 2020. Clades are represented by square boxes in their respective colours below the X axis.
Figure 6
Phylogenetic tree based on the whole-genome alignment of 863 genomes sequenced from India. The tree shows the presence of 7 different clades and demonstrates that SARS-CoV-2 is widely disseminated across 237 distinct geographical locations assessed until 19/06/2020. The tree was rooted with hCoV-19/Wuhan/WIV04/2019/EPI_ISL_402124 as the reference. Clades are represented as coloured ranges shown in the left panel of the diagram. The first outer circle represents the corresponding lineages as strip labels. The outer circle carries coloured labels representing different geographical regions of India: Maharashtra, orange; Gujarat, blue; Delhi, green; Karnataka, yellow; Telangana, red; West Bengal, purple; Uttar Pradesh, green; Haryana, grey.
Figure 7
Prevalence of specific amino acid mutations in the spike protein. Bubble map representing amino acid mutations and their prevalence among 1,44,605 genomes. Amino acid positions refer to the reference hCoV-19/Wuhan/WIV04/EPI_ISL_402124. Circle size shows the sum or frequency of events. The marks are labelled by mutation. The mutation of aspartic acid to glycine at position 614 was observed 1,23,415 times among the 1,44,605 SARS-CoV-2 spike protein sequences used in this study. Overall, the S2 domain, and especially the heptad repeats, were rich in mutational events, and the RBM region was found to be the least variable (0.25%) in the 1273 amino acid long protein.
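The prevalence implied by the counts in the Figure 7 caption can be checked with simple arithmetic. The following is a minimal Python sketch, not the authors' pipeline: the function names (`call_substitutions`, `tally`, `prevalence_percent`) and the four-residue toy sequences are hypothetical, and the sketch assumes sequences that are already aligned to the reference. Only the two counts, 1,23,415 and 1,44,605 (written in the source with Indian digit grouping), come from the caption.

```python
from collections import Counter


def call_substitutions(reference, sequence):
    """List substitutions (e.g. 'D614G') between two pre-aligned,
    equal-length amino acid strings. Positions are 1-based; gap ('-')
    and unknown ('X') residues in the query are skipped."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for pos, (ref_aa, seq_aa) in enumerate(zip(reference, sequence), start=1):
        if ref_aa != seq_aa and seq_aa not in "-X" and ref_aa != "-":
            subs.append(f"{ref_aa}{pos}{seq_aa}")
    return subs


def tally(reference, sequences):
    """Count how many sequences carry each substitution."""
    counts = Counter()
    for seq in sequences:
        counts.update(call_substitutions(reference, seq))
    return counts


def prevalence_percent(count, total):
    """Prevalence of a mutation as a percentage of all sequences."""
    return 100.0 * count / total


# Toy data (hypothetical, for illustration only).
toy_reference = "MFVD"
toy_isolates = ["MFVG", "MFVG", "MFVD", "MKVG"]
toy_counts = tally(toy_reference, toy_isolates)

# Counts taken from the Figure 7 caption: D614G in 123415 of 144605 sequences.
d614g_global = prevalence_percent(123415, 144605)
print(f"D614G global prevalence: {d614g_global:.2f}%")  # prints 85.35%
```

This global figure of about 85.35% is distinct from the 81.66% reported in the abstract for the 2525 Indian isolates.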
Figure 8
Schematic representation of the location of the RBD mutation on the RBD complexed with the ACE2 receptor. A) RMSDs of the backbone atoms of both RBD-ACE2 complexes; B) RMSFs of the Cα atoms of both RBD-ACE2 complexes; C) Binding free energies of SARS-CoV-2 RBD with ACE2 (wild type and N439K mutant); D) Ribbon diagram of the hydrogen bonds between SARS-CoV-2 (wild type) and the hACE2 receptor; E) Ribbon diagram of the hydrogen bonds between SARS-CoV-2 (N439K mutant) and the hACE2 receptor.
