Genome Biol. 2020 Dec 23;21(1):304.
doi: 10.1186/s13059-020-02191-0.

Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models

Michael R Garvin et al. Genome Biol.

Abstract

Background: A mechanistic understanding of the spread of SARS-CoV-2 and diligent tracking of ongoing mutagenesis are of key importance to plan robust strategies for confining its transmission. Large numbers of available sequences and their dates of transmission provide an unprecedented opportunity to analyze evolutionary adaptation in novel ways. Addition of high-resolution structural information can reveal the functional basis of these processes at the molecular level. Integrated systems biology-directed analyses of these data layers afford valuable insights to build a global understanding of the COVID-19 pandemic.

Results: Here we identify globally distributed haplotypes from 15,789 SARS-CoV-2 genomes and model their success based on their duration, dispersal, and frequency in the host population. Our models identify mutations that are likely compensatory adaptive changes that allowed for rapid expansion of the virus. Functional predictions from structural analyses indicate that, contrary to previous reports, the Asp614Gly mutation in the spike glycoprotein (S) likely reduced transmission and the subsequent Pro323Leu mutation in the RNA-dependent RNA polymerase led to the precipitous spread of the virus. Our model also suggests that two mutations in the nsp13 helicase allowed for the adaptation of the virus to the Pacific Northwest of the USA. Finally, our explainable artificial intelligence algorithm identified a mutational hotspot in the sequence of S that also displays a signature of positive selection and may have implications for tissue or cell-specific expression of the virus.

Conclusions: These results provide valuable insights for the development of drugs and surveillance strategies to combat the current and future pandemics.

Keywords: Adaptive mutation; COVID-19; Coronavirus; Local adaptation; Molecular evolution; SARS-CoV-2.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
a) Ratio of non-synonymous to synonymous mutations (dN/dS) per gene (barplots). We used full-length sequences harboring 395 variable coding sites from GISAID to estimate ratios from the 385 haplotypes detected (see “Methods”). Genes with fewer than ten mutations across the population and haplotypes with fewer than five individuals were excluded. Ten genes (E, nsp7, nsp8, nsp10, nsp16, ORF6, ORF7A, ORF7B, and ORF8) are likely under high purifying selection at the nucleotide level given that both synonymous and non-synonymous mutations are rare. All changes in a gene were used to calculate dN/dS. Barplots are centered over the strongest signal in a gene. b) Wavelet analysis of non-synonymous (top) and synonymous (bottom) mutations across the SARS-CoV-2 genome. Arrows indicate mutation sites discussed in the text. The y-axis corresponds to the density of the wavelet across the genome on a log scale. Higher values indicate a broader wavelet and thus coarser granularity.
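The distinction between synonymous and non-synonymous changes in panel a can be illustrated with a minimal Python sketch. It classifies codon substitutions with the standard genetic code and reports a raw count ratio. This is only an illustration: a proper dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites, and nothing here reproduces the authors' pipeline. The function names and the toy codon pairs are assumptions made for the example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon):
    """Label a codon substitution by whether the encoded amino acid changes."""
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


def raw_n_to_s_ratio(changes):
    """Raw ratio of non-synonymous to synonymous counts (NOT a site-normalized dN/dS)."""
    n = sum(classify(ref, alt) == "non-synonymous" for ref, alt in changes)
    s = len(changes) - n
    return n / s if s else None  # undefined when no synonymous change is observed


# Hypothetical codon changes, chosen for illustration only.
toy_changes = [("GAT", "GGT"), ("CCT", "CTT"), ("TTT", "TTC")]
ratio = raw_n_to_s_ratio(toy_changes)
```

In this toy input, two substitutions change the amino acid (Asp to Gly, Pro to Leu) and one does not (Phe to Phe), so the raw ratio is 2.0.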
Fig. 2
Genealogy and success model of SARS-CoV-2 haplotypes. a) Median-joining network of 13,979 full-length sequences (haplotypes < 0.05% were removed). Nodes are haplotypes and edges are mutational events. Node size is proportional to the number of individuals. Red gradient in the center of a node indicates the date of emergence (the light red haplotype of the Wuhan reference sequence is indicated). Node perimeter darkness reflects the success of a haplotype based on number of days, number of regions, and number of individuals from which it was sampled. Dark-perimeter, small-diameter nodes indicate haplotypes that persisted globally for long periods but did not expand into many individuals (unsuccessful). Diamonds denote individuals with an amino acid change in the serine/arginine-rich region of the N protein (see text). Pie charts indicate geographic distribution of the major nodes. Measures of mutability are given for the three major clades as mutations per day and mutations per individual, and dN/dS is provided for each major clade (see text). Exclamation point signifies back mutation to reference sequence. b) Alignment of the hyper-mutable region at the signal peptide sequence of S is shown in the upper right. The conserved string of phenylalanine, leucine, and valine residues results in the T-rich region of the signal peptide at the nucleotide level and three runs of the repeat sequence “GTTTT”, which could be responsible for the hyper-mutation. Haplotypes that are linked to individuals with the hyper-mutable site are shown with a pink asterisk in panel a (nodes for the haplotypes with hyper-mutation are not shown due to low frequency, see “Methods”).
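The three quantities named in the caption (number of days, number of regions, and number of individuals per haplotype) can be tabulated from sample records as in the Python sketch below. It only computes these inputs and does not attempt the success model itself, whose form is not given here. The record layout, the function name, and the choice to measure days as the inclusive span between first and last sampling date are all assumptions.

```python
from collections import defaultdict
from datetime import date


def haplotype_summary(samples):
    """Summarize each haplotype from (haplotype_id, collection_date, region) records."""
    grouped = defaultdict(list)
    for haplotype, day, region in samples:
        grouped[haplotype].append((day, region))

    summary = {}
    for haplotype, records in grouped.items():
        days = [day for day, _ in records]
        summary[haplotype] = {
            # Assumption: inclusive span from first to last sampling date.
            "days": (max(days) - min(days)).days + 1,
            "regions": len({region for _, region in records}),
            "individuals": len(records),
        }
    return summary


# Toy records, invented for illustration only.
toy_samples = [
    ("H1", date(2020, 1, 5), "Asia"),
    ("H1", date(2020, 3, 1), "Europe"),
    ("H1", date(2020, 3, 20), "North America"),
    ("H2", date(2020, 2, 10), "Asia"),
]
summary = haplotype_summary(toy_samples)
```

Here the hypothetical haplotype H1 spans 76 days across three regions and three individuals, while H2 is a singleton on every measure.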
Fig. 3
SARS-CoV-2 single-nucleotide mutation spectrum. For each of the twelve classes of mutation, the number of occurrences at single-mutation variant positions is plotted. The Wu-Hu-1 reference genome (Accession NC_045512.2) was used to define the pre-mutation nucleotide for each class.
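The counting behind such a spectrum can be sketched in Python as follows. The sketch enumerates the twelve ordered substitution classes (reference base to variant base) and tallies positions that carry exactly one alternate base, which is an assumed reading of "single-mutation variant positions". The function name, input layout, and toy sequence are hypothetical and are not taken from the paper or from NC_045512.2.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGT"
# The twelve ordered substitution classes, e.g. "C>T" for reference C, variant T.
CLASSES = [f"{ref}>{alt}" for ref, alt in permutations(BASES, 2)]


def mutation_spectrum(reference, variants):
    """Count substitution classes at positions with exactly one alternate base.

    `variants` maps a 0-based position to the set of alternate bases seen there.
    """
    counts = Counter({cls: 0 for cls in CLASSES})
    for pos, alts in variants.items():
        ref = reference[pos]
        alts = {a for a in alts if a in BASES and a != ref}
        if ref in BASES and len(alts) == 1:  # keep single-mutation positions only
            counts[f"{ref}>{next(iter(alts))}"] += 1
    return counts


# Toy reference and variants, invented for illustration only.
toy_reference = "ATTAAAGGTT"
toy_variants = {0: {"G"}, 2: {"C"}, 5: {"G", "T"}, 7: {"T"}}
spectrum = mutation_spectrum(toy_reference, toy_variants)
```

Position 5 in the toy input carries two alternate bases, so it is skipped, leaving one count each for A>G, T>C, and G>T.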
Fig. 4
Mutations in the SARS-CoV-2 replication complex (nsp7, nsp8, and nsp12) and spike glycoprotein (S). a) Active form of RNA-dependent RNA polymerase (nsp12) associated with the cofactors nsp7 and nsp8 (cryo-EM structure, PDB id 6yyt) [26]. b) View of the loop containing site 323 of nsp12 and its proximity to nsp8 (red box). Pro323Leu is a frequent mutation in nsp12. View of the proximity between Ser25 in nsp7 and Asp163 in nsp8 (green box), which likely interact with each other via hydrogen bonds. The mutation Ser25Leu in nsp7 is fairly frequent. c) Cryo-EM structure of the S trimer in the closed conformation showing the location of sites 614 (at the end of the S1 subunit, red square) and 483 (at the β-4,5 loop of the receptor-binding domain, orange square). The cryo-EM structure (PDB id 6vxx) was used in this image. Glycans are not depicted for clarity. The missing loops were modeled using the Rosetta framework [32, 72]. d) Magnified view of the salt bridge network around Asp614 (red box), which may facilitate electrostatic-driven interactions within monomers. The mutation Asp614Gly is frequent (present in 62% of sequences analyzed) (PDB id 6xr8) [73]. Magnified view of the interface between the spike RBD (pink) and ACE2 (orange) (orange box, PDB id 6m17 [74]). Site 483 is located at the β-4,5 loop of the RBD. The mutations Val483Gly, Val483Ala, and Val483Asp were identified in SARS-CoV-2.

References

    1. Khailany RA, Safdar M, Ozaslan M. Genomic characterization of a novel SARS-CoV-2. Gene Rep. 2020;19:100682.
    2. Tang X, Wu C, Li X, Song Y, Yao X, Wu X, et al. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev. 2020. doi: 10.1093/nsr/nwaa036.
    3. Wang C, Liu Z, Chen Z, Huang X, Xu M, He T, et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J Med Virol. 2020:667–74. doi: 10.1002/jmv.25762.
    4. Pachetti M, Marini B, Benedetti F, Giudici F, Mauro E, Storici P, et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J Transl Med. 2020;18:179. doi: 10.1186/s12967-020-02344-6.
    5. Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, et al. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell. 2020;182:812–27.e19. doi: 10.1016/j.cell.2020.06.043.
