Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
[Preprint]. 2021 Apr 20:2020.09.11.293258.
doi: 10.1101/2020.09.11.293258.

Functional binding dynamics relevant to the evolution of zoonotic spillovers in endemic and emergent Betacoronavirus strains

Affiliations

Functional binding dynamics relevant to the evolution of zoonotic spillovers in endemic and emergent Betacoronavirus strains

Patrick Rynkiewicz et al. bioRxiv. .

Update in

Abstract

Comparative functional analysis of the dynamic interactions between various Betacoronavirus mutant strains and broadly utilized target proteins such as ACE2 and CD26, is crucial for a more complete understanding of zoonotic spillovers of viruses that cause diseases such as COVID-19. Here, we employ machine learning to replicated sets of nanosecond scale GPU accelerated molecular dynamics simulations to statistically compare and classify atom motions of these target proteins in both the presence and absence of different endemic and emergent strains of the viral receptor binding domain (RBD) of the S spike glycoprotein. Machine learning was used to identify functional binding dynamics that are evolutionarily conserved from bat CoV-HKU4 to human endemic/emergent strains. Conserved dynamics regions of ACE2 involve both the N-terminal helices, as well as a region of more transient dynamics encompassing K353, Q325 and a novel motif AAQPFLL 386-92 that appears to coordinate their dynamic interactions with the viral RBD at N501. We also demonstrate that the functional evolution of Betacoronavirus zoonotic spillovers involving ACE2 interaction dynamics are likely pre-adapted from two precise and stable binding sites involving the viral bat progenitor strain's interaction with CD26 at SAMLI 291-5 and SS 333-334. Our analyses further indicate that the human endemic strains hCoV-HKU1 and hCoV-OC43 have evolved more stable N-terminal helix interactions through enhancement of an interfacing loop region on the viral RBD, whereas the highly transmissible SARS-CoV-2 variants (B.1.1.7, B.1.351 and P.1) have evolved more stable viral binding via more focused interactions between the viral N501 and ACE2 K353 alone.

Keywords: COVID 19; Molecular dynamics; molecular evolution; viral binding.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.. An overview of the molecular system and workflow to generate the comparative protein dynamics data sets.
(A) The open conformation SARS-CoV-2 spike glycoprotein (PDB: 6vyb) interaction with its human target cell receptor, angiotensin 1 converting enzyme 2 (ACE2 PDB: 6m17). The green box in the center shows our modeling region that bounds the subsequent molecular dynamics (MD) simulations in the study. (B) The modeling region of the viral receptor binding domain (RBD) in red and ACE2 target protein in blue is shown along with the most relevant features of the binding interface; the N-terminal helices of ACE2 shown in green and the three main sites of the RBD N501 site interactions shown in magenta. (C) An overview of the workflow for the MD simulation pipeline for binding signature analysis used throughout this study. The difference in atom fluctuation or ‘dFLUX’ upon ACE2 binding by the virus is defined as the signed/symmetric KL divergence in the distribution of viral bound vs. unbound ACE2 atom fluctuations collected over 200 production runs and averaged to each amino acid site on the protein backbone (i.e. N, C, Cα, and O atoms).
Figure 2.
Figure 2.. An overview of the comparative protein dynamics analysis with DROIDS 4.0.
Two comparative functional protein states are defined and subjected to repeated sampling of molecular dynamics (MD) simulations. The differences in local atom motions (dFLUX) are defined by Kullback-Leibler (KL) divergences in protein backbone specific distributions in root mean square atom fluctuations (rmsf) collected over the MD sampling of each functional state. In this study, we compare the target protein human ACE2 in both its viral bound (blue-left) and its native unbound states (red-right). The reduced energy of motion at or near functional binding sites of the various viral strains will negatively shift the velocity of atoms creating a corresponding negative KL divergence in statistical distribution of rmsf (i.e. negative dFLUX).
Figure 3.
Figure 3.. Detecting functionally conserved dynamics with the maxDemon 3.0 machine learning algorithm.
(A) The MD simulation sets generated above are now the source of a pre-classified machine learning training set where class 0 and class 1 represent the binary functional states of the protein interaction (unbound vs bound). A feature vector is constructed comprising the atom fluctuations of any given amino acid (AA) site and its correlations to motions of downstream sites located 1, 3, 5, and 9 AA sites distant taken at every 50 frame interval or ‘time slice’ in the MD simulations. This rich feature vector thus captures both the local energy on the protein backbone, as well as potential sequence-based information that coordinates specific motions with nearby AA sites. (B) A multi-agent classifier comprised of up to seven learning methods is individually deployed on each site and a learning performance profile across all sites on the protein is generated as the time average classification of the site-specific dynamics defined by the feature vector. Thus, successful learning of functional dynamic features will deviate from an average classification of 0.5 on the profile. (C) This learning profile is then modeled for two evolutionary homologs (e.g. human ACE2 bound to the recent emergent strain SARS-CoV-2 and bound to bat CoV-HKU4, a potential zoonotic precursor) (D) A local canonical correlation of the learning performance profiles from both homologs is taken over 20 adjacent AA sites from a sliding window moved across the protein. Significant correlation of the homologous learning profiles identifies regions (shown in gray) where the functional binding dynamics defined by the pre-classification is functionally conserved from the time of divergence of the viral homologs.
Figure 4.
Figure 4.. Information theoretics for measuring variant impacts and functional coordination between sites.
Further information theoretics applied to the predicted binary classifications (0 or 1) of new MD runs can be used to compare relative impacts of different variants upon the signal of the conserved dynamics and to detect specific sites of functional coordination of conserved dynamics on the protein. (A) A relative entropy of the correlation (REC) of learning performance profiles can be used to measure both the local and total impact of genetic variation in the viral receptor binding domain on the functionally conserved binding dynamics of the bat/human orthologs. (B) Mutual information between the predicted binary states of learning classification (0 or 1) between any two amino acid sites, collected over the time of the simulations, can be used to detect functionally relevant coordination of the binding dynamics between the two sites. The mutual information is reported for all pairwise comparisons on the ACE2 protein as a heatmap with a scale blue/yellow/red indicating increasing coordination of functional binding dynamics.
Figure 5.
Figure 5.. DROIDS binding signature of dampened atom fluctuations in human ACE2 receptor protein upon interaction with both SARS-CoV-2 spike glycoprotein (modeled from PDB: 6m17) and the upon interaction with the past human outbreak strain SARS-CoV-1 spike glycoprotein (modeled from PDB: 6acg).
Shift in atom fluctuations were calculated as the signed symmetric Kullback-Leibler (KL) divergence (i.e. relative entropy or dFLUX) of distributions of the root mean square time deviations (rmsf) for residue averaged protein backbone atoms (i.e. N, Cα, C and O) for ACE2 in its spike bound (PDB: 6m17/6acg) versus unbound dynamic state (PDB: 6m17/6acg without the viral receptor binding domain or RBD). Here we show color mapping (A, B, D) and sequence positional plotting (C, E) of dampening of atom motion on viral RBD-protein target interface in blue for (A-C) the original strain of SARS-CoV-2 and (D-E) its predecessor, SARS-CoV-1. The sequence profile of the KL divergence between SARS-CoV-2 viral bound and unbound ACE2 produces strong negative peaks indicating key residue binding interactions with the N-terminal helices (white) and the N501 interactions at Q325, K353 and the AAQPLL 386–392 motif (yellow). The interactions at these sites are more moderated in the SARS-CoV-1 interaction with the ACE2 target protein.
Figure 6.
Figure 6.. DROIDS binding signature of dampened atom fluctuations in human ACE2 receptor protein upon interaction with both the highly transmissible B.1.1.7 (UK) variant and the B.1.351 (South African and Brazilian) variant of SARS-CoV-2 spike glycoprotein (modeled from PDB: 6m17).
Shift in atom fluctuations were calculated as the signed symmetric Kullback-Leibler (KL) divergence (i.e. relative entropy or dFLUX) of distributions of the root mean square time deviations (rmsf) for residue averaged protein backbone atoms (i.e. N, Cα, C and O) for ACE2 in its spike bound (PDB: 6m17) versus unbound dynamic state (PDB: 6m17 without the viral receptor binding domain or RBD). Here we show color mapping (A, B, D) and sequence positional plotting (C, E) of dampening of atom motion on viral RBD-protein target interface in blue for (A-C) the B.1.1.7 variant and (D-E) the B.1.351 variant of SARS-CoV-2. The sequence profile of the KL divergence between viral bound and unbound ACE2 produces negative peaks indicating key residue binding interactions with the N-terminal helices (white) and the N501 interactions at Q325, K353 and the AAQPLL 386–392 motif (yellow). The N501Y, E484K and K417N/T mutations are marked by the green stars (2B and 2D).
Figure 7.
Figure 7.. DROIDS binding signature of dampened atom fluctuations in human ACE2 receptor proteins upon interaction with two human commensal strains hCoV-OC43 and hCoV-HKU1 glycoprotein (modeled from PDB: 6ohw, 5gnb and 6m17).
Here we show color mapping (A, B, D) and sequence positional plotting (C, E) of dampening of atom motion on the viral RBD-protein target interface in blue for (A-C) the targeting of ACE2 by the most benign strain hCoV-OC43 and (D-E) its less benign counterpart hCoV-HKU1. The sequence profile of the KL divergence between viral bound and unbound target proteins produces strong negative peak indicating key residue binding interactions (C, E) with the N-terminal helices on ACE2 and (E) moderate interactions with K353 and Q325 observed only in hCoV-HKU1.
Figure 8.
Figure 8.. DROIDS binding signature of dampened atom fluctuations in human CD26 and ACE2 receptor proteins upon interaction with the bat progenitor strain batCoV-HKU4 glycoprotein (modeled from PDB: 4qzv and 6m17).
Here we show color mapping (A, B, D) and sequence positional plotting (C, E) of dampening of atom motion on the viral RBD-protein target interface in blue for (A-C) its known targeting of CD26 and (A,D-E) its hypothetical targeting of ACE2. The sequence profile of the KL divergence between viral bound and unbound target proteins produces two strong negative peaks indicating key residue binding interactions with the (C) the SAMLI and double serine motifs (shown in yellow) on CD26 and (E) the N-terminal helices on ACE2.
Figure 9.
Figure 9.. Functionally conserved and coordinated binding dynamics.
(A-C) Maps of evolutionary conserved functional binding dynamics (shown in dark gray) of viral RBD on both ACE2 sequence and structure. Three different evolutionary distances were considered including the most distant comparison between SARS-CoV-2 and (A) the hypothetical bat CoV-HKU4 progenitor, (B) the human endemic strain hCoV-HKU1, and (C) the UK variant, SARS-CoV-2 B.1.1.7. (D) These evolutionary distances are also represented as the total sum of the mutational impacts on conserved dynamics. (E) The coordination of the functionally conserved binding dynamics of ACE2 upon interactions with SARS-CoV-2 receptor binding domain is shown as a matrix with red/blue indicating high/low pairwise coordination (i.e. mutual information in learning classifications) between sites. The inset highlights the high degree of coordination occurring between K353, Q325 and the AAQFPLL motif in their dynamic interactions with the viral N501 site.
Figure 10.
Figure 10.. Feature vector and learning classification spaces for key binding sites on ACE2.
The machine learning training data and the optimized learning classification spaces is shown at four key residue sites on the binding interface of SARS-CoV-2 and ACE2. The X axis is the atom fluctuation at the given site and the Y axis is the correlations of the fluctuations with nearby downstream sites. The data points represent individual 50 frame time slices in the simulations with dark red indicating those time slices pre-classified to the viral bound state of ACE2 and dark blue points indicating those time slices pre-classified to the unbound state of ACE2. The colors of the background indicate the probability state of the machine learning classification model after training on the data points. The average correct classification of the data set is shown in the lower right corner of each plot.
Figure 11.
Figure 11.. A functional evolutionary model of SARS-type and endemic viral protein binding interactions during zoonotic spillover.
(A) The two-touch interaction model consists of a combination of evolutionary conserved ACE2 interactions at the N-terminal helices likely preadapted from its interaction with the SAMLI motif on CD26 (shown with black arrow) and a promiscuous, dynamically complex interaction with K353 and nearby sites at Q325 and AAQPFLL 386–392 (shown with red arrow), likely preadapted from its interaction with SS 333–334 on CD26. (B) Given the conserved nature of viral receptor binding across the diversity of strains we surveyed, we hypothesize these regions of the virus are most likely responsible for past and present instances of zoonotic spillover to humans. The more pathogenic SARS-like outbreaks associated with ACE2 seem to develop transient dynamics of N501 that involve promiscuous interactions at three co-located sites mentioned above. This may create a temporary ‘slippery two touch interaction with the ACE2 target that stabilizes of evolutionary time. In the much more transmissible new variant strains, (B.1.1.7, B.1.351, and P.1), the promiscuous interactions are mostly lost, favoring a much more stable binding interaction with K353 alone. In both endemic strains, we observe the evolution of enhanced binding to the N-terminal helices of ACE2 through the evolution of enhanced loop structure on the corresponding interface of the viral receptor binding domain.

Similar articles

References

    1. Ahamad S, Kanipakam H, Gupta D. 2020. Insights into the structural and dynamical changes of spike glycoprotein mutations associated with SARS-CoV-2 host receptor binding. J. Biomol. Struct. Dyn. 0:1–13. - PMC - PubMed
    1. Ali A, Vijayan R. 2020. Dynamics of the ACE2–SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms. Sci. Rep. 10:14214. - PMC - PubMed
    1. Al-Khafaji K, AL-Duhaidahawi D, Tok TT. 2020. Using integrated computational approaches to identify safe and rapid treatment for SARS-CoV-2. J. Biomol. Struct. Dyn. 0:1–9. - PMC - PubMed
    1. Andersen HC. 1980. Molecular dynamics simulations at constant pressure and/or temperature. J. Chem. Phys. 72:2384–2393.
    1. Babbitt GA, Fokoue EP, Evans JR, Diller KI, Adams LE. 2020. DROIDS 3.0-Detecting Genetic and Drug Class Variant Impact on Conserved Protein Binding Dynamics. Biophys. J. 118:541–551. - PMC - PubMed

Publication types