Cell. 2022 Oct 13;185(21):4008–4022.e14. doi: 10.1016/j.cell.2022.08.024. Epub 2022 Aug 31.

Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain


Joseph M Taft et al. Cell.

Abstract

The continual evolution of SARS-CoV-2 and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the COVID-19 pandemic. Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine-learning-guided protein engineering technology, which is used to investigate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as Omicron, thus guiding the development of therapeutic antibody treatments and vaccines for COVID-19.

Keywords: artificial intelligence; deep learning; deep sequencing; directed evolution; machine learning; protein engineering; viral escape; yeast display.


Conflict of interest statement

Declaration of interests: ETH Zurich has filed for patent protection on the technology described herein, and J.M.T., C.R.W., B.G., R.A.E., and S.T.R. are named as co-inventors. C.R.W. is an employee of Alloy Therapeutics (Switzerland) AG. C.R.W. and S.T.R. may hold shares of Alloy Therapeutics. S.T.R. is on the scientific advisory board of Alloy Therapeutics.

Figures

Graphical abstract

Figure 1
Overview of deep mutational learning of the RBD for prediction of ACE2 binding and antibody escape. The RBD of the SARS-CoV-2 spike protein is expressed on the surface of yeast, and mutagenesis libraries are designed on the RBM of the RBD (RBM-3, RBM-1, and RBM-2), the sites of interaction with ACE2 and neutralizing antibodies (e.g., therapeutic antibody drugs). RBD libraries are screened by FACS for binding to ACE2 and neutralizing antibodies, and both binding and non-binding (escape) populations are isolated and subjected to deep sequencing. Machine learning models are trained to predict binding status to ACE2 or antibodies from the RBD sequence and are then used to predict ACE2 binding and antibody escape for current and prospective variants and lineages.
Figure 2
Design of RBD mutagenesis libraries and screening by yeast surface display and deep sequencing (A) Shown is the amino acid usage in the combinatorial libraries (libraries 3C, 1C, and 2C). Degenerate codons are derived from DMS data for ACE2 binding (Starr et al., 2020). (B) Representative examples of degenerate codons tiled across RBM-2, which are pooled together to comprise library 2T. (C) Flow cytometry dot plots depict yeast display screening of combinatorial (1C, 2C, 2CE, and 3C) and tiling (1T, 2T, and 3T) RBD libraries and control RBD (Wu-Hu-1); gating schemes correspond to selection of ACE2-binding and non-binding variants. (D) Amino acid logo plots of the RBD are based on deep sequencing data from ACE2-binding and non-binding selections. (E) Flow cytometry dot plots depict yeast display screening of pooled RBD libraries (2C and 2CE) after selection for ACE2 binding; gating schemes correspond to selection of variants for binding and escape (non-binding) to monoclonal antibodies (mAbs). See also Figures S1 and S2 and Tables S1–S3.
Figure S1
Design and screening of RBD libraries, related to Figure 2 and Table S1 (A) Amino acid distribution of combinatorial libraries RBM-1 and RBM-3. (B) Yeast-displayed RBD libraries pre-selected for ACE2 binding were sorted by flow cytometry for binding and escape to four therapeutic monoclonal antibodies (mAbs): LY-CoV16, LY-CoV555, REGN10933, and REGN10987. (C) A further nine monoclonal antibodies were screened for binding and escape. Approximately 10⁷ yeast cells were screened for each antibody.
Figure S2
Combinatorial sequence space of RBD libraries following selection, related to Figure 2 and Table S2 Sequence logo plots of sorted populations for ACE2 binding and antibody escape. For each population, up to the 10,000 most abundant unique amino acid sequences after read count thresholding are shown.
Figure 3
Training and testing of machine and deep learning models for prediction of ACE2 binding and antibody escape based on RBD sequence (A) Deep sequencing data from ACE2 and monoclonal antibody (mAb) selections are one-hot encoded and used to train supervised machine learning (e.g., random forest [RF]) and deep learning models (e.g., recurrent neural network [RNN]). Models perform classification by predicting a probability (P) of ACE2 binding or non-binding and of mAb binding or escape (non-binding) based on the RBD sequence. (B and C) Performance of RF and RNN models trained on 2T, 2C, or Full ACE2 or LY-CoV16 binding data, shown by accuracy, F1, and receiver operating characteristic (ROC) curves. Models are evaluated by rounds of external cross-validation (n = 5), with mean performance displayed and standard deviation indicated by error bars. Low- and high-distance sequences are defined as those ≤ED5 and ≥ED6 from the Wu-Hu-1 RBD, respectively. (D and E) Accuracy, F1, and AUC of all 13 mAb models trained on RBM-2 and RBM-1 data, evaluated on both low- and high-distance test sequences. See also Figures S3 and S4 and Table S4.
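The classification step described in Figure 3A can be sketched as follows: a fixed-length RBD fragment is one-hot encoded and passed to a supervised classifier that outputs a probability of binding. This is a minimal illustration, not the paper's pipeline; the toy sequences, labels, and all parameter choices here are assumptions.

```python
# Minimal sketch of one-hot encoding + random forest classification of
# RBD fragments. The sequences and binding labels below are invented
# for illustration; the real training data come from deep sequencing.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq: str) -> np.ndarray:
    """Encode a fixed-length protein sequence as a flat one-hot vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

# Toy training set: fragments labeled 1 (ACE2 binding) or 0 (non-binding).
seqs = ["NYNYLYRLF", "NYNYRYRLF", "AANYLYRLF", "NYAYLYALF"]
labels = [1, 1, 0, 0]

X = np.stack([one_hot(s) for s in seqs])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Predict P(binding) for an unseen variant.
p = clf.predict_proba(one_hot("NYNYLYALF").reshape(1, -1))[0, 1]
print(round(float(p), 2))
```

An RNN would consume the per-position encoding directly rather than the flattened vector, but the binding/non-binding classification framing is the same.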
Figure S3
Performance metrics of machine learning models, related to Figure 3, Table S4, and Table S6 (A and B) K-nearest neighbors (KNN), logistic regression (Log Reg), naive Bayes (NB), random forest (RF), long short-term memory recurrent neural network (RNN), support vector machine with linear kernel (SVM Linear), and support vector machine with radial basis function kernel (SVM RBF) models were trained on the ACE2 deep sequencing data without hyperparameter optimization. Models were then challenged to perform classification by predicting a probability (P) of ACE2 binding on test data. Performance was evaluated by accuracy, F1, precision, and recall. All models except the RNN were trained using scikit-learn; the RNN was trained using Keras. (C and D) DMS-trained random forest (RF) and long short-term memory recurrent neural network (RNN) models were evaluated on the larger combinatorial ACE2 binding test data, shown by accuracy, F1 graphs, and ROC curves.
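The model benchmark in Figure S3 can be approximated with default-hyperparameter scikit-learn classifiers scored on accuracy, F1, precision, and recall. The synthetic data below stands in for the encoded deep-sequencing reads; everything else is a generic sketch, not the paper's code.

```python
# Sketch of benchmarking several classifiers with default hyperparameters
# and scoring them on accuracy, F1, precision, and recall. The synthetic
# dataset is an illustrative stand-in for encoded RBD sequences.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

X, y = make_classification(n_samples=500, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Log Reg": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
    "SVM Linear": SVC(kernel="linear"),
    "SVM RBF": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "F1": f1_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
    }

for name, s in scores.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The RNN is the odd one out in the figure: it is trained in Keras on the per-position encoding rather than through this scikit-learn loop.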
Figure S4
Distribution of binding and non-binding across RBM regions, related to Figure 3 Count distributions of unique binding/non-binding sequences from the ACE2 and antibody selection library datasets after pre-processing. (A) RBM-1, (B) RBM-2, and (C) RBM-3.
Figure S5
Experimental evaluation of selected RBD variants for antibody escape, related to Figure 4 (A) The 46 selected synthetic variants were individually cloned and expressed for yeast display, and ACE2 binding was assessed by flow cytometry. 43 variants showed ACE2 binding or non-binding that matched machine learning predictions. The ACE2-binding status of two variants (38 and 42) could not be conclusively determined, while one variant (41) was incorrectly predicted by machine learning for ACE2 binding. (B) RBD sequences at chosen EDs (ED0, ED3, ED5, and ED7) from the Wu-Hu-1 RBD were predicted for ACE2 binding and escape from four therapeutic monoclonal antibodies (mAbs). Accuracies for antibody escape predictions are as follows: LY-CoV16 = 31/33 (93.94%), LY-CoV555 = 30/33 (90.91%), REGN10933 = 31/33 (93.94%), and REGN10987 = 32/33 (96.97%). (C and D) Two double mutants, and their constituent single mutations, which were predicted to display epistasis were assayed individually by yeast surface display. (E and F) Three synthetic RBD variants at ED3 from the Wu-Hu-1 RBD that were predicted by the consensus machine learning model to escape all four therapeutic antibodies were expressed as individual clones in yeast and evaluated by flow cytometry for binding to antibody or ACE2.
Figure 4
Prediction and experimental validation of synthetic lineages of RBD variants (A) Workflow to select and test synthetic variants at chosen edit distances (ED3, ED5, and ED7) from Wu-Hu-1 RBD. (B) Lineage plot of synthetic variants depicts machine learning predictions and experimental validation (Figure S5) for ACE2 binding and non-binding. (C) Dot plots of synthetic variants correspond to machine learning model (RF and RNN) predictions and experimental validation for antibody binding or escape. (D) Structural modeling by AlphaFold2 shows predicted structures of RBD variants that are ACE2 binding (green boxes) or non-binding (red boxes); control is Wu-Hu-1 RBD (black box). See also Figure S5.
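The edit distances used to stage these lineages (ED3, ED5, ED7) count how far a variant sits from the Wu-Hu-1 reference. Assuming fixed-length RBM sequences with substitutions only (no insertions or deletions), edit distance reduces to a simple positional mismatch count; the sequences below are illustrative, not real RBM fragments.

```python
# Hedged sketch of edit distance from a reference, assuming fixed-length
# sequences so that only substitutions are counted (a Hamming distance).
def edit_distance(variant: str, reference: str) -> int:
    """Number of positions where a fixed-length variant differs from the reference."""
    assert len(variant) == len(reference), "fixed-length comparison only"
    return sum(a != b for a, b in zip(variant, reference))

reference = "NYNYLYRLFRKSNLKPFERD"  # illustrative stand-in for a Wu-Hu-1 RBM segment
variant   = "NYKYLYRLFRKSNLEPFARD"  # three substitutions relative to the reference
print(edit_distance(variant, reference))  # → 3
```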
Figure 5
Predictive profiling of selected RBD variants for antibody escape across low mutational distances (A, D, and G) Heatmap depicts monoclonal antibody (mAb) binding as assessed by RF and RNN models of ED1 and ED2 variants of Alpha, Beta, and Kappa. (B, E, and H) The number of sequences escaping a combination of n (number) mAbs for ED1 and ED2 (agreement between models, threshold >0.5). (C, F, and I) Deep escape networks display possible evolutionary paths between variants and their escape from mAbs. See also Figure S6.
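A "deep escape network" of the kind shown in panels C, F, and I can be built by connecting variants that differ by a single mutation and annotating each node with how many mAbs it escapes. The variants and escape counts below are invented purely to show the construction; the paper's networks come from model predictions over real variant sets.

```python
# Rough sketch of building an escape network: nodes are variants, edges
# connect variants one substitution apart (possible evolutionary steps),
# and each node carries an (invented) count of mAbs escaped.
import itertools

variants = {  # variant sequence -> number of mAbs escaped (illustrative)
    "NYNYL": 0,
    "NYKYL": 1,
    "NYKYF": 2,
    "AYNYL": 1,
}

def hamming(a: str, b: str) -> int:
    """Positional mismatch count between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

# Evolutionary paths: pairs of variants separated by a single mutation.
edges = [
    (a, b) for a, b in itertools.combinations(variants, 2) if hamming(a, b) == 1
]
for a, b in edges:
    print(f"{a} ({variants[a]} escaped) -- {b} ({variants[b]} escaped)")
```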
Figure S6
Predictive profiling of additional selected RBD variants for antibody escape across low mutational distances, related to Figure 5 (A, D, and G) Heatmap depicts monoclonal antibody (mAb) binding as assessed by RF and RNN models of ED1 and ED2 variants of Wu-Hu-1, Gamma, and B.1.523. (B, E, and H) The number of sequences escaping a combination of n (number) mAbs for ED1 and ED2 (agreement between models, threshold > 0.5). (C, F, and I) Deep escape networks display possible evolutionary paths between variants and their escape from mAbs.
Figure 6
Determining antibody robustness to synthetic RBD variants and mutational lineages (A) Omicron (BA.1) mutations covered by combinatorial library RBM-2. (B) Binding predictions for single and combinatorial mutations observed in Omicron. (C) Dynamic escape profile along the Omicron lineage, with the percentage of escape sequences across all mutations at distance 1–4 from Wu-Hu-1. (D) Antibody predictions for ACE2-binding RBDs for each antibody at edit distance 6–10 from Wu-Hu-1 (10,000 sequences simulated in triplicate; only confident predictions shown, i.e., P(ACE2 binding) > 0.5 and either P(antibody binding) > 0.75 or P(antibody escape) < 0.25 for both RNN and RF). (E) Total count of confident predictions across all distances (mean across triplicates).
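The "confident prediction" filter in panel D can be written out directly: a variant is kept only if it is predicted to bind ACE2 and both models agree confidently on the antibody outcome. This sketch assumes each model returns a single probability of antibody binding (so escape confidence is the same probability being low); the function name and toy inputs are illustrative.

```python
# Sketch of the confident-prediction filter from Figure 6D: require
# P(ACE2 binding) > 0.5, and require that BOTH the RF and RNN antibody
# models are confident (both > 0.75 for binding, or both < 0.25 for escape).
def is_confident(p_ace2: float, p_ab_rf: float, p_ab_rnn: float) -> bool:
    """Return True only for ACE2 binders with a confident consensus antibody call."""
    if p_ace2 <= 0.5:
        return False
    both_bind = p_ab_rf > 0.75 and p_ab_rnn > 0.75
    both_escape = p_ab_rf < 0.25 and p_ab_rnn < 0.25
    return both_bind or both_escape

print(is_confident(0.9, 0.80, 0.85))  # confident binder
print(is_confident(0.9, 0.10, 0.20))  # confident escape
print(is_confident(0.9, 0.60, 0.80))  # models disagree: filtered out
```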

References

    1. Akbar R., Robert P.A., Weber C.R., Widrich M., Frank R., Pavlović M., Scheffer L., Chernigovskaya M., Snapkov I., Slabodkin A., et al. In silico proof of principle of machine learning-based antibody design at unconstrained scale. Preprint at bioRxiv. 2021. doi: 10.1101/2021.07.08.451480.
    2. Antia R., Halloran M.E. Transition to endemicity: understanding COVID-19. Immunity. 2021;54:2172–2176. doi: 10.1016/j.immuni.2021.09.019.
    3. Barnes C.O., West A.P., Huey-Tubman K.E., Hoffmann M.A.G., Sharaf N.G., Hoffman P.R., Koranda N., Gristick H.B., Gaebler C., Muecksch F., et al. Structures of human antibodies bound to SARS-CoV-2 spike reveal common epitopes and recurrent features of antibodies. Cell. 2020;182:828–842.e16. doi: 10.1016/j.cell.2020.06.025.
    4. Baum A., Fulton B.O., Wloga E., Copin R., Pascal K.E., Russo V., Giordano S., Lanza K., Negron N., Ni M., et al. Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science. 2020;369:1014–1018. doi: 10.1126/science.abd0831.
    5. Boder E.T., Wittrup K.D. Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 1997;15:553–557. doi: 10.1038/nbt0697-553.
