Bioinform Adv. 2023 Jan 2;3(1):vbac103.
doi: 10.1093/bioadv/vbac103. eCollection 2023.

Prediction of antibody binding to SARS-CoV-2 RBDs

Eric Wang. Bioinform Adv. 2023.

Abstract

Summary: The ability to predict antibody-antigen binding is essential for computational models of antibody affinity maturation and protein design. While most models aim to predict binding for arbitrary antigens and antibodies, the global impact of SARS-CoV-2 on public health and the availability of associated data suggest that a SARS-CoV-2-specific model would be highly beneficial. In this work, we present a neural network model, trained on ∼315 000 datapoints from deep mutational scanning experiments, that predicts escape fractions of SARS-CoV-2 RBDs binding to arbitrary antibodies. The antibody embeddings within the model constitute an effective sequence space, which correlates with the Hamming distance, suggesting that these embeddings may be useful for downstream tasks such as binding prediction. Indeed, the model achieves Spearman correlation coefficients of 0.46 and 0.52 on two held-out test sets. By comparison, correlation coefficients calculated using existing structure- and sequence-based models do not exceed 0.28. The correlation coefficient against dissociation constants of antibodies binding to SARS-CoV-2 RBD variants is 0.46. Additionally, the residue-level escapes are highest in the antibody epitope, correlating well with experimentally measured escapes. We further study the effect of antibody chain use, embedding dimension size, and feed-forward and convolutional architectures on the model results. Lastly, we find that inference with our model is significantly faster than with previous models, suggesting that it could be a useful tool for the accurate and rapid prediction of antibodies binding to SARS-CoV-2 RBDs.

Availability and implementation: The model and associated code are available for download at https://github.com/ericzwang/RBD_AB.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.
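
The abstract relates the learned antibody embeddings to the Hamming distance between antibody sequences. As a rough illustration of that distance only (not the authors' pipeline), the sketch below builds a pairwise Hamming distance matrix in plain Python. The function names and the toy sequences are hypothetical, and it assumes the sequences have already been aligned to equal length, which the source does not describe.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        # Assumption: inputs are pre-aligned; unequal lengths are rejected.
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Return the symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(seqs[i], seqs[j])
    return d


# Hypothetical toy sequences, not real antibody chains.
toy = ["QVQLVQSG", "QVQLVESG", "EVQLLESG"]
matrix = distance_matrix(toy)
```

A matrix of this kind is the sort of input a density-based clustering method such as DBSCAN could take as a precomputed distance, which is how the Fig. 2 caption describes the cluster colors.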

Figures

Fig. 1.
Diagram of the neural network architecture and the construction of the antibody space
Fig. 2.
t-SNE plot of antibody embeddings. (a) The embedding for each antibody chain is 20 dimensions, and points are colored based on clusters identified through DBSCAN clustering of the Hamming distances between antibody sequences. (b) The embedding for each antibody chain is 5 dimensions, and points are colored based on clusters identified from Hamming distances. (c) The embedding for each antibody chain is 2 dimensions, and points are colored based on clusters identified from Hamming distances. (d) The embedding for each antibody chain is 20 dimensions, and points are colored based on known antibody class
Fig. 3.
Spearman correlation coefficients of the models against different test sets. * indicates that the model predicts ΔΔG instead of ΔG, so for these the log escape fraction of the wild-type RBD was subtracted from the data. Structural models could not be used for the WT-NoStruc and Variants datasets, since these did not have solved structures associated with them. Errors are bootstrapped standard errors
Fig. 4.
Comparison of predicted residue-level escape with measured escapes from DMS for the COV2-2196 and C002 antibodies. Structures of the antibody–RBD complexes are shown on the left for comparison (RBD: blue, antibody: gray). Complex structures were obtained from the PDB (COV2-2196: 7L7D, C002: 7K8S)
Fig. 5.
Effects of changes in network structure on test set correlations. (a) Spearman correlations for networks using different antibody chains. ‘Both’ corresponds to the original model. (b) Spearman correlations for different numbers of dimensions in protein sequence embeddings; 20 corresponds to the original model. (c) Spearman correlations for different network architectures. ‘Transformer’ corresponds to the original model
Fig. 6.
Inference times for a single prediction of each model. The inference time of the model itself is indicated under ‘Inference time’. ‘Inference time + structural optimization time’ includes the time required to optimize structures with a mutated residue. Models that do not require structural optimization or perform it within the model are indicated with an asterisk, and the base inference time is shown for simpler comparison. Errors are standard errors over three independent samples
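
The Fig. 3 caption reports Spearman correlation coefficients with bootstrapped standard errors, and notes that for ΔΔG-type models the log escape fraction of the wild-type RBD was subtracted from the data. The sketch below shows one way such quantities could be computed in plain Python. It is an assumption-laden illustration, not the authors' code: the function names, the number of resamples, the natural-log base, and the toy values are all hypothetical.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_se(x, y, n_resamples=1000, seed=0):
    """Standard deviation of Spearman's rho over paired resamples."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(rho):  # skip degenerate (constant) resamples
            stats.append(rho)
    mean = sum(stats) / len(stats)
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (len(stats) - 1))


def relative_log_escape(escapes, wild_type_escape):
    """Subtract the wild-type log escape (natural log is an assumption).

    Assumes all escape fractions are strictly positive.
    """
    return [math.log(e) - math.log(wild_type_escape) for e in escapes]


# Hypothetical toy values, not data from the paper.
predicted = [0.1, 0.4, 0.35, 0.8, 0.05]
measured = [0.2, 0.3, 0.5, 0.9, 0.1]
rho = spearman(predicted, measured)
se = bootstrap_se(predicted, measured)
```

Subtracting a single wild-type value shifts every entry by the same constant and so leaves the ranks unchanged; the subtraction only affects the correlation when data with different wild-type values are pooled.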
