Cell Rep Methods. 2023 Jan 3;3(1):100374.
doi: 10.1016/j.crmeth.2022.100374. eCollection 2023 Jan 23.

Toward real-world automated antibody design with combinatorial Bayesian optimization

Asif Khan et al. Cell Rep Methods.

Abstract

Antibodies are multimeric proteins capable of highly specific molecular recognition. The complementarity determining region 3 of the antibody variable heavy chain (CDRH3) often dominates antigen-binding specificity. Hence, it is a priority to design optimal antigen-specific CDRH3 to develop therapeutic antibodies. The combinatorial structure of CDRH3 sequences makes it impossible to query binding-affinity oracles exhaustively. Moreover, antibodies are expected to have high target specificity and developability. Here, we present AntBO, a combinatorial Bayesian optimization framework utilizing a CDRH3 trust region for an in silico design of antibodies with favorable developability scores. The in silico experiments on 159 antigens demonstrate that AntBO is a step toward practically viable in vitro antibody design. In under 200 calls to the oracle, AntBO suggests antibodies outperforming the best binding sequence from 6.9 million experimentally obtained CDRH3s. Additionally, AntBO finds very-high-affinity CDRH3 in only 38 protein designs while requiring no domain knowledge.
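The trust-region idea and the fixed oracle budget described in the abstract can be illustrated with a minimal sketch. This is not AntBO itself: the Gaussian-process surrogate and acquisition function are swapped for uniform random proposals, so the code shows only how candidates stay within a Hamming-distance trust region around the best sequence found so far while the oracle is called a limited number of times. The names `sample_in_trust_region`, `local_search`, and `toy_energy`, the hidden target string, the radius, and the convention that lower energy means stronger binding are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def sample_in_trust_region(center, radius, rng):
    """Return a sequence at Hamming distance 1..radius from `center`."""
    n_mut = rng.randint(1, min(radius, len(center)))
    positions = rng.sample(range(len(center)), n_mut)
    seq = list(center)
    for p in positions:
        # Always pick a different residue so the distance equals n_mut.
        seq[p] = rng.choice([a for a in AMINO_ACIDS if a != seq[p]])
    return "".join(seq)


def local_search(oracle, start, radius=3, budget=200, seed=0):
    """Random trust-region search (a stand-in for the BO proposal step).

    Assumes lower oracle values are better. Returns the best sequence,
    its energy, and the number of oracle calls used.
    """
    rng = random.Random(seed)
    best_seq, best_e = start, oracle(start)
    calls = 1
    while calls < budget:
        cand = sample_in_trust_region(best_seq, radius, rng)
        e = oracle(cand)
        calls += 1
        if e < best_e:  # recentre the trust region on the new best
            best_seq, best_e = cand, e
    return best_seq, best_e, calls


def toy_energy(seq, target="CARGGYSSGWY"):
    """Hypothetical oracle: minus the number of matches to a hidden target."""
    return hamming(seq, target) - len(target)
```

Calling `local_search(toy_energy, "CARDYWGQGTL")` spends exactly 200 oracle calls by default, mirroring the evaluation budget quoted in the abstract; a real run would replace `toy_energy` with a binding-affinity oracle and the random proposals with a surrogate-guided acquisition step.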

Keywords: Bayesian optimization; Gaussian processes; combinatorial Bayesian optimization; computational antibody design; machine learning; protein engineering; structural biology.

Conflict of interest statement

This article is an open-source research contribution by Huawei, Tech R&D (UK). We release all used resources on GitHub. This work was carried out while A.K. was employed as a research scientist intern and A.I.C.-R. was employed as a research scientist at Huawei, Tech R&D (UK), and Huawei owns all intellectual property rights in the work detailed herein. A.G., D.-G.-X.D., R.T., J.W., and H.B.-A. are currently affiliated with Huawei. V.G. holds advisory board positions in aiNET GmbH and Enpicom B.V. and is also a consultant for Roche/Genentech.

Figures

Graphical abstract
Figure 1
AntBO iteratively proposes a CDRH3 sequence and requests its affinity from Absolut! before updating its posterior with the affinity of this sequence. (A and B) The performance of AntBO and other optimization tools is measured by the highest affinity achieved and by how quickly high affinity is reached. (A) A demonstrative example of two CDRH3 sequences that do not satisfy the developability criteria and are therefore discarded from the overall optimization procedure. (B) Overall optimization process of AntBO for antibody design: from a predefined target antigen structure (discretized from its known PDB structure), binding affinities of antibody CDRH3 sequences to the antigen are simulated using Absolut! as an in silico surrogate for costly experimental measurements. AntBO treats Absolut! as a black box to be optimized for Ebind and can suggest high-affinity CDRH3 protein designs within a trust region of acceptable sequences.
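The discarding step in panel (A) amounts to a feasibility filter applied before a sequence is sent to the oracle. The following is a minimal sketch of such a filter. The specific criteria (a bound on net charge and a cap on consecutive repeats of one residue), the threshold values, and the simplified charge model that counts only K, R, D, and E are placeholder assumptions for illustration; the caption itself does not state the criteria.

```python
POSITIVE = set("KR")  # assumed +1 each (histidine ignored in this sketch)
NEGATIVE = set("DE")  # assumed -1 each


def net_charge(seq):
    """Simplified integer net charge of an amino-acid sequence."""
    return sum((c in POSITIVE) - (c in NEGATIVE) for c in seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated residue."""
    best = run = 0
    prev = None
    for c in seq:
        run = run + 1 if c == prev else 1
        best = max(best, run)
        prev = c
    return best


def is_developable(seq, max_abs_charge=2, max_run=5):
    """Hypothetical developability check; thresholds are placeholders."""
    return abs(net_charge(seq)) <= max_abs_charge and longest_run(seq) <= max_run


# Example: keep only candidates that pass the filter.
candidates = ["CARKDYW", "KKKKAAA", "AAAAAAG"]
accepted = [s for s in candidates if is_developable(s)]
```

In an optimization loop, a filter like this would define which sequences count as acceptable inside the trust region, so that infeasible designs never consume an oracle call.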
Figure 2
AntBO is a sample-efficient solution for antibody design compared with existing baseline methods. AntBO with the transformed overlap kernel finds binding antibodies while outperforming other methods. It takes around 38 steps to suggest an antibody sequence that surpasses a very-high-affinity sequence from the Absolut! 6.9 M database and about 100 steps to outperform a super+ affinity sequence. We run all methods with 10 random seeds and report the mean and 95% confidence interval for the 12 antigens of interest. The title of each plot is a PDB ID followed by the chain of an antigen. For extended results on the remaining 147 antigens, we refer readers to Figures S3, S4, and S5. To understand the AntBO optimization, we also report a 3D visualization for antigen 1ADQ_A in Figure S2.
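For readers unfamiliar with overlap kernels on sequences, here is a minimal sketch of one common form. The plain overlap kernel is the fraction of positions at which two equal-length sequences agree, and a transformed variant exponentiates that value. The exact parameterization used by AntBO is not given in the caption, so the single scalar `lengthscale` and the function names below are assumptions.

```python
import math


def overlap(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def transformed_overlap(a, b, lengthscale=1.0):
    """Exponentiated overlap kernel (assumed form): exp(lengthscale * overlap)."""
    return math.exp(lengthscale * overlap(a, b))


# Two CDRH3-like strings that differ at a single position out of eleven.
k = transformed_overlap("CARDYWGQGTL", "CARDYWGQGTV")
```

The kernel value is largest when the two sequences are identical and decreases toward 1 as they share fewer positions, which is what lets a Gaussian process treat sequences with many matching residues as having correlated binding energies.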
Figure 3
We compare the binding energy thresholds of the different affinity categories (low, high, very high, super, super+) obtained from the Absolut! 6.9 M database with the average binding affinity of sequences designed using AntBO methods and the baselines. The energy scores are normalized by the threshold of the super+ category. We observe that AntBO outperforms the best sequence for a majority of antigens and emerges as the best method for finding high-binding-affinity sequences in under 200 evaluations.
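The normalization mentioned in this caption is a simple ratio, sketched below. The sketch assumes that binding energies are negative and that a more negative energy means stronger binding, so a normalized score above 1 indicates a design that beats the super+ threshold. The numeric values and the function name are made up for illustration and are not results from the paper.

```python
def normalize_by_super_plus(energy, super_plus_threshold):
    """Divide an energy by the super+ threshold for the same antigen.

    Assumes both values are negative (lower energy is better), so a
    result greater than 1 means the design exceeds the super+ threshold.
    """
    if super_plus_threshold >= 0:
        raise ValueError("expected a negative super+ threshold")
    return energy / super_plus_threshold


# Hypothetical numbers for a single antigen.
threshold = -100.0
scores = {name: normalize_by_super_plus(e, threshold)
          for name, e in {"design_a": -110.0, "design_b": -90.0}.items()}
```

Normalizing per antigen puts all antigens on a common scale, which is what allows methods to be compared across targets whose raw energy ranges differ.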
Figure 4
AntBO can design antibodies that achieve diverse developability scores, demonstrating that it is a viable method to be practically investigated. We analyze the developability scores of 200 proteins designed by each method, averaged across all 10 random seeds, to simulate the diversity of suggested proteins across a single trial. Here, we report developability scores for the S protein from SARS-CoV (PDB: 2DD8). The landscape of designed sequences suggested during the optimization process for each method is shown with their binding affinity and three developability scores (hydropathicity, charge, and instability). We also take super+ (top 0.01%) sequences from the Absolut! 6.9 M database and report their mean developability scores, denoted by a star in the plots. Interestingly, we observe that hydropathicity correlates positively with energy. While other methods have a larger charge spread, AntBO suggests the most points with a neutral charge. The spread of developability scores of the AntBO methods is close to the average score of the super+ sequences. Overall, we conclude that energetically favorable sequences still explore a diverse range of developability scores and that the protein designs of AntBO are more stable than those of other methods.
Figure 5
Effect of different initial class distributions on BO convergence. Experiments are run for three sets of initial points that vary in the number of binders (top 1%) and non-binders (remaining sequences): losers 20L (only non-binders), mascotte 10L-10M (half non-binders and half low binders), and heroes 6L-6M-8H (six non-binders, six low binders, and eight high binders). The top panels show the BO convergence plots, with a horizontal line denoting the energy threshold for reaching the super binder level. The bottom panels show histograms of the number of antibody designs required to reach the super binding affinity class, averaged across 5 trials. We find that for the majority of antigens, prior knowledge of binders helps to reduce the number of evaluations.
Figure 6
AntBO benefits from the knowledge of a prior binding sequence in arriving at super binders. The average number of antibody designs decreases when information about known binders is made available to the GP surrogate model. On the y axis, we report the average number of iterations required across all antigens to reach the super binding affinity class (outperforming the best sequence in the Absolut! database), and on the x axis, we have the three initial sets, namely losers 20L (only non-binders), mascotte 10L-10M (half non-binders and half low binders), and heroes 6L-6M-8H (six non-binders, six low binders, and eight high binders).

