Learned features of antibody-antigen binding affinity

Nathaniel L Miller^{1,2}, Thomas Clark^{1,2}, Rahul Raman^{1,2}, Ram Sasisekharan^{1,2}

Affiliations

¹ Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States.
² Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States.

PMID: 36895805
PMCID: PMC9989197
DOI: 10.3389/fmolb.2023.1112738

Nathaniel L Miller et al. Front Mol Biosci. 2023 Feb 21:10:1112738. doi: 10.3389/fmolb.2023.1112738. eCollection 2023.

Abstract

Defining predictors of antigen-binding affinity of antibodies is valuable for engineering therapeutic antibodies with high binding affinity to their targets. However, this task is challenging owing to the huge diversity in the conformations of the complementarity determining regions of antibodies and the mode of engagement between antibody and antigen. In this study, we used the structural antibody database (SAbDab) to identify features that can discriminate high- and low-binding affinity across a 5-log scale. First, we abstracted features based on previously learned representations of protein-protein interactions to derive 'complex' feature sets, which include energetic, statistical, network-based, and machine-learned features. Second, we contrasted these complex feature sets with additional 'simple' feature sets based on counts of contacts between antibody and antigen. By investigating the predictive potential of 700 features contained in the eight complex and simple feature sets, we observed that simple feature sets perform comparably to complex feature sets in classification of binding affinity. Moreover, combining features from all eight feature-sets provided the best classification performance (median cross-validation AUROC and F1-score of 0.72). Of note, classification performance is substantially improved when several sources of data leakage (e.g., homologous antibodies) are not removed from the dataset, emphasizing a potential pitfall in this task. We additionally observe a classification performance plateau across diverse featurization approaches, highlighting the need for additional affinity-labeled antibody-antigen structural data. The findings from our present study set the stage for future studies aimed at multiple-log enhancement of antibody affinity through feature-guided engineering.
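The leakage pitfall noted in the abstract can be illustrated with a minimal sketch. It assumes each antibody has already been assigned to a homology cluster; the cluster IDs and the `grouped_folds` helper are hypothetical and are not taken from the authors' workflow. Keeping whole clusters inside a single fold prevents near-identical antibodies from appearing in both the training and the test data of a cross-validation split.

```python
from collections import defaultdict

def grouped_folds(cluster_ids, n_folds):
    """Assign each sample to a fold so that samples sharing a cluster
    always land in the same fold (hypothetical helper)."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    fold_sizes = [0] * n_folds
    fold_of = [None] * len(cluster_ids)
    # Place the largest clusters first, each into the currently smallest fold.
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        target = fold_sizes.index(min(fold_sizes))
        for idx in members[cid]:
            fold_of[idx] = target
        fold_sizes[target] += len(members[cid])
    return fold_of

# Toy cluster labels for eight antibodies (illustrative only).
clusters = ["A", "A", "B", "C", "C", "C", "D", "E"]
folds = grouped_folds(clusters, n_folds=3)
```

A greedy size-balanced assignment is only one possible design; any scheme works as long as no cluster is split across folds.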

Keywords: affinity; antibody; antigen; classification; features; learning; structure.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Workflow and dataset overview. **(A)** Open-source workflow implemented via Google Colab. **(B)** Characteristics of the affinity-labeled antibody dataset from SAbDab. The counts of antibodies displaying each heavy and light subclass are shown, as well as the distributions of antibody affinities and structural resolutions. For the affinity distribution, all antibodies included in the dataset are shown in blue, while the subset of antibodies containing a glycan or lipid in the epitope-paratope interface is shown in orange.

**FIGURE 2**
Features used to describe antibody-antigen interactions. **(A)** Existing featurizations that have been validated for diverse protein-protein interaction modeling exercises. These featurizations model aspects of the antibody-antigen binding event such that they may be useful for affinity classification (see Methods). **(B)** Simple features that directly describe single components of the antibody-antigen interaction. These include the number of contacts for each combination of amino acids within the interface (aa_counts), the count of each interaction type for each CDR (aa_counts_CDRs), basic antibody descriptive information including the length and canonical class of each CDR (Ab_info), as well as the number of multivalent contacts spanning the epitope-paratope interface (num_multivalent_contacts).
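A simple contact-count feature of the kind described for aa_counts could be sketched as follows. The sketch assumes that interface contacts have already been extracted as (antibody residue type, antigen residue type) pairs and that pairs are counted in that order; the input format and the `count_contact_pairs` name are hypothetical, not the authors' implementation.

```python
from collections import Counter

def count_contact_pairs(contacts):
    """Count interface contacts per (antibody residue type, antigen residue type)."""
    return Counter((ab, ag) for ab, ag in contacts)

# Toy contact list (illustrative only, not taken from any structure).
contacts = [("TYR", "ASP"), ("TYR", "ASP"), ("SER", "LYS"), ("TRP", "PHE")]
aa_counts = count_contact_pairs(contacts)
```

Because `Counter` returns 0 for unseen keys, residue-pair combinations that never occur in an interface still yield a well-defined feature value.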

**FIGURE 3**
Correlations between top features and affinity for the combined classifier. Correlation matrix for the 16 features selected in the combined classifier versus each other and the affinity value. The parent feature-set for each feature is given at left. Affinity is treated as *−log10(KD [M])* such that a positive correlation (red) describes a feature (e.g., number of aromatic-aromatic contacts) that is positively correlated with higher affinity. Pearson correlation coefficients are shown according to the color scale and range from −0.5 to 0.3.
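The affinity transform and correlation used in this caption can be sketched in a few lines. The KD values and contact counts below are toy numbers chosen for illustration, not data from the study, and the helper names `pkd` and `pearson` are hypothetical.

```python
import math

def pkd(kd_molar):
    """Convert a dissociation constant in mol/L to -log10(KD); higher means tighter binding."""
    return -math.log10(kd_molar)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy example: KD falls tenfold for each additional aromatic-aromatic contact.
kd_values = [3e-8, 3e-9, 3e-10]
aromatic_contacts = [1, 2, 3]
affinities = [pkd(kd) for kd in kd_values]
r = pearson(aromatic_contacts, affinities)
```

With this sign convention, a feature that rises as KD falls produces a positive coefficient, which matches the caption's reading of red cells as association with higher affinity.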

**FIGURE 4**
Relationships amongst the most important features for classification of high- vs. low-affinity antibodies. Pairwise relationship matrix for the top 10 features across the feature-sets, as determined by feature importance rankings for the combined classifier (Table 2). For all plots, high- and low-affinity antibodies are shown in orange and blue, respectively. Cutoffs for high- and low-affinity are one log above or below the dataset median affinity of 3 nM (<300 pM or >30 nM). The raw data for each feature-pair are shown in the upper triangle. On the diagonal, kernel density estimates (KDE) for the distribution of each feature are shown. On the lower triangle, contour plots are superimposed on the raw pairwise data. Two pairwise associations and one KDE are enlarged on the right side of the figure. Additional pairwise relationship matrices for the top five features within each feature-set are provided in the supplement (Supplementary Figures S1–S8).
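The class cutoffs stated in this caption can be expressed as a small labeling function. This is a minimal sketch: the `affinity_class` name is hypothetical, and returning `None` for antibodies between the two cutoffs is an assumption, since the caption does not say how intermediate affinities are handled.

```python
def affinity_class(kd_molar, median_kd=3e-9):
    """Label an antibody by KD (mol/L) relative to the dataset median of 3 nM.

    "high": KD more than one log below the median (< 300 pM)
    "low":  KD more than one log above the median (> 30 nM)
    None:   in between (assumed to be left unlabeled)
    """
    if kd_molar < median_kd / 10:
        return "high"
    if kd_molar > median_kd * 10:
        return "low"
    return None

labels = [affinity_class(kd) for kd in (1e-10, 3e-9, 1e-7)]
```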
