PLoS Comput Biol. 2021 Mar 29;17(3):e1008864.
doi: 10.1371/journal.pcbi.1008864. eCollection 2021 Mar.

Using the antibody-antigen binding interface to train image-based deep neural networks for antibody-epitope classification

Daniel R Ripoll et al. PLoS Comput Biol.

Abstract

High-throughput B-cell sequencing has opened up new avenues for investigating the complex mechanisms underlying our adaptive immune response. These technological advances drive data generation and the need to mine and analyze the information contained in these large datasets, in particular the identification of therapeutic antibodies (Abs) or those associated with disease exposure and protection. Here, we describe our efforts to use artificial intelligence (AI)-based image analysis for prospective classification of Abs based solely on sequence information. We hypothesized that Abs recognizing the same part of an antigen share a limited set of features at the binding interface, and that the binding site regions of these Abs share common structure and physicochemical property patterns that can serve as a "fingerprint" to recognize uncharacterized Abs. We combined large-scale sequence-based protein-structure predictions to generate ensembles of 3-D Ab models, reduced the Ab binding interface to a 2-D image (fingerprint), used pre-trained convolutional neural networks to extract features, and trained deep neural networks (DNNs) to classify Abs. We evaluated this approach using Ab sequences derived from human HIV and Ebola viral infections to differentiate between two Abs, Abs belonging to specific B-cell family lineages, and Abs with different epitope preferences. In addition, we explored a different type of DNN method to detect one class of Abs from a larger pool of Abs. Testing on Ab sets that had been kept aside during model training, we achieved average prediction accuracies ranging from 71% to 96%, depending on the complexity of the classification task. The high accuracies reached during these classification tests suggest that the DNN models were able to learn a series of structural patterns shared by Abs belonging to the same class. 
The developed methodology provides a means to apply AI-based image recognition techniques to analyze high-throughput B-cell sequencing datasets (repertoires) for Ab classification.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic overview describing the steps required to generate fingerprints for Deep Neural Network image analysis.
We used the Rosetta Antibody software to generate multiple 3-D models of a particular Ab or one of its antigen-binding fragments (Fab) using the light and heavy chain sequences as input data. For each 3-D model, we used PyMOL to produce a fine grid perpendicular to the main axis of the Ab, which intersects the Ab binding site region. We selected amino acid residues from the model that lie within a distance of 20 Å from the grid, and their atoms were projected onto the 2-D grid and displayed using a “dot” representation. The image was then colored according to the desired color scheme using either a charge-based or an amino acid property-based representation. The resulting image was then stored as an image file. The transformation of the sequence into an image allowed us to train DNN models for Ab classification purposes using collections of fingerprint sets from multiple Abs.
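The projection step in this caption can be illustrated with a minimal sketch. It is not the authors' PyMOL workflow: the atom record, the choice of the z axis as the Ab main axis, the image size, and the RGB colors are all assumptions made for illustration, and only the 20 Å cutoff comes from the caption.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical atom record: (x, y, z, charge). Coordinates are in angstroms and
# the z axis is assumed to be the main axis of the Ab, so the grid is the plane
# z = grid_z. charge is +1, -1, or 0 for the residue the atom belongs to.
Atom = Tuple[float, float, float, int]
Pixel = Tuple[int, int, int]

# Arbitrary illustrative colors: blue = positive, red = negative, gray = neutral.
CHARGE_COLORS = {1: (0, 0, 255), -1: (255, 0, 0), 0: (128, 128, 128)}
BACKGROUND: Pixel = (255, 255, 255)


def project_to_grid(atoms: Sequence[Atom], grid_z: float = 0.0,
                    cutoff: float = 20.0, half_width: float = 32.0,
                    size: int = 64) -> List[List[Pixel]]:
    """Project atoms lying within `cutoff` of the grid plane onto a 2-D image.

    Each retained atom becomes one colored "dot" (a single pixel) at its
    (x, y) position; atoms farther than `cutoff` from the plane are dropped.
    """
    image = [[BACKGROUND] * size for _ in range(size)]
    scale = size / (2.0 * half_width)  # pixels per angstrom
    for x, y, z, charge in atoms:
        if abs(z - grid_z) > cutoff:
            continue  # outside the 20 angstrom slab around the grid
        col = math.floor((x + half_width) * scale)
        row = math.floor((y + half_width) * scale)
        if 0 <= row < size and 0 <= col < size:
            image[row][col] = CHARGE_COLORS[charge]
    return image
```

Calling `project_to_grid(atoms)` on a list of such records returns a nested list of RGB tuples that could be written to an image file with any imaging library.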
Fig 2. Variation of the loss function for DNN models with the number of learning cycles.
The compound blue line represents the average loss per epoch during training of 10 DNN models. The top and bottom of the gray area correspond to the maximum and minimum loss at each epoch across the ten models. After about 30 epochs, the loss function no longer improved, so we typically terminated the training at 30 epochs.
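The quantities plotted here (per-epoch mean, minimum, and maximum loss over several models, and the epoch after which the loss stops improving) can be computed as in the following sketch. The function names, the `patience` window, and the `min_delta` tolerance are hypothetical; the caption only states that training was typically stopped at about 30 epochs.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def loss_envelope(histories: Sequence[Sequence[float]]) -> List[Tuple[float, float, float]]:
    """Return (mean, min, max) of the loss at each epoch across all models.

    `histories` holds one loss curve per model; all curves must have the
    same number of epochs.
    """
    return [(mean(values), min(values), max(values)) for values in zip(*histories)]


def plateau_epoch(avg_loss: Sequence[float], patience: int = 5,
                  min_delta: float = 1e-3) -> Optional[int]:
    """Return the 1-based epoch of the best loss once no improvement larger
    than `min_delta` has been seen for `patience` epochs, or None if the
    loss never plateaus within the recorded epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(avg_loss, start=1):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None
```

The mean values from `loss_envelope` correspond to the plotted line and the minimum and maximum values to the shaded band; `plateau_epoch` is one simple way to pick a stopping point from the averaged curve.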
Fig 3. Schematic diagram of the allocation of fingerprints into training, validation, and testing sets.
Antibody assignment: Antibodies are randomly split into two fractions: training/validation and testing. Fingerprint assignment: The fingerprint images of an Ab selected for testing are added to a common pool in the test set. If the Ab was selected for training/validation, its fingerprints are divided into two fractions, and each fraction is added to the pool associated with the Ab class in the training set or the validation set, respectively.
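A minimal sketch of this two-level allocation follows. The function name, the split fractions, and the dictionary layout are assumptions for illustration; the caption does not state the proportions used.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_fingerprints(
    fingerprints_by_ab: Dict[str, Sequence[str]],
    ab_class: Dict[str, str],
    test_fraction: float = 0.2,   # assumed value, not given in the caption
    val_fraction: float = 0.2,    # assumed value, not given in the caption
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]], List[str]]:
    """Split at the antibody level first, then at the fingerprint level.

    Returns (train, val, test): train and val map each class label to a pool
    of fingerprints, while test is a single common pool.
    """
    rng = random.Random(seed)
    antibodies = sorted(fingerprints_by_ab)
    rng.shuffle(antibodies)
    n_test = max(1, round(len(antibodies) * test_fraction))
    test_abs = set(antibodies[:n_test])

    train: Dict[str, List[str]] = {}
    val: Dict[str, List[str]] = {}
    test: List[str] = []
    for ab, prints in fingerprints_by_ab.items():
        if ab in test_abs:
            # All fingerprints of a held-out Ab go to the common test pool.
            test.extend(prints)
            continue
        shuffled = list(prints)
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * val_fraction)
        label = ab_class[ab]
        val.setdefault(label, []).extend(shuffled[:n_val])
        train.setdefault(label, []).extend(shuffled[n_val:])
    return train, val, test
```

Splitting by antibody before splitting by fingerprint keeps every image of a test Ab out of training and validation, which is the property the diagram describes.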
Fig 4. Set of four antibodies associated with one family lineage.
(A) The graph highlights the amino acid substitutions in the heavy chain CDR3 region of the Abs with respect to the germline gene. Abs ADI-15912 and ADI-15843 share the same CDR3 sequence. (B) Each column shows three fingerprints for one Ab of the family, illustrating how the amino acid substitutions listed in (A) and conformational changes in the models affect the fingerprints.
Fig 5. Prediction accuracy of DNNs trained to detect Ab family lineage.
Plot of the “F1-score local”-metric as a function of the family lineage for two types of DNN models that we trained with Ab fingerprints generated by two alternative coloring schemes, i.e., by residue charge (black circles) or by reduced-amino-acid alphabet (grey squares).
Fig 6. Main regions of the EBOV GP trimer for Ab recognition.
(A) Structural model of the EBOV GP trimer recognized by anti-EBOV Abs. Abs have been colored according to the regions of the trimer that they bind, i.e., (B) the base of the trimer, (C) the α-helical heptad repeat 2 (HR2) region, and (D) the glycan cap domains.
Fig 7. Antibody recognition sites in HIV GP120/GP41.
Two main Ab binding regions based on the structural complex of the HIV-1 surface glycoprotein GP120/GP41 proteins with anti-HIV-1 Abs. Site 1 encompasses a structural overlay of 18 different site-specific Abs, whereas Site 2 contains 10 different Abs.
Fig 8. Representative Ab fingerprints from normal and outlier classes.
(A) Exemplar images of Ab fingerprints classified as belonging to lineage 1 by one DNN model. The images from panel A are ordered according to scores assigned by the neural network. They are organized in rows of 10 images, with scores decreasing from left to right. (B) Exemplar images of Ab fingerprints classified as belonging to outliers (Abs from lineages other than 1) by the DNN model. The images from panel B are ordered according to scores assigned by the neural network. They are organized in rows of 10 images, with scores increasing from left to right. These results correspond to the DNN model listed as “3” in Table 10.
Fig 9. Detection of clonally diverse antibodies using the OCC method RCAE.
a This number corresponds to the ranking assigned to the 100 DNN models based on the AUROC score computed on the testing set. b Image reconstruction errors ranked from low to high. Gray circles are associated with fingerprints from anomalous Abs. Colored circles highlight clusters of errors for fingerprints of the Abs from the “normal” class. Note that the graphs only display the reconstruction errors of 120 fingerprints from each testing set. c The test sets used to evaluate the DNN models below contained only Abs that do not compete with KZ52 in an attempt to detect false positives (i.e., the Ab representing the normal class was a decoy).
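The AUROC score mentioned in footnote a can be computed directly from reconstruction errors, as in the sketch below. It assumes that a higher reconstruction error indicates an anomalous fingerprint and uses the pairwise (Mann-Whitney) definition of AUROC; the function name and input layout are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def auroc_from_errors(errors: Sequence[float], is_anomalous: Sequence[bool]) -> float:
    """AUROC for separating anomalous from normal fingerprints.

    Assumes higher reconstruction error means "more anomalous". The score is
    the fraction of (anomalous, normal) pairs in which the anomalous
    fingerprint has the larger error, counting ties as one half.
    """
    anomalous = [e for e, flag in zip(errors, is_anomalous) if flag]
    normal = [e for e, flag in zip(errors, is_anomalous) if not flag]
    if not anomalous or not normal:
        raise ValueError("both classes must be present to compute AUROC")
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous) * len(normal))
```

Sorting the errors from low to high, as in footnote b, gives the ordering shown in the graphs; a score of 1.0 means every anomalous fingerprint has a larger error than every normal one.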
Fig 10. LIME analysis evaluating the reliability of predictions from a trained DNN model.
a The column lists images of arbitrary fingerprints associated with the Abs listed under the “Abs Set” column. b The column contains images generated as the superposition of three elements: a) green pixels represent the most relevant pixels used by the DNN to generate the prediction, i.e., those shown in the image from column 3; b) bright red pixels have the most negative contribution to the prediction; and c) the remaining pixels from the original fingerprint image. c This column contains heatmap images describing the contribution of each pixel of the fingerprint to the prediction generated by the DNN model. The color scale ranges from dark blue for the most relevant contributions to dark red for the most negative ones. The color scale is selected independently for each heatmap based on the scores assigned by LIME.
Fig 11. LIME analysis evaluating the reliability of predictions of a DNN model trained for classification of HIV Abs based on their binding preference.
a The column lists images of arbitrary fingerprints associated with HIV Abs binding to SITE1 and SITE2 as defined in Fig 7. b Images in this column show the most relevant pixels from the analyzed fingerprint used by the DNN model to generate the associations with the correct Ab. c The column contains images generated as the superposition of three elements: a) green pixels represent the most relevant pixels used by the DNN to generate the prediction, i.e., those shown in the image from column 3; b) bright red pixels have the most negative contribution to the prediction; and c) the remaining pixels from the original fingerprint image. d This column contains heatmap images describing the contribution of each pixel of the fingerprint to the prediction generated by the DNN model. The color scale ranges from dark blue for the most relevant contributions to dark red for the most negative ones. The color scale is selected independently for each heatmap based on the scores assigned by LIME.
Fig 12. Training of DNN for recognition of Abs from ten lineages.
The validation accuracy (Aval) is a metric associated with the quality of the DNN model that measures the accuracy on the validation set and is potentially prone to overfitting. The green horizontal line at κ equal to 0.4 divides the set of predictions on independent test sets of fingerprints into significant (≥ 0.4) and non-significant (< 0.4).
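The caption does not define κ. Assuming it denotes Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between true and predicted labels and p_e is the agreement expected by chance, the 0.4 cutoff could be applied as in this sketch. The function names are hypothetical and this is not taken from the authors' code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa between true and predicted class labels (assumed metric)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of predictions matching the true label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one label is present")
    return (p_o - p_e) / (1.0 - p_e)


def is_significant(kappa: float, threshold: float = 0.4) -> bool:
    """Apply the cutoff from the caption: kappa >= 0.4 counts as significant."""
    return kappa >= threshold
```

Unlike raw accuracy, κ corrects for agreement that would occur by chance, which matters when the ten lineages contribute unequal numbers of fingerprints.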
Fig 13. Using multiple Ab models to account for CDR flexibility and variations of side-chain orientation.
(A) Superposition of 3-D models of seven EBOV Abs (ADI-15974, ADI-15756, ADI-15758, ADI-15999, ADI-15820, ADI-15848, ADI-16061) that target the stalk region of EBOV GP. For simplicity, the Abs are represented using grey ribbon models with positively- and negatively-charged residues associated with the CDRs shown with a ‘stick’ representation in blue and red, respectively. The light gray fragments of the ribbon models highlight the positions of the light-chain CDR3s (CDR3-L) and heavy-chain CDR3s (CDR3-H). (B) Superposition of ten 3-D models of Ab ADI-15974 shown in the same orientation as those in (A) and using the same color scheme. Variations in the PDB templates used by Rosetta Antibody for 3-D model generation can lead to differences in the CDRs and variations in the fingerprint patterns. In addition, for one of the models, we display the remaining positively- and negatively-charged residues of the Ab using cyan and orange colors, respectively. Note that projections of the latter set of residues may also contribute to the fingerprint patterns. (C) Same models as in panel B viewed using a 90° rotation around the horizontal axis.
Fig 14. Schematic diagram of the allocation of Ab fingerprints into training, validation, and testing sets for one-class classification.
See text for an explanation of antibody and fingerprint assignment. Note: the Ab labels have been simplified, with “A” standing for “ADI-”.
