Design of cross-reactive antigens with machine learning and high-throughput experimental evaluation

Chelsy Chesterman^#¹, Thomas Desautels^#², Luz-Jeannette Sierra^#¹, Kathryn T Arrildt^#¹, Adam Zemla², Edmond Y Lau², Shivshankar Sundaram², Jason Laliberte¹, Lynn Chen¹, Aaron Ruby¹, Mark Mednikov¹, Sylvie Bertholet¹, Dong Yu¹, Kate Luisi¹, Enrico Malito¹, Corey P Mallett¹, Matthew J Bottomley¹, Robert A van den Berg¹, Daniel Faissol²

Affiliations

¹ GSK, Rockville, MD, United States.
² Lawrence Livermore National Laboratory, Livermore, CA, United States.

^# Contributed equally.

PMID: 40761757
PMCID: PMC12319226
DOI: 10.3389/fbinf.2025.1580967

Front Bioinform. 2025 Jul 16:5:1580967. doi: 10.3389/fbinf.2025.1580967. eCollection 2025.

Abstract

Selecting an optimal antigen is a crucial step in vaccine development, significantly influencing both the vaccine's effectiveness and the breadth of protection it provides. High antigen sequence variability, as seen in pathogens such as rhinovirus, HIV, and influenza virus, complicates the design of a single cross-protective antigen. Consequently, vaccination with a single antigen molecule often confers protection against only a single variant. In this study, machine learning methods were applied to the design of factor H binding protein (fHbp), an antigen from the bacterial pathogen Neisseria meningitidis. The vast number of potential antigen mutants presents a significant challenge for improving fHbp antigenicity. Moreover, limited data on antigen-antibody binding in public databases constrains the training of machine learning models. To address these challenges, we used computational models to predict fHbp properties and applied machine learning, using a Gaussian process (GP) model, to select both the most promising and the most informative mutants. These mutants were experimentally evaluated both to confirm promising leads and to refine the machine learning model for future iterations. In our current model, mutants were designed that enabled the transfer of fHbp v1.1-specific conformational epitopes onto fHbp v3.28 while maintaining binding to overlapping cross-reactive epitopes. The top mutant identified underwent biophysical and X-ray crystallographic characterization to confirm that the overall structure of fHbp was maintained throughout this epitope engineering experiment. The integrated strategy presented here could form the basis of a next-generation, iterative antigen design platform, potentially accelerating the development of new broadly protective vaccines.

Keywords: AI; ML; Neisseria meningitidis; antibody; antigen; protein engineering; protein structure; vaccine.

Conflict of interest statement

Authors CC, L-JS, KA, JL, LC, AR, MM, SB, DY, KL, EM, CM, MB, and RB were employed by GSK and may have received GSK shares as part of their remuneration package. Several authors are listed as inventors on patents owned by GSK. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Illustration of the mAb binding locations on the surface of fHbp. Aligned overlays of independent crystal structures, each containing a single Fab antibody fragment bound to fHbp v1.1 or fHbp v3.28; PDB: 2YPV, 5T5F, 5O14, 6H2Y, and 6XZW (Riahi et al., 2021; Desautels et al., 2024; Bianchi et al., 2019; Lopez-Sagaseta et al., 2018; Malito et al., 2013). **(A)** Bound Fab structures of the fHbp v1.1-specific mAbs are depicted in green and blue. **(B)** Bound Fab structures of the cross-reactive mAbs 4B3, 1A12, and 1E6 are depicted in orange, purple, and red, respectively. Residues in the epitopes of the fHbp v1.1-specific mAbs JAR5 (blue) and 12C1 (green) are indicated on the fHbp surface.

**FIGURE 2**
Round 1 summary of computational and experimental outcomes. **(A)** Interface ΔΔG calculated with FoldX for binding of fHbp v3.28-derived sequences to mAbs JAR5 and 12C1. ΔΔG values for binding to 1A12, 4B3, and 1E6 were also calculated (Supplementary Table S1). Of these sequences, and including v3.28, 41 were selected for production and are shown as red dots; sequences not selected for laboratory evaluation are shown as black dots. **(B)** For the selected sequences, experimentally validated binding is indicated with horizontal (pink, JAR5) or vertical (purple, 12C1) bars. Sequence m00006, which was selected as the starting sequence for Round 2, is at (−1.53, −2.11) in both panels.

**FIGURE 3**
Overview of the computational strategy for selection of diverse candidates with the GP model. All candidates within the defined search space are computationally modelled, and multiple binding parameters are predicted using established biophysical tools, including STATIUM and FoldX. These data are combined in a machine learning model, and the selection of candidates is driven by the training data provided. This allows the selection of a small set of candidates for experimental evaluation.
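The selection idea in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the feature columns stand in for FoldX and STATIUM predictions, all data are synthetic, and the upper-confidence-bound score (`mean + beta * std`) is a common acquisition rule chosen here for illustration, since the record does not describe the actual decision rule. The names `rbf_kernel`, `gp_posterior`, `select_candidates`, and `beta` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tc = rbf_kernel(x_train, x_cand)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tc.T @ alpha
    v = np.linalg.solve(chol, k_tc)
    var = np.diag(rbf_kernel(x_cand, x_cand)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_candidates(x_train, y_train, x_cand, n_select, beta=2.0):
    """Rank candidates by predicted value plus an uncertainty bonus.

    A large mean marks a promising candidate; a large std marks an
    informative one. beta (an assumed parameter) sets the trade-off.
    """
    mean, std = gp_posterior(x_train, y_train, x_cand)
    score = mean + beta * std
    return np.argsort(-score)[:n_select]

# Synthetic stand-ins: two computed features per mutant (e.g. two predicted
# binding scores) and a made-up measured binding signal for 20 tested mutants.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 2))
y_train = -x_train[:, 0] - 0.5 * x_train[:, 1] + 0.1 * rng.normal(size=20)
x_cand = rng.normal(size=(200, 2))

chosen = select_candidates(x_train, y_train, x_cand, n_select=10)
print("selected candidate indices:", chosen)
```

With `beta=0` the rule reduces to picking the highest predicted binders; increasing `beta` shifts the selection toward regions of the search space that the training data cover poorly.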

**FIGURE 4**
Sequences selected by the Gaussian process model for experimental evaluation. **(A)** Scatter plot of FoldX interface ΔΔG values (AnalyseComplex) for all 2,186 mutants evaluated computationally. The 108 mutants selected using the machine learning model and the decision rule are highlighted in red. Lines superimposed on the figure show effective selection thresholds in FoldX values corresponding to alternative sets of 108 mutants. If the 108 sequences with the best FoldX-predicted ΔΔG for binding 1A12 had been selected, the points below the blue line would constitute the resulting set. Similarly, if FoldX-predicted binding of 12C1 were the only criterion, the points left of the purple line would constitute the resulting set, and for the equally weighted sum of the two ΔΔGs for 1A12 and 12C1, the points below and left of the orange line would constitute the resulting set. **(B)** The selected 108 points were tested in BLI. Among these, the best binders for 12C1 and 1A12 are marked, using cyan vertical and blue horizontal bars, respectively. A number of selected, strongly binding sequences do not lie in any of the sets constructed *post hoc* on the basis of FoldX binding predictions. **(C)** For each group of sequences, sequence logos showing positions 221-223 and 249-252 are framed by the corresponding color. The sequence logo framed in black is all 2,187 mutants, including the parental sequence. **(D)** Pairwise distances among the 108 sequences in each selected or comparison set, using BLOSUM62. The black set is a size-matched, randomly selected set of 108 sequences from all 2,186 mutants considered, giving a fair comparison of intra-set sequence distances. Boxes show first quartile, median, and third quartile, while whiskers are 1.5 times the interquartile range or to the most extreme datum, whichever is narrower. Using the GP model, the decision rule traded off predicted binding performance for greater diversity in the selected set.
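The comparison sets and box-plot statistics described in this caption can be reproduced in outline as follows. This is a sketch on synthetic data, with several labeled assumptions: the ΔΔG values are random numbers, the 2,187 sequences are enumerated from three hypothetical residue choices at seven positions (the real allowed residues are not given here), and Hamming distance is swapped in for the BLOSUM62-based distance so that the example stays self-contained. The helper names are hypothetical.

```python
import itertools
import numpy as np

K = 108  # size of each selected or comparison set, as in the caption

def top_k_lowest(values, k):
    """Indices of the k lowest values (lower ddG = better predicted binding)."""
    return np.argsort(values)[:k]

def pairwise_hamming(seqs):
    """Hamming distance for every unordered pair of equal-length sequences.

    Assumption: a simple stand-in for the BLOSUM62-based distance.
    """
    return np.array([sum(x != y for x, y in zip(a, b))
                     for a, b in itertools.combinations(seqs, 2)])

def box_stats(d):
    """Quartiles plus whiskers capped at 1.5 x IQR or the most extreme datum."""
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    iqr = q3 - q1
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": max(d.min(), q1 - 1.5 * iqr),
        "whisker_high": min(d.max(), q3 + 1.5 * iqr),
    }

# Synthetic search space: 3 hypothetical residues at 7 positions = 2,187 sequences.
seqs = ["".join(p) for p in itertools.product("ADK", repeat=7)]
rng = np.random.default_rng(0)
ddg_1a12 = rng.normal(size=len(seqs))  # made-up FoldX-like values
ddg_12c1 = rng.normal(size=len(seqs))

comparison_sets = {
    "1A12 only": top_k_lowest(ddg_1a12, K),
    "12C1 only": top_k_lowest(ddg_12c1, K),
    "equal sum": top_k_lowest(ddg_1a12 + ddg_12c1, K),
    "random": rng.choice(len(seqs), size=K, replace=False),
}

stats = {
    name: box_stats(pairwise_hamming([seqs[i] for i in idx]))
    for name, idx in comparison_sets.items()
}
for name, s in stats.items():
    print(name, s)
```

Each single-criterion set corresponds to a straight threshold line in the scatter plot: every selected point has a ΔΔG no worse than every unselected point along that axis.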

**FIGURE 5**
Alignment of the fHbp v3.28 mutant sequences with the highest affinity for mAbs 12C1 and 1A12. The starting sequence selected for the machine learning design test, fHbp m000006, is shown in bold, and deviations from this sequence are shown in green font. Residues identical in all sequences are in light grey. Locations allowed to mutate are highlighted.

**FIGURE 6**
Crystal structure of fHbp m002416. **(A)** Crystal structure of fHbp m002416 (blue) crystallized with JAR5 (black) PDB: 2YPV. **(B)** Crystal structure of fHbp m002416 bound to JAR5 (black) with fHbp m002416 colored by B-factor. **(C)** Crystal structure of fHbp v1.1 in complex with JAR5 (black) in space group C222₁, fHbp v1.1 colored by B-factor. **(D)** Crystal structure of fHbp v1.1 in complex with JAR5 (black) in space group I4₁22, fHbp colored by B-factor. B-factor scale from high (red) to low (blue) and standardized across panels (B–D).

