[Preprint]. 2025 Mar 26:2024.12.20.629482.
doi: 10.1101/2024.12.20.629482.

Generation of antigen-specific paired chain antibody sequences using large language models

Perry T Wasdin et al. bioRxiv. 2025.

Update in

  • Generation of antigen-specific paired-chain antibodies using large language models.
    Wasdin PT, Johnson NV, Janke AK, Held S, Marinov TM, Jordaan G, Gillespie RA, Vandenabeele L, Pantouli F, Powers OC, Vukovich MJ, Holt CM, Kim J, Hansman G, Logue J, Chu HY, Andrews SF, Kanekiyo M, Sautto GA, Ross TM, Sheward DJ, McLellan JS, Abu-Shmais AA, Georgiev IS. Cell. 2025 Nov 4:S0092-8674(25)01135-3. doi: 10.1016/j.cell.2025.10.006. Online ahead of print. PMID: 41192421

Abstract

The traditional process of antibody discovery is limited by inefficiency, high costs, and low success rates. Recent approaches employing artificial intelligence (AI) have been developed to optimize existing antibodies and generate antibody sequences in a target-agnostic manner. In this work, we present MAGE (Monoclonal Antibody GEnerator), a sequence-based Protein Language Model (PLM) fine-tuned for the task of generating paired human variable heavy and light chain antibody sequences against targets of interest. We show that MAGE can generate novel and diverse antibody sequences with experimentally validated binding specificity against SARS-CoV-2, an emerging avian influenza H5N1, and respiratory syncytial virus A (RSV-A). MAGE represents a first-in-class model capable of designing human antibodies against multiple targets with no starting template.

Conflict of interest statement

I.S.G. is a cofounder of AbSeek Bio. P.T.W. and I.S.G. are listed as inventors on patents filed describing the pipeline presented here for the fine-tuning of LLMs for antigen-specific antibody generation. The Georgiev laboratory has received unrelated funding from Takeda and Merck. Dr. Chu has consulted for the Bill and Melinda Gates Foundation and Ellume, and has served on advisory boards for Vir, Merck, and AbbVie; she has received research funding from Gates Ventures, and support and reagents from Ellume and Cepheid outside of the submitted work.

Figures

Figure 1. A general PLM was fine-tuned for antigen-specific antibody generation.
A) An Antigen-specific Antibody Database was curated, in combination with large-scale LIBRA-seq datasets, in order to fine-tune the pretrained PLM for paired chain antibody generation against antigen prompts. B) Percentage of 1,000 antibodies generated against RBD that use each combination of heavy and light V genes. The top 10 most common genes are shown for each. C) Generated variable heavy (VH) and variable light (VL) sequences were aligned to the training data to find the minimum number of mutations between each generated sequence and any training sequence. D) For the most similar training sequence from the comparison in C, the distance was calculated between each region of the VH or VL sequence. The mean across all RBD-generated sequences is shown, with error bars representing the standard deviation.
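The comparison in panel C can be illustrated with a minimal sketch. It assumes the "minimum number of mutations" is a Levenshtein (edit) distance, the metric named in the Figure 2 caption; the function names `levenshtein` and `min_distance_to_training` and the toy sequences are hypothetical and are not taken from the MAGE pipeline.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]
        for j, residue_b in enumerate(b, start=1):
            cost = 0 if residue_a == residue_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def min_distance_to_training(generated: str,
                             training: Sequence[str]) -> Tuple[int, str]:
    """Return (distance, sequence) for the closest training sequence."""
    return min((levenshtein(generated, t), t) for t in training)


# Toy fragments only; real VH/VL sequences are roughly 110-130 residues long.
print(min_distance_to_training("QVQLVQ", ["QVQLVE", "EVQLLE"]))
```

A distance of 0 would mean the generated chain reproduces a training chain exactly, so larger minimum distances indicate more novel sequences.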
Figure 2. Twenty antibodies were selected for experimental validation of binding to RBD.
A) Schematic of antibody selection method after generation, yielding a total of 20 antibodies for experimental validation. B) For each antibody, the Levenshtein distance for the VH or VL is shown in comparison to the training antibody with the lowest total distance (summed across VH and VL). Antibodies are grouped by selection group. C) ELISA area-under-the-curve (AUC) based on absorbance at 450 nm across a dilution series from 6.4×10⁻⁴ μg/mL to 10 μg/mL, with S309 (RBD-specific) positive control and VRC01 (HIV-1-specific) negative control antibodies. D) Relationship between the minimum VH and VL distances from the closest training antibody sequences, with points colored based on ELISA AUC. Overlapping points at VH distance = 4 and VL distance = 5 are shown as a single point with split coloring based on the AUCs of these two antibodies (RBD-446, RBD-413). E) BLI sensorgrams for binding of high-affinity IgG antibodies to immobilized SARS-CoV-2 RBD-SD1. Data (black) were fit to a 1:2 bivalent analyte model. Curve fits are shown in red.
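The ELISA AUC in panel C can be sketched as follows. The caption does not state how the area was computed, so this example assumes trapezoidal integration of absorbance over log10 concentration. It also assumes a seven-point, five-fold dilution series, which is consistent with the stated endpoints (10 / 5^6 = 6.4×10⁻⁴ μg/mL) but is not confirmed by the source. The function name `elisa_auc` is hypothetical.

```python
import math
from typing import Sequence


def elisa_auc(concentrations: Sequence[float],
              absorbances: Sequence[float]) -> float:
    """Trapezoidal area under absorbance vs. log10(concentration)."""
    pairs = sorted(zip(concentrations, absorbances))
    xs = [math.log10(c) for c, _ in pairs]
    ys = [a for _, a in pairs]
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Assumed five-fold series from 10 ug/mL down to 6.4e-4 ug/mL (7 points).
concentrations = [10 / 5 ** k for k in range(7)]
```

Integrating on a log scale weights each dilution step equally, which is a common choice for serial dilutions; integrating on a linear scale would let the highest concentrations dominate the area.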
Figure 3. Generated RBD binding antibodies have diverse sequence characteristics.
High-affinity binding antibodies are shown in bold. A) Publicness of binding antibodies based on CDRH3, VH, and both VH and VL. Clones are defined as CDR3 identity > 70% and matching V genes for the specified chain. B) Sum of edits within each VH region for binding antibodies compared to the closest sequence match in the training data. C) Table showing sequence characteristics of RBD binding antibodies. OASis percentile represents a humanness score averaged across the heavy and light chains. D) ELISA AUC for binding curve dilutions for SARS-CoV-2 WT and SARS-CoV-2 spikes for RBD binding antibodies. E) IC50 values for pseudovirus neutralization of SARS-CoV-2 variants for full spike binding antibodies. F) Full pseudovirus neutralization curves for RBD-409 against SARS-CoV-2 variants.
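The clone definition in panel A (CDR3 identity > 70% and matching V genes) can be expressed as a short check. This is a sketch under stated assumptions: identity is computed position by position and only for CDR3s of equal length, which the caption does not specify. The `Chain` type, the function names, and the example sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    v_gene: str  # e.g. "IGHV3-30"
    cdr3: str    # CDR3 amino acid sequence


def cdr3_identity(a: str, b: str) -> float:
    """Fraction of matching positions; 0.0 if lengths differ (assumption)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_clone(a: Chain, b: Chain, threshold: float = 0.70) -> bool:
    """True when V genes match and CDR3 identity is strictly above threshold."""
    return a.v_gene == b.v_gene and cdr3_identity(a.cdr3, b.cdr3) > threshold


generated = Chain("IGHV3-30", "ARDGYSSGWY")
training = Chain("IGHV3-30", "ARDGYSSGWF")
print(same_clone(generated, training))
```

To apply the same rule to both VH and VL, the check would be run once per chain and the two results combined with a logical AND.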
Figure 4. Characteristics of sequences generated against RSV and H5/TX/24 prompts.
A) Log fold changes showing the increase in clones with the same antigen specificity for RSV-A and H5/TX/24 prompts compared to the WT RBD prompt. Calculated based on the number of clones between generated and training antibodies, out of 1,000 generated sequences. B) Heatmap showing the percentage of 1,000 generated antibodies encoding different variable genes for each antigen prompt. For 1,000 generated antibodies against each prompt, the distribution of C) minimum VH Levenshtein distance to any training antibody, D) minimum VL Levenshtein distance to any training antibody, E) percent identity to VH germline, and F) percent identity to VL germline.
Figure 5. MAGE generates novel A/cattle/TX/2024 H5-binding antibodies.
A) Full ELISA dilution curves for designed antibodies against H5/TX/24 hemagglutinin. B) Minimum distance to training antibody sequences. Distance represents number of residues different when compared to the heavy and light chain sequences from the training match with the lowest total distance (VH + VL). C) Percent somatic hypermutation in heavy and light chain for binding antibodies, calculated across VH and VL genes. D) Edit distance by VH region to closest training sequence match based on CDRH3 identity. E) Neutralization dilution curves against H5/TX/24 hemagglutinin. F) IC50 values calculated from curves shown in panel E.
Figure 6. MAGE generates novel RSV-A binding antibodies.
A) Full ELISA dilution curves for designed antibodies against RSV-A pre-fusion. B) Minimum distance to training antibody sequences. Distance represents number of residues different when compared to the heavy and light chain sequences from the training match with the lowest total distance (VH + VL). C) Percent somatic hypermutation in heavy and light chain for binding antibodies, calculated across VH and VL genes. D) Distance by VH region to closest training sequence match from E) Antibody neutralization dilution curves against RSV-A.
Figure 7. Cryo-EM structure of Fabs RSV-2245 and RSV-3301 bound to RSV-A F.
A) Overview of 3.4 Å resolution cryo-EM structure of RSV F bound to fragments of antigen binding (Fabs) for RSV-2245 (heavy and light chains in dark and light blue, respectively) and RSV-3301 (heavy and light chains in dark and light green, respectively). RSV-A F protomers are shown in shades of pink. Zoomed-in views of the Fab-RSV F interface are shown as cartoons with select residues represented as sticks for B) RSV-2245 heavy chain, C) RSV-2245 light chain, D) RSV-3301 heavy chain, and E) RSV-3301 light chain. Hydrogen bonds are shown as dashed blue lines.
