Front Mol Biosci. 2023 Sep 28;10:1249247.
doi: 10.3389/fmolb.2023.1249247. eCollection 2023.

AI/ML combined with next-generation sequencing of VHH immune repertoires enables the rapid identification of de novo humanized and sequence-optimized single domain antibodies: a prospective case study

Paul Arras et al. Front Mol Biosci. 2023.

Abstract

Introduction: In this study, we demonstrate the feasibility of combining yeast surface display (YSD) and next-generation sequencing (NGS) with artificial intelligence and machine learning (AI/ML) methods to identify de novo humanized single domain antibodies (sdAbs) with favorable early developability profiles.

Methods: The display library was derived from a novel approach in which VHH-based CDR3 regions, obtained from a llama (Lama glama) immunized against NKp46, were grafted onto a humanized VHH backbone library diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting (FACS), we focused on four sequence clusters selected on the basis of NGS frequency and enrichment analysis as well as in silico developability assessment. For each cluster, deep generative models based on long short-term memory (LSTM) networks were trained and used to sample new sequences in silico. The sampled sequences were subjected to sequence- and structure-based in silico developability assessment to select fewer than 10 sequences per cluster for production.

Results: As demonstrated by binding kinetics and early developability assessment, this procedure represents a general strategy for the rapid and efficient design of potent, automatically humanized sdAb hits with favorable early developability profiles from screening selections.
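The frequency and enrichment analysis used to prioritize sequences after the two FACS rounds can be illustrated at the level of individual sequences. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: each FACS round is represented as a plain list of NGS reads, enrichment is taken as the ratio of a sequence's relative frequency after round 2 to its frequency after round 1, and the pseudocount, function names and toy reads are hypothetical choices made only for this example.

```python
from collections import Counter

def frequencies(reads):
    """Relative frequency of each distinct sequence in one NGS read pool."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def enrichment(round1_reads, round2_reads, pseudocount=1e-6):
    """Round-2 frequency divided by round-1 frequency for every round-2 sequence.

    The pseudocount (a hypothetical choice) avoids division by zero for
    sequences that were not observed in round 1.
    """
    f1 = frequencies(round1_reads)
    f2 = frequencies(round2_reads)
    return {seq: f2[seq] / (f1.get(seq, 0.0) + pseudocount) for seq in f2}

# Toy CDR3 reads (hypothetical, not data from the study)
r1 = ["AAA"] * 4 + ["CCC"] * 4 + ["GGG"] * 2
r2 = ["AAA"] * 8 + ["CCC"] + ["GGG"]

scores = enrichment(r1, r2)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['AAA', 'GGG', 'CCC']
```

Here AAA doubles its share between rounds (0.4 to 0.8) and ranks first; in practice frequency and enrichment would be combined with the other criteria described above rather than used alone.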

Keywords: artificial intelligence and machine learning (ML); deep learning; in silico developability; long short-term memory (LSTM); next-generation sequencing (NGS); protein engineering; single domain antibodies (VHH); yeast surface display (YSD).


Conflict of interest statement

PA, HY, LP, LF, VS, AD, SK, EG, SZ, AE were employed by Merck Healthcare KGaA. CS, JS, JT were employed by Merck KGaA. TC was employed by EMD Serono.

Figures

FIGURE 1
The end-to-end process consists of the following steps: (A) Library construction: VHH-derived CDR3 regions obtained from a llama immunized against (rh) NKp46 are grafted onto a generic humanized and sequence-optimized VHH backbone library. (B) Binder identification from the yeast display library, based on multiple rounds of FACS and next-generation sequencing (NGS) analysis of sequence pools before and after FACS, followed by sequence clustering and per-cluster frequency and enrichment analyses combined with in silico developability predictions to identify the most interesting sequence clusters. (C) Generation of a per-cluster LSTM deep generative model and sampling of new sequences, which are subjected to in silico developability assessment to identify sequences for synthesis and experimental profiling. (D) Selected VHH sequences are produced as one-armed monovalent SEEDbodies and experimentally characterized for binding to NKp46 and in early developability assays. (Figures partially created with BioRender.com.)
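Step (C) relies on sampling new sequences, residue by residue, from a generative model trained on one cluster. As a hedged, dependency-free illustration of autoregressive sampling only, the Python sketch below swaps the LSTM for a much simpler add-one smoothed bigram model (each residue is conditioned on the previous one only). The helper names, the boundary tokens, the temperature parameter and the toy training sequences are all hypothetical and not taken from the study.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical sequence-boundary tokens
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_bigram(seqs, alphabet=AMINO_ACIDS):
    """Fit an add-one smoothed bigram model P(next token | previous token).

    Stand-in for a per-cluster LSTM: both return a distribution over the
    next token given what has been generated so far.
    """
    vocab = sorted(set(alphabet)) + [END]
    counts = defaultdict(Counter)
    for s in seqs:
        tokens = [START] + list(s) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def next_dist(prev):
        c = counts[prev]
        total = sum(c.values()) + len(vocab)
        return vocab, [(c[t] + 1) / total for t in vocab]

    return next_dist

def sample(next_dist, n, max_len=30, temperature=1.0, seed=0):
    """Draw n sequences token by token; a lower temperature sharpens the distribution."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        prev, residues = START, []
        while len(residues) < max_len:
            tokens, probs = next_dist(prev)
            weights = [p ** (1.0 / temperature) for p in probs]
            nxt = rng.choices(tokens, weights=weights, k=1)[0]
            if nxt == END:
                break
            residues.append(nxt)
            prev = nxt
        samples.append("".join(residues))
    return samples

# Toy cluster of CDR3-like sequences (hypothetical)
model = train_bigram(["ARDY"] * 50 + ["ARGY"] * 10)
print(sample(model, n=5))
```

With an LSTM, `next_dist` would be replaced by the network's softmax output conditioned on the whole prefix rather than on the previous residue only; the sampling loop itself would stay the same.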
FIGURE 2
Per-residue enrichment and frequency analysis for CDR3 sequence cluster 3, both illustrated as heat maps. The table headers show the CDR1-3 sequence of the most frequent clone within this cluster observed in the NGS data set after the second round of FACS selection. (A) Per-residue enrichment ratio over YSD-FACS rounds 1–2. Residues with high enrichment (colored green) are observed with a higher relative frequency after FACS round 2 than after round 1. (B) Per-residue frequency distribution observed after FACS round 2.
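The two heat maps rest on two simple quantities. Assuming the sequences of a cluster are aligned to equal length (an assumption; the alignment or numbering scheme used in the study is not described here), the per-residue frequency is the fraction of sequences carrying a given amino acid at a given position, and the enrichment ratio divides the round-2 frequency by the round-1 frequency. The Python sketch below computes both as one dictionary per position, a shape that maps directly onto a heat map; the pseudocount, function names and toy sequences are hypothetical.

```python
from collections import Counter

def per_residue_frequency(seqs):
    """Per-position amino-acid frequencies for aligned, equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        table.append({aa: n / len(seqs) for aa, n in counts.items()})
    return table

def per_residue_enrichment(round1, round2, pseudocount=1e-3):
    """Round-2 / round-1 frequency ratio for each residue observed in round 2."""
    f1 = per_residue_frequency(round1)
    f2 = per_residue_frequency(round2)
    if len(f1) != len(f2):
        raise ValueError("both rounds must use the same alignment length")
    return [
        {aa: freq / (pos1.get(aa, 0.0) + pseudocount) for aa, freq in pos2.items()}
        for pos1, pos2 in zip(f1, f2)
    ]

round1 = ["AY", "AY", "GY", "GF"]  # hypothetical aligned CDR fragments
round2 = ["AY", "AY", "AY", "GF"]
for pos, column in enumerate(per_residue_enrichment(round1, round2)):
    print(pos, {aa: round(r, 2) for aa, r in sorted(column.items())})
```

In this toy example alanine at position 0 rises from half to three quarters of the sequences and therefore shows an enrichment ratio above 1, while glycine at the same position falls below 1.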
FIGURE 3
Graphical visualization of in silico properties for VHH domains that were selected for synthesis and experimental profiling. Blue bars indicate sequences obtained from NGS, red bars indicate sequences obtained from AI/ML (LSTM) sampling.
FIGURE 4
Bio-Layer Interferometry (BLI) curves (in black) and fitted curves (in red) obtained for all sequences.
FIGURE 5
Graphical visualization of experimental analytical and early developability data for selected one-armed VHH SEEDbodies and antibody controls, including amount of protein, SEC purity, mean Tonset, HIC retention time, AC-SINS and polyspecificity (PSR-BLI). Blue bars indicate sequences obtained from NGS, red bars indicate sequences obtained from AI/ML (LSTM) sampling.
FIGURE 6
Comparison of predicted aggregation propensities with experimental HIC retention times, including Pearson correlation values. Sequences from different clusters are shown in different colors. (A) Predicted aggregation propensities based on the entire variable VHH region. (B) Predicted aggregation propensities based on the CDR regions only.
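The Pearson correlation reported in this comparison measures how linearly the predicted aggregation propensities track the measured HIC retention times: r = cov(x, y) / (σx σy), which ranges from -1 to 1. A minimal standard-library sketch follows; the numbers are hypothetical, and the two prediction lists only mimic the whole-VHH versus CDR-only comparison of panels A and B.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, not data from the study
predicted_full = [0.10, 0.30, 0.50, 0.70]  # e.g. propensity computed on the whole VHH
predicted_cdr = [0.40, 0.20, 0.60, 0.50]   # e.g. propensity computed on CDRs only
hic_rt_min = [8.0, 9.0, 10.0, 11.0]        # HIC retention times (assumed minutes)
print(pearson(predicted_full, hic_rt_min), pearson(predicted_cdr, hic_rt_min))
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same coefficient.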
FIGURE 7
Similarity of CDR1-3 sequences among the 100 best-scoring sequences (ranked by negative log-likelihood, NLL) for each CDR3 sequence cluster (A–D), visualized using UMAP dimensionality reduction. Blue dots represent sequences obtained from NGS; red dots represent new sequence combinations designed automatically with the LSTM models.
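For an autoregressive model, the NLL of a sequence is the sum over positions of -log P(x_t | preceding residues), so a lower NLL means the sequence is more probable under the model trained on that cluster (assuming, as is usual, that "best-scoring" means lowest NLL). The Python sketch below shows the scoring and top-k ranking step; to stay dependency-free it substitutes a toy add-one smoothed bigram model for the LSTM, and the function names, boundary tokens and training set are hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical sequence-boundary tokens
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_bigram(seqs, alphabet=AMINO_ACIDS):
    """Add-one smoothed bigram model returning P(next token | previous token)."""
    vocab_size = len(set(alphabet)) + 1  # residues plus the END token
    counts = defaultdict(Counter)
    for s in seqs:
        tokens = [START] + list(s) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def prob(prev, nxt):
        c = counts[prev]
        return (c[nxt] + 1) / (sum(c.values()) + vocab_size)

    return prob

def nll(seq, prob):
    """Negative log-likelihood: -sum_t log P(x_t | x_{t-1}), including the END token."""
    tokens = [START] + list(seq) + [END]
    return -sum(math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:]))

def top_k(candidates, prob, k):
    """Return the k candidates with the lowest NLL (most likely under the model)."""
    return sorted(candidates, key=lambda s: nll(s, prob))[:k]

cluster = ["ARDY", "ARDY", "ARGY"]  # toy CDR3-like training sequences
model = train_bigram(cluster)
print(top_k(["ARDY", "WWWW", "ARGY"], model, k=2))  # ['ARDY', 'ARGY']
```

Any model that yields per-token probabilities, including an LSTM, could be plugged into `nll` in place of the bigram `prob` function.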
