Comput Struct Biotechnol J. 2024 Oct 10:23:3759-3770.
doi: 10.1016/j.csbj.2024.10.005. eCollection 2024 Dec.

Determining key residues of engineered scFv antibody variants with improved MMP-9 binding using deep sequencing and machine learning

Masoud Kalantar et al. Comput Struct Biotechnol J.

Abstract

Given the crucial role of specific matrix metalloproteinases (MMPs) in the extracellular matrix, an imbalance between activation of the matrix metalloproteinase-9 (MMP-9) zymogen and inhibition of the active enzyme can result in various diseases, including cancer and neurodegenerative and gynecological diseases. Developing novel therapeutics that target MMP-9 with single-chain antibody fragments (scFvs) is therefore a promising approach. We used fluorescence-activated cell sorting (FACS) to screen a synthetic yeast-displayed scFv antibody library for enhanced binding to MMP-9. The selected scFv mutants showed improved binding to MMP-9 compared with the natural MMP inhibitors, the tissue inhibitors of metalloproteinases (TIMPs). To identify the molecular determinants of MMP-9 binding in these engineered scFv variants, we used next-generation DNA sequencing and computational protein structure analysis. In addition, we trained a deep-learning language model on the screened scFv variant library to predict the binding affinities of scFv variants from their CDR-H3 sequences.

Keywords: Antibody engineering; MMP-9; Machine learning; Metalloproteinase; Protein complex structural modeling; Single-chain antibody fragment; Yeast surface display.

Conflict of interest statement

The authors have no conflict of interest.

Figures

Graphical abstract
Fig. 1
The general approach for engineering antibody scFv variants targeting MMP-9 using directed evolution and yeast surface display. 1) Library generation: a library of scFv variants with mutations in the CDR-H3 region, introducing diversity in both amino acid composition and loop length, was used. DNA encoding these scFv variants was electrotransformed into a yeast strain for display, labeling, and screening. 2) Expression and display: yeast cells carrying expression plasmids encoding different scFvs were grown and induced to display the scFv variants, genetically fused to the C-terminus of Aga2p, on the yeast surface. Cells expressing scFv variants were incubated with the MMP-9cd enzyme and then with appropriate fluorescently conjugated ligands that label the scFv variants for binding analysis. 3) Screening and sequencing: the scFv mutant library was sorted by FACS into MMP-9cd binder (positive) and non-binder (negative) populations. DNA isolated and amplified from the sorted scFv libraries was sequenced by Sanger sequencing and/or next-generation sequencing (NGS). 4) Data analysis and model training: the NGS data were used as input to a machine learning model to determine key CDR-H3 residues for MMP-9cd binding, and to train and validate a protein language model that predicts the binding affinity of specific CDR-H3 regions to MMP-9cd.
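The labeling that links steps 3) and 4) can be illustrated with a minimal sketch. It assumes CDR-H3 sequences have already been extracted from the sorted positive and negative NGS reads (read processing is not described here) and drops sequences observed in both sorts, a choice made for this example only; `build_training_set` is a hypothetical name.

```python
def build_training_set(positive_seqs, negative_seqs):
    """Label CDR-H3 sequences from the binder (positive) sort as 1 and from the
    non-binder (negative) sort as 0. Sequences observed in both sorts are
    treated as ambiguous and dropped (an assumption of this sketch)."""
    pos, neg = set(positive_seqs), set(negative_seqs)
    ambiguous = pos & neg
    # sorted() only makes the output order deterministic.
    return ([(seq, 1) for seq in sorted(pos - ambiguous)]
            + [(seq, 0) for seq in sorted(neg - ambiguous)])


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the study.
    data = build_training_set(["AVIIYGSSWRY", "ARKDY"], ["ARKDY", "GGSSY"])
    print(data)  # [('AVIIYGSSWRY', 1), ('GGSSY', 0)]
```

Using sets also collapses duplicate reads of the same CDR-H3, so each unique sequence appears once in the labeled data.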
Fig. 2
FACS sorting of yeast cells displaying scFvs. A) Yeast cells displaying scFvs were incubated with the 6xHis-tagged catalytic domain of MMP-9 (MMP-9cd) and labeled with fluorescently conjugated anti-c-myc and anti-6xHis antibodies to quantify scFv expression and MMP-9cd binding, respectively. B) Two-color flow cytometry scatter plots of the naive scFv library and of the library after two rounds of FACS sorting for MMP-9cd binding. The diagonal gate (P1) defines the enriched population of yeast cells displaying scFvs with high MMP-9cd binding affinity.
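A diagonal gate selects cells by binding signal relative to expression signal, so that cells displaying more scFv do not look like better binders merely because they carry more protein. A minimal sketch of that logic, with a hypothetical function name and placeholder thresholds (the actual P1 gate boundaries are not given):

```python
def in_diagonal_gate(expression, binding, min_expression=1_000.0, min_ratio=0.5):
    """Hypothetical diagonal gate: keep cells that display the scFv (expression
    above a floor) and whose binding signal is high relative to expression.
    Thresholds are illustrative placeholders, not values from the study."""
    if expression < min_expression:
        return False
    return binding / expression >= min_ratio


def gated_fraction(events, **gate_kwargs):
    """Fraction of (expression, binding) events that fall inside the gate."""
    if not events:
        return 0.0
    hits = sum(in_diagonal_gate(e, b, **gate_kwargs) for e, b in events)
    return hits / len(events)


if __name__ == "__main__":
    # Toy (expression, binding) intensities, not cytometry data from the study.
    events = [(2000, 1500), (2000, 500), (500, 500), (4000, 3000)]
    print(gated_fraction(events))  # 0.5
```

A constant binding/expression ratio is a straight line through the origin on linear axes and a diagonal on log-log axes, which is why such gates are drawn diagonally.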
Fig. 3
Next-generation sequencing of MMP-9 binders and non-binders. A) Frequency of each amino acid in the CDR-H3 loop of the final scFv library, normalized to the naive library. Bar height indicates the relative enrichment or depletion of each amino acid in the final library. B) Proportion of amino acid types (positive: R, K, H; negative: D, E; hydrophobic: A, I, L, M, F, W, V; neutral: G, C, P; polar: N, Q, S, T, Y) at each position of the 11-residue CDR-H3 regions of the scFv variants. Each bar corresponds to one CDR-H3 position, and the height of each colored segment indicates the prevalence of that amino acid type at that position. C) Sequence logo of amino acid frequencies at each position of the 11-residue CDR-H3 loops in the final scFv library. Letter height is proportional to frequency at that position. D) Distribution of CDR-H3 lengths among positive and negative MMP-9cd binders.
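The quantities in panels A, B, and D can be computed from lists of CDR-H3 sequences roughly as follows. This sketch assumes the sequences have already been extracted from NGS reads as plain strings; the function names are hypothetical, and enrichment is taken as a simple ratio of pooled amino-acid frequencies, which may differ from the authors' normalization.

```python
from collections import Counter

# Amino-acid classes exactly as grouped in panel B.
AA_CLASSES = {
    "Positive": set("RKH"),
    "Negative": set("DE"),
    "Hydrophobic": set("AILMFWV"),
    "Neutral": set("GCP"),
    "Polar": set("NQSTY"),
}


def aa_frequencies(seqs):
    """Fraction of each amino acid among all CDR-H3 residues in a library."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def enrichment(final_seqs, naive_seqs):
    """Final-library frequency divided by naive-library frequency
    (>1 enriched, <1 depleted). Only amino acids seen in the naive
    library are reported, which avoids division by zero."""
    final = aa_frequencies(final_seqs)
    naive = aa_frequencies(naive_seqs)
    return {aa: final.get(aa, 0.0) / freq for aa, freq in naive.items()}


def class_proportions_by_position(seqs, length=11):
    """Per-position fraction of each amino-acid class among CDR-H3s of one length."""
    same_length = [s for s in seqs if len(s) == length]
    if not same_length:
        return []
    profile = []
    for pos in range(length):
        column = [s[pos] for s in same_length]
        profile.append({name: sum(aa in members for aa in column) / len(column)
                        for name, members in AA_CLASSES.items()})
    return profile


def length_distribution(seqs):
    """Count of CDR-H3 sequences at each loop length (panel D)."""
    return Counter(len(s) for s in seqs)
```

Comparing `length_distribution` on the positive and negative pools gives the two histograms of panel D.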
Fig. 4
A computational model for predicting CDR-H3 binding to MMP-9. Pre-trained protein language models (LPLMs) were used to extract vector representations (embeddings) of the CDR-H3s, which were zero-padded to a uniform length. These embeddings were used to train a downstream model with LSTM, dense, and binary classification layers. In the example shown, the 11-residue REGA-3G12 CDR-H3 (AVIIYGSSWRY) was passed to the pre-trained LPLM ESM-2 650M, which produced an 11 × 1280 embedding matrix, i.e., one 1280-dimensional vector per amino acid. The embeddings were zero-padded to a uniform 17 × 1280, where 17 is the maximum CDR-H3 length in the training dataset, and input into the downstream LSTM model, which had previously been trained to predict CDR-H3 binding to MMP-9. The model predicted a 99% probability that the REGA-3G12 CDR-H3 (AVIIYGSSWRY) binds MMP-9cd.
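The padding step described above can be sketched in plain Python. The constants come from the caption; `zero_pad` and the returned mask are hypothetical additions. Real per-residue embeddings could be obtained from the ESM-2 650M checkpoint (for example, the fair-esm package's `esm2_t33_650M_UR50D` model, after dropping its special start and end tokens), but placeholder vectors are used here, and the downstream LSTM classifier is not reproduced.

```python
MAX_LEN = 17      # longest CDR-H3 in the training set (from the caption)
EMBED_DIM = 1280  # per-residue embedding size of ESM-2 650M


def zero_pad(embedding, max_len=MAX_LEN, dim=EMBED_DIM):
    """Pad an L x dim per-residue embedding with zero rows to max_len x dim.

    Also returns a mask (1 = real residue, 0 = padding); the mask is an
    addition of this sketch that a downstream model may or may not use.
    """
    if len(embedding) > max_len:
        raise ValueError(f"sequence of length {len(embedding)} exceeds max_len={max_len}")
    if any(len(row) != dim for row in embedding):
        raise ValueError(f"every residue vector must have {dim} values")
    n_pad = max_len - len(embedding)
    padded = [list(row) for row in embedding] + [[0.0] * dim for _ in range(n_pad)]
    mask = [1] * len(embedding) + [0] * n_pad
    return padded, mask


if __name__ == "__main__":
    cdr_h3 = "AVIIYGSSWRY"  # REGA-3G12 CDR-H3 from the caption
    # Placeholder vectors standing in for real ESM-2 output.
    fake_embedding = [[0.1] * EMBED_DIM for _ in cdr_h3]
    padded, mask = zero_pad(fake_embedding)
    print(len(padded), len(padded[0]), sum(mask))  # 17 1280 11
```

Trailing zero rows carry no residue information but still reach a recurrent model as extra timesteps; feeding the mask to a masking layer is one common way to have an LSTM skip them, although the caption does not say whether the original model did so.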
Fig. 5
Residue-position mapping. A) Shapley plot decomposing the machine learning model's final prediction into per-residue contributions. Red arrows indicate amino acids that contribute positively to the binding prediction, and blue arrows indicate residues that contribute negatively. B) Heatmap of the impact of different amino acid residues at specific positions on binding. Higher Shapley values (warmer colors) indicate positive contributions to MMP-9 binding, highlighting key residue-position interactions; lower Shapley values (cooler colors) indicate negative contributions.
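The caption does not state how the Shapley values were computed; libraries such as SHAP usually approximate them. For intuition, the sketch below computes exact Shapley values by brute force over all coalitions of CDR-H3 positions, replacing absent positions with a baseline residue (an assumption of this example). `toy_score` is a hypothetical scorer standing in for the trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each position in x for the scoring function predict.

    Positions outside a coalition are replaced by the corresponding baseline
    residue. Cost grows as n * 2**(n-1) model calls, so this is only practical
    for short sequences such as CDR-H3 loops.
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical toy scorer: charged residues push the score up or down,
# plus an interaction bonus when positions 0 and 1 are both arginine.
RESIDUE_WEIGHTS = {"R": 1.0, "K": 1.0, "D": -1.0, "E": -1.0}


def toy_score(seq):
    score = sum(RESIDUE_WEIGHTS.get(aa, 0.0) for aa in seq)
    if seq[0] == "R" and seq[1] == "R":
        score += 0.5
    return score


if __name__ == "__main__":
    x, base = list("RRD"), list("GGG")
    phi = shapley_values(toy_score, x, base)
    print([round(v, 3) for v in phi])  # [1.25, 1.25, -1.0]
```

The values sum to the difference between the prediction for the sequence and for the baseline (here 1.5), and the 0.5 interaction bonus is split equally between positions 0 and 1, which is the kind of residue-position credit a heatmap like panel B summarizes.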
Fig. 6
scFv variants isolated after two sequential rounds of FACS screening for MMP-9cd binding. A) Mean fluorescence intensity (MFI) of 6xHis-MMP-9cd binding to scFv variants, background-corrected and normalized to N-TIMP-1, which served as a positive control for 6xHis-MMP-9cd binding. In all experiments, yeast-displayed scFv variants were incubated with 300 nM soluble 6xHis-MMP-9cd protein. Each data point is the mean of triplicate samples, with error bars showing the standard error of the mean (SEM). B) Flow cytometry scatter plots of several isolated yeast-displayed scFv variants with enhanced MMP-9cd binding, with N-TIMP-1 as a reference. The x-axis (APC channel) shows binding to 6xHis-MMP-9cd (300 nM), and the y-axis (FITC channel) shows scFv expression level.
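The normalization in panel A can be sketched as below, assuming a single background MFI shared by sample and control, with the N-TIMP-1 replicates as the control. `normalized_binding` is a hypothetical helper, and the numbers in the usage lines are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_binding(sample_mfi, background_mfi, control_mfi):
    """Background-subtract replicate MFIs and express them relative to the
    mean background-subtracted control (e.g. N-TIMP-1).

    Returns (mean, SEM) of the normalized replicates. A single shared
    background value is an assumption of this sketch.
    """
    control = mean(control_mfi) - background_mfi
    if control <= 0:
        raise ValueError("control signal must exceed background")
    values = [(m - background_mfi) / control for m in sample_mfi]
    sem = stdev(values) / sqrt(len(values))  # needs at least two replicates
    return mean(values), sem


if __name__ == "__main__":
    # Hypothetical triplicate MFIs, not measurements from the study.
    m, sem = normalized_binding([2100, 2300, 2200], background_mfi=200,
                                control_mfi=[1200, 1150, 1250])
    print(round(m, 2), round(sem, 3))
```

A normalized value above 1 means the variant bound 6xHis-MMP-9cd more strongly than N-TIMP-1 under the same conditions; SEM is the sample standard deviation divided by the square root of the number of replicates.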
Fig. 7
Surface charge distribution of the CDR-H3 regions of different scFvs and of MMP-9cd. Red indicates negatively charged regions, blue positively charged regions, and white neutral regions. Views rotated by 90° show the three-dimensional electrostatic landscape and geometry of each protein's CDR-H3. The active site of MMP-9cd has a prominent negatively charged region, suggesting strong potential for electrostatic interactions with positively charged CDR-H3 sequences. The CDR-H3 of the positive control, REGA-3G12, shows a neutral to positive charge distribution, well suited for electrostatic interactions with MMP-9cd.
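The charge maps above come from structural electrostatics. A far cruder, sequence-only check (an illustration, not the method used for this figure) counts ionizable side chains at roughly neutral pH: +1 for R and K, −1 for D and E, with His neutral by default. `net_charge` and its `his_charge` parameter are hypothetical.

```python
# Approximate side-chain charges near neutral pH (assumption of this sketch).
SIDE_CHAIN_CHARGE = {"R": 1.0, "K": 1.0, "D": -1.0, "E": -1.0}


def net_charge(cdr_h3, his_charge=0.0):
    """Crude net side-chain charge of a CDR-H3 sequence.

    His is treated as neutral by default because its side-chain pKa (~6) is
    below physiological pH; pass his_charge=1.0 to count it as positive, as
    in the residue classes of Fig. 3B. Termini are ignored because CDR-H3 is
    an internal loop of the heavy chain.
    """
    seq = cdr_h3.upper()
    charge = sum(SIDE_CHAIN_CHARGE.get(aa, 0.0) for aa in seq)
    return charge + his_charge * seq.count("H")


if __name__ == "__main__":
    print(net_charge("AVIIYGSSWRY"))  # 1.0 (one Arg, no acidic residues)
```

For the REGA-3G12 CDR-H3 this gives +1, in line with the neutral-to-positive surface described in the caption, although sequence counts ignore structure and local pKa shifts.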
Fig. 8
Binding interactions between CDR-H3 variants and MMP-9cd. The structure of MMP-9cd (grey) in complex with the scFv antibody variants is shown, with the light chain (VL) in dark blue, the heavy chain (VH) in cyan, and CDR-H3 in dark pink. (A) In the REGA-3G12 variant, Gly104 at position 6 of CDR-H3 forms a hydrogen bond with MMP-9cd residues. (B) In the SynAb-MK2 variant, Gly231 at position 6 of CDR-H3 forms hydrogen bonds with Ile92 and Pro90 at the exosite of MMP-9cd. (C) In the SynAb-pK16 variant, Asp233 at position 6 of CDR-H3 (highlighted) plays a crucial role in binding to the exosite of MMP-9cd, with additional interactions involving positions 4 (Ala231) and 5 (Lys232).
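Hydrogen-bond contacts like those described above are often first screened with a simple geometric cutoff. The sketch below lists polar (N/O) atom pairs between antibody and enzyme within 3.5 Å, a commonly used donor-acceptor distance threshold; it ignores angles and hydrogen positions, so its hits are only candidates. The atom tuple format and the `hbond_candidates` name are assumptions, and the coordinates are toy values.

```python
from math import dist

POLAR_ELEMENTS = {"N", "O"}


def hbond_candidates(antibody_atoms, enzyme_atoms, cutoff=3.5):
    """Return (antibody_atom, enzyme_atom, distance) for polar atom pairs within cutoff.

    Atoms are (label, element, (x, y, z)) tuples with coordinates in angstroms.
    Distance alone is checked; donor/acceptor roles and angles are ignored.
    """
    pairs = []
    for ab_label, ab_elem, ab_xyz in antibody_atoms:
        if ab_elem not in POLAR_ELEMENTS:
            continue
        for enz_label, enz_elem, enz_xyz in enzyme_atoms:
            if enz_elem not in POLAR_ELEMENTS:
                continue
            d = dist(ab_xyz, enz_xyz)
            if d <= cutoff:
                pairs.append((ab_label, enz_label, round(d, 2)))
    return pairs


if __name__ == "__main__":
    # Toy coordinates for illustration only, not from the modeled complexes.
    antibody = [("H3-6 N", "N", (0.0, 0.0, 0.0)), ("H3-6 CA", "C", (1.5, 0.0, 0.0))]
    enzyme = [("E-1 O", "O", (2.9, 0.0, 0.0)), ("E-2 O", "O", (6.0, 0.0, 0.0))]
    print(hbond_candidates(antibody, enzyme))  # [('H3-6 N', 'E-1 O', 2.9)]
```

Structure-analysis tools typically add angle criteria on top of this distance filter before calling a contact a hydrogen bond.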
