Nature. 2023 May;617(7959):176-184.
doi: 10.1038/s41586-023-05993-x. Epub 2023 Apr 26.

De novo design of protein interactions with learned surface fingerprints

Pablo Gainza et al. Nature. 2023 May.

Abstract

Physical interactions between proteins are essential for most biological processes governing life [1]. However, the molecular determinants of such interactions have been challenging to understand, even as genomic, proteomic and structural data increase. This knowledge gap has been a major obstacle for the comprehensive understanding of cellular protein-protein interaction networks and for the de novo design of protein binders that are crucial for synthetic biology and translational applications [2-9]. Here we use a geometric deep-learning framework operating on protein surfaces that generates fingerprints to describe geometric and chemical features that are critical to drive protein-protein interactions [10]. We hypothesized that these fingerprints capture the key aspects of molecular recognition that represent a new paradigm in the computational design of novel protein interactions. As a proof of principle, we computationally designed several de novo protein binders to engage four protein targets: SARS-CoV-2 spike, PD-1, PD-L1 and CTLA-4. Several designs were experimentally optimized, whereas others were generated purely in silico, reaching nanomolar affinity with structural and mutational characterization showing highly accurate predictions. Overall, our surface-centric approach captures the physical and chemical determinants of molecular recognition, enabling an approach for the de novo design of protein interactions and, more broadly, of artificial proteins with function.

Conflict of interest statement

Ecole Polytechnique Fédérale de Lausanne (EPFL) has filed a provisional patent application that incorporates findings presented in this Article. P.G., S.W., A.V.H.-B., A.M., A.S., Z.H., F.S., M.B. and B.E.C. are named as co-inventors on this patent (European Patent Office, EP22177692.5).

Figures

Fig. 1. Surface-centric design of de novo site-specific protein binders.
a, Schematic of fingerprint generation. Protein binding sites are spatially embedded as vector fingerprints. Protein surfaces are decomposed into overlapping radial patches, and a neural network trained on native interacting protein pairs learns to embed the fingerprints such that complementary fingerprints are placed in a similar region of space. We show an illustration for a subsample of the fingerprints projected in a space reduced to three dimensions. The green box highlights a region of complementary fingerprints. b, MaSIF-seed—a method to identify new binding seeds. A target patch is identified by MaSIF-site based on the propensity to form buried interfaces. Using MaSIF-seed, fingerprint complementarity is evaluated between the target patch and all fingerprints in a large database (around 402 million patches); the pairs of fingerprints are subsequently ranked. The top patches are aligned and rescored to enable a more precise evaluation of the seed candidates. c, Scaffold search, seed grafting and interface redesign. The selected seeds are transferred to protein scaffolds and the rest of the interface is redesigned using Rosetta. The top designs are selected and tested experimentally.
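The ranking step described in panel b can be illustrated with a minimal sketch. It assumes that complementarity is scored as the Euclidean distance between fingerprint vectors, with smaller distances ranking higher; the caption only says that complementary fingerprints lie in a similar region of space, so the metric, the function name `rank_by_fingerprint_distance` and the toy vectors are all illustrative assumptions rather than the authors' implementation.

```python
import heapq
import math


def rank_by_fingerprint_distance(target, candidates, top_k=3):
    """Return the top_k (distance, name) pairs closest to the target fingerprint.

    Assumption: a smaller Euclidean distance means a more complementary patch.
    """
    scored = (
        (math.dist(target, vector), name)
        for name, vector in candidates.items()
    )
    return heapq.nsmallest(top_k, scored)


# Toy 3-dimensional fingerprints with made-up values (hypothetical data).
target_fp = (0.0, 1.0, 0.0)
database = {
    "patch_a": (0.1, 0.9, 0.0),
    "patch_b": (1.0, 0.0, 0.0),
    "patch_c": (0.0, 1.0, 0.2),
}

ranking = rank_by_fingerprint_distance(target_fp, database, top_k=2)
for distance, name in ranking:
    print(f"{name}: {distance:.3f}")
```

A linear scan like this would be impractical for a database of the size quoted in the caption. The sketch only shows the ranking idea, after which the top candidates would still need alignment and rescoring.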
Fig. 2. Design and optimization of a SARS-CoV-2 binder targeting the RBD.
a, MaSIF-site prediction of the interface propensity of the RBD. The ACE2-binding footprint (yellow outline) is distinct from the predicted binding site (red). b, MaSIF-seed predicts helical seeds that cluster into anti-parallel orientations, referred to as up or down configurations. Sequence logo plots highlight the similarity between the sequences of the two seed clusters, regardless of orientation. c, The scaffold (PDB: 5VNY) used to make DBR3_01 allows for binding in the up or down orientation, sharing similar footprints. d, SPR data of improved DBR3 binders with controls. DBR3_03 has an affinity of 80 nM with RBD. e, A cryo-EM structure (dark green) aligns to the AlphaFold prediction with an iRMSD of 1.4 Å. The trimeric spike protein (grey) has one DBR3_03 bound per RBD (orange, pink, green). f, Fc–DBR3_03 binds to the spike protein of most variants of concern, except for those with the L452R mutation. A list of half-maximal effective concentration (EC50) values of DBR3_03 is provided in Supplementary Table 3. The fits were calculated from technical replicates (n = 2) using a nonlinear four-parameter curve fitting analysis. g, Fc–DBR3_03 neutralizes live Omicron virus in cell-based inhibition assays with a half-maximal inhibitory concentration (IC50) of 1.7 × 10⁻⁶ g ml⁻¹, compared with the AstraZeneca (AZD8895 and AZD1061) mix, which has an IC50 of 2.9 × 10⁻⁷ g ml⁻¹. The fits were calculated from biological replicates (n = 2) using a nonlinear four-parameter curve fitting analysis.
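The four-parameter curve mentioned in panels f and g is commonly written as a four-parameter logistic (Hill-type) model. The sketch below shows that common form only. The caption does not state the exact parameterization or fitting software used, so the function name, argument names and example values are assumptions.

```python
def four_parameter_logistic(conc, bottom, top, ec50, hill):
    """Response of a four-parameter logistic curve at concentration conc.

    bottom and top are the lower and upper plateaus, ec50 is the
    concentration giving a half-maximal response, and hill sets the slope.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# At conc == ec50 the response sits halfway between the two plateaus.
midpoint = four_parameter_logistic(
    conc=1.7e-6, bottom=0.0, top=100.0, ec50=1.7e-6, hill=1.0
)
# A tenfold higher concentration moves the response towards the top plateau.
higher = four_parameter_logistic(
    conc=1.7e-5, bottom=0.0, top=100.0, ec50=1.7e-6, hill=1.0
)
print(midpoint, higher)
```

For an inhibition curve, where the signal falls as concentration rises, the same function can be used with a negative `hill` value, and the midpoint is then read as an IC50.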
Fig. 3. De novo design and optimization of PD-L1 binders targeting a flat surface.
a, MaSIF-site prediction of the interface propensity of PD-L1. The predicted interface (red) overlaps with the binding site of the native interaction partner PD-1 (yellow). b, Helical seeds were predicted by MaSIF-seed and clustered. The dominant cluster showed strong amino acid preferences (Z-score > 2). Hotspot residues are underlined. c, Binders based on two different scaffold proteins using the selected seed were identified. d, The binding affinities of DBL1 designs after combinatorial (light green) and SSM library optimization (dark green), measured using SPR. Mutation of a hotspot residue (V12R) ablates binding of DBL1_03 (wheat). e, The binding affinities of DBL2 designs after combinatorial (light blue) and SSM library optimization (dark blue), measured using SPR. Mutation of a hotspot residue (V12R) knocks out binding of DBL2_02 (wheat). f, SSM analysis of regions of interest in the binding interface of DBL1_03. The original residue of DBL1_03 is indicated by a cross and hotspot residue positions are shown in black boxes. Enrichment in the binding population (blue) and in the non-binding population (red) is indicated. g, SSM data in the binding interface of DBL2_03. The original residue of DBL2_02 is indicated by a cross. h, The binding mode of the selected seed in comparison to the native interaction partner PD-1. i, Crystal (xtal) structure of DBL1_03 in a complex with PD-L1. The computational model (light green) is aligned with the crystal structure (dark green). Inset: the alignment of the residues in the binding seed. j, Crystal structure of DBL2_02 in a complex with PD-L1, shown by aligning the computational model (light blue) with the crystal structure (dark blue). Inset: the alignment of the residues in the binding seed represented as sticks.
Fig. 4. Optimized workflow and de novo binders for PD-1.
a, Improved computational design workflow in which two steps of design are used, at the seed and at the scaffold level, with an emphasis on building new hydrogen bond networks. b, PD-1 (blue) targeted by DBP13_01 (green); hotspot residues from the binding seed (red) are highlighted. Insets: crucial residues for binding. c, Histogram of the binding signal (PE, phycoerythrin) measured by flow cytometry for DBP13_01, the native miniprotein scaffold, two variants of DBP13_01 with crucial residues mutated and a negative control with unlabelled yeast. The dashed line indicates the geometric mean of the DBP13_01 binding signal. d, Binding affinities determined by SPR of the nivolumab Fab (green squares), DBP13_01 (red diamonds) and DBP13_02 (blue triangles). The dissociation constant of DBP13_01 was obtained with three independent measurements. e, SSM heat map showing interface residues and the enrichment of each point mutation. The original amino acids in DBP13_01 are indicated by a cross. Enrichment in the binding population (blue) and in the non-binding population (red) is indicated. Hotspot residues are highlighted with a black box. f, PD-L1 (orange) targeted by DBL3_01 (purple). Insets: magnification of interface residues, including one crucial residue tested for knockout mutants (Ile43, red). g, The binding signal measured using flow cytometry for DBL3_01, DBL3_02, the native protein scaffold, one knockout mutant and a negative control with unlabelled yeast. h, PD-L1 ligand titration on yeast displaying DBL3_02 (orange triangle) or high-affinity PD-1 (HA–PD-1, purple diamonds).
Extended Data Fig. 1. Overview of the neural network architectures used in the MaSIF protocols.
a, General MaSIF framework. Molecular surfaces are decomposed into patches which are annotated with chemical and shape features. The MaSIF network translates these input features into fingerprints that describe the original surface patch. b, MaSIF-site neural network. MaSIF-site predicts partner-independent protein interface propensities based on per-vertex chemical and shape features of the protein surface. c, MaSIF-search neural network. MaSIF-search embeds protein patches into a space where complementary patches are close to each other. The network was trained on discriminating interacting patches from non-interacting protein surface patches. The network uses MaSIF fingerprints to identify which are compatible and therefore to predict likely interacting proteins. d, Interface post-alignment (IPA) scoring neural network. The IPA scoring neural network enables the scoring of protein interfaces based on several input features: fingerprint distance between contacting points, 3D distance of corresponding points, normal dot product, and the distance between surface points in the seed and the closest atom in the target, which we call ‘penetration’.
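The four input features listed for the IPA scoring network in panel d can be made concrete with a small sketch for a single pair of corresponding surface points. The dictionary layout, the function name `pair_features` and the toy coordinates are hypothetical. The caption names the features but not how they are computed or encoded, so this is one plausible reading rather than the published procedure.

```python
import math


def pair_features(seed_point, target_point, target_atoms):
    """Compute four per-pair features for one seed/target surface point pair.

    Each point is a dict with 'xyz', 'normal' and 'fingerprint' entries
    (a hypothetical layout chosen for this sketch).
    """
    # Distance between the two fingerprints in embedding space.
    fingerprint_distance = math.dist(
        seed_point["fingerprint"], target_point["fingerprint"]
    )
    # Distance between the two surface points in 3D.
    spatial_distance = math.dist(seed_point["xyz"], target_point["xyz"])
    # Dot product of the surface normals; close to -1 when they face each other.
    normal_dot = sum(
        a * b for a, b in zip(seed_point["normal"], target_point["normal"])
    )
    # 'Penetration': distance from the seed surface point to the closest target atom.
    penetration = min(math.dist(seed_point["xyz"], atom) for atom in target_atoms)
    return fingerprint_distance, spatial_distance, normal_dot, penetration


seed = {"xyz": (0.0, 0.0, 1.0), "normal": (0.0, 0.0, -1.0), "fingerprint": (0.2, 0.8)}
target = {"xyz": (0.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "fingerprint": (0.2, 0.5)}
atoms = [(0.0, 0.0, -1.5), (3.0, 0.0, 0.0)]

features = pair_features(seed, target, atoms)
print(features)
```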
Extended Data Fig. 2. MaSIF-seed benchmarking for the discrimination of helical or non-helical binding motifs.
A non-redundant set of 31 helical and 83 non-helical fragments that bind to known protein receptors was selected as a benchmark set to evaluate MaSIF-seed’s capacity to recover true binding motifs from decoys, and to correctly rank them among the top results. To generate the decoy set, a non-redundant set of all protein chains in the Protein Data Bank was decomposed into continuous helical segments (left) and two/three-stranded beta sheets (right), resulting in over 250K helical and over 380K beta motifs, respectively. One thousand motifs of each type were randomly selected to act as decoys in the respective benchmarks. The surfaces of the two sets of 1,000 motifs were computed and decomposed into radial patches, and a fingerprint was computed for each patch. Recovered complexes were considered correct if an iRMSD < 3 Å was obtained. A comparable procedure was applied to the benchmark tools.
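The success criterion above (iRMSD < 3 Å) can be sketched as a root-mean-square deviation over matched interface atoms followed by a threshold test. The sketch assumes the two coordinate sets are already superposed and list the same atoms in the same order. How interface atoms are selected and aligned is not described in the caption, and the function names and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def is_recovered(model_interface, reference_interface, cutoff=3.0):
    """Apply the 'iRMSD below cutoff' criterion (cutoff in angstroms)."""
    return rmsd(model_interface, reference_interface) < cutoff


# Toy interface atoms (hypothetical coordinates in angstroms).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
near_model = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
far_model = [(0.0, 4.0, 0.0), (1.5, 4.0, 0.0), (3.0, 4.0, 0.0)]

print(rmsd(near_model, reference), is_recovered(near_model, reference))
print(rmsd(far_model, reference), is_recovered(far_model, reference))
```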
Extended Data Fig. 3. RBD-binder designs displayed on yeast.
a, The yeast display protocol utilizes PE to label binding and FITC to label expression. Yeast appearing in the double positive quadrant are considered potential binders and sorted for enrichment. b, Pools of approximately 30 designs were displayed on the surface of yeast and the highest binding populations (red box) sorted for further analysis. c, Schematic of RBD (wheat) bound to the various members of the library (transparent silhouettes and purple for DBR3_01) and ACE2 (red) overlapping with the designed binders. d, Individual designs DBR1-DBR20. e, DBR3_01 design displayed on yeast binds to RBD-Fc (left panel) but the binding is blocked when the RBD-Fc is preincubated with an excess of ACE2, indicating a competitive binding mode. f, A point mutant in the binding interface (DBR3_01_RR) and the original scaffold protein (WT_scaffold) show lower binding signal than DBR3_01 with 1 μM RBD-Fc, indicating that the design is engaging the RBD with the predicted interface.
Extended Data Fig. 4. DBR3_03 binding is sensitive to the L452R mutation in the spike protein.
a, Luminex binding assay of DBR3_03 or Imdevimab (REGN10987) with beads functionalized with SARS-CoV-2 spike protein of indicated variants. DBR3_03 has an EC50 of 3.2 × 10⁻⁸ g ml⁻¹ with WT and 3.5 × 10⁻⁸ g ml⁻¹ with Omicron. Imdevimab has an EC50 of 8.2 × 10⁻⁸ g ml⁻¹ with WT and 1.7 × 10⁻⁷ g ml⁻¹ with Delta. The fits were calculated from technical replicates (n = 2) using a nonlinear four-parameter curve fitting analysis. b, The L452R mutation on the spike protein leads to a clash with the DBR3_03 binding. An L24G mutation is proposed to avoid the clash. c, BLI data with DBR3_03 (WT KD < 0.1 nM, Delta KD not detected) or DBR3_03_L24G (Delta KD = 6 nM, WT KD = 6 nM) immobilized on the tips, dipped into spike protein of different variants.
Extended Data Fig. 5. Yeast libraries, SSM and binding data of DBL1/DBL2_02.
a, Position of targeted residues in the structure of DBL1_01 to improve binding affinity. Logo plot of the allowed mutations in the library and alignment of initial design with library-enriched design. b, Position of targeted residues in the structure of DBL1_02 to improve core packing. Logo plot of the allowed mutations in the library and alignment of DBL1_02 with library-enriched design. c, Structural representation of all positions sampled in the SSM library (light blue). The four hotspot residues (red) were also sampled. Three positions were mutated in DBL1_04 (dark blue). d, Outcome of the entire SSM library of DBL1_03. Blue indicates enrichment in the binding population, while red shows enrichment in the non-binding population. e, Binding of DBL1_03 and DBL1_04 to KARPAS299 cells expressing PD-L1 compared to binding of WT PD-1, a high-affinity version of PD-1 (PD-1_HA) and a V12R mutation of DBL1_03 (KO). All proteins contained an Fc domain. f, Position of targeted residues in the structure of DBL2_01 to improve binding affinity and solubility. Logo plot of the allowed mutations in the library and alignment of initial design with library-enriched design. Hotspot residues are shown in red, targeted residues in light blue and mutated residues in dark blue. g, Structural representation of all positions sampled in the SSM library (light blue). The four hotspot residues (red) were also sampled. Three positions were mutated in DBL2_04 (dark blue). Position 35 was not mutated in DBL_04, because every mutation at this position abolished soluble expression of the protein. h, Outcome of the entire SSM library of DBL2_03. Blue indicates enrichment in the binding population, while red shows enrichment in the non-binding population. i, Binding affinities measured by SPR for the different versions of DBL2.
Extended Data Fig. 6. Competition and specificity binding assay of the different optimized binders on the surface of yeast.
a, Competition between designed binders and a known protein binder (native binder or monoclonal Fab) in complex with the target structure. b, Flow cytometry histograms showing fluorescence signals on the surface of yeast displaying the different binders. Yeasts were labelled with 500 nM of their respective ligand (blue), 500 nM of blocked ligand pre-incubated with 10-fold molar excess of Fab or high-affinity PD-1 (HA-PD-1) (orange) or labelled with secondary antibodies only (grey, Neg Ctrl). c, Flow cytometry histograms showing fluorescence signal on the surface of yeast displaying the different binders and labelled with 500 nM of unrelated protein ligand (red) or labelled with secondary antibodies only (grey, Neg Ctrl).
Extended Data Fig. 7. DBL3_01 and DBL4_01 comparison and DBL4_01 and DBC2_01 knock-out mutants.
a, Superposition between DBL3_01 (cyan) and DBL4_01 (orange) in complex with PD-L1 (grey). Multiple sequence alignment of the two designs is shown at the bottom. b, DBL4_01 (orange) in complex with PD-L1 (grey) with knock-out mutant highlighted in red. Flow cytometry histograms showing fluorescence signals on the surface of yeast displaying DBL4_01 or the knock-out mutant, compared to unlabelled yeast (Neg Ctrl). c, DBC2_01 (green) in complex with CTLA-4 (blue) with two knock-out mutants highlighted in red. Flow cytometry histograms showing fluorescence signals on the surface of yeast displaying DBC2_01 or the knock-out mutants, compared to unlabelled yeast (Neg Ctrl).
Extended Data Fig. 8. Surface comparison between seeds, designs and final/predicted structures.
Buried interfaces of models/structures when in complex with their target are coloured in red, while non-buried regions are coloured in blue. The contour of the buried interface of the initial binding seed is drawn in green and is shown for the initial seed, for the designs and for the final/predicted structures.

References

    1. Janin J, Bahadur RP, Chakrabarti P. Protein–protein interaction and quaternary structure. Q. Rev. Biophys. 2008;41:133–180. doi: 10.1017/S0033583508004708.
    2. Cao L, et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science. 2020;370:426–431. doi: 10.1126/science.abd9909.
    3. Sesterhenn F, et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science. 2020;368:eaay5051. doi: 10.1126/science.aay5051.
    4. Silva D-A, et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature. 2019;565:186–191. doi: 10.1038/s41586-018-0830-7.
    5. Marcandalli J, et al. Induction of potent neutralizing antibody responses by a designed protein nanoparticle vaccine for respiratory syncytial virus. Cell. 2019;176:1420–1431. doi: 10.1016/j.cell.2019.01.046.
