[Preprint]. 2024 Jul 22:2023.06.26.546591.
doi: 10.1101/2023.06.26.546591.

De Novo Design of Peptide Binders to Conformationally Diverse Targets with Contrastive Language Modeling

Suhaas Bhat et al. bioRxiv.

Abstract

Designing binders to target undruggable proteins presents a formidable challenge in drug discovery, requiring innovative approaches to overcome the lack of putative binding sites. Recently, generative models have been trained to design binding proteins via three-dimensional structures of target proteins, but because they depend on those structures, they struggle to design binders to disordered or conformationally unstable targets. In this work, we provide a generalizable algorithmic framework to design short, target-binding linear peptides, requiring only the amino acid sequence of the target protein. To do this, we propose a process to generate naturalistic peptide candidates through Gaussian perturbation of the peptidic latent space of the ESM-2 protein language model, and subsequently screen these novel linear sequences for target-selective interaction activity via a CLIP-based contrastive learning architecture. By integrating these generative and discriminative steps, we create a Peptide Prioritization via CLIP (PepPrCLIP) pipeline and validate highly ranked, target-specific peptides experimentally, both as inhibitory peptides and as fusions to E3 ubiquitin ligase domains, demonstrating functionally potent binding and degradation of conformationally diverse protein targets in vitro. Overall, our design strategy provides a modular toolkit for designing short binding linear peptides to any target protein without relying on stable and ordered tertiary structure, enabling generation of programmable modulators to undruggable and disordered proteins such as transcription factors and fusion oncoproteins.
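The generate-then-screen idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for ESM-2 embeddings, the noise is assumed to be isotropic with variance equal to k times the variance of the source embedding's entries, a plain cosine similarity stands in for the trained CLIP module, and the decoding of embeddings back to amino acid sequences is omitted. All function names are hypothetical.

```python
import numpy as np

def perturb_embedding(source, k, n_samples, rng):
    """Sample points from a Gaussian centered on a source embedding.

    Assumption: isotropic noise with variance k * var(source entries).
    """
    sigma = np.sqrt(k * source.var())
    noise = rng.normal(0.0, sigma, size=(n_samples, source.shape[0]))
    return source + noise

def clip_style_scores(candidates, target):
    """Cosine similarity of each candidate row with the target vector.

    A stand-in for a learned contrastive score; values lie in [-1, 1].
    """
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    t = target / np.linalg.norm(target)
    return c @ t

def rank_candidates(candidates, target, top_n):
    """Return indices and scores of the top_n highest-scoring candidates."""
    scores = clip_style_scores(candidates, target)
    order = np.argsort(-scores)[:top_n]
    return order, scores[order]

rng = np.random.default_rng(0)
source = rng.normal(size=64)   # placeholder for a peptide embedding
target = rng.normal(size=64)   # placeholder for a target-protein embedding
cands = perturb_embedding(source, k=5.0, n_samples=1000, rng=rng)
idx, top = rank_candidates(cands, target, top_n=10)
```

In a real pipeline the peptide and protein encoders would map into a shared space before scoring; the sketch only shows the sampling and ranking steps.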

Conflict of interest statement

Competing Interests P.C., K.P., and S.B. are listed as inventors for U.S. Provisional Application No. 63/344,820, entitled: “Contrastive Learning for Peptide Based Degrader Design and Uses Thereof.” P.C. is listed as an inventor for U.S. Provisional Application No. 63/032,513, entitled: “Minimal Peptide Fusions for Targeted Intracellular Degradation.” P.C. and M.P.D. are co-founders of and have financial interests in UbiquiTx, Inc. M.P.D.’s interests are reviewed and managed by Cornell University in accordance with their conflict-of-interest policies. P.C.’s interests are reviewed and managed by Duke University in accordance with their conflict-of-interest policies. S.B. is a current paid consultant for UbiquiTx, Inc., and K.P. is a former paid consultant for UbiquiTx, Inc.

Figures

Figure 1. PepPrCLIP model training and evaluation.
(A) Gaussian distributions centered around the ESM-2 embeddings of naturally-occurring peptides are sampled and then decoded back to amino acid sequences. A trained CLIP module, jointly encoding cognate peptide-protein embeddings, screens thousands of these peptides for specific binding activity to the target. (B) Evaluation metrics for final CLIP module. Binary accuracy is the accuracy of the model in predicting the correct binding pairs when given 2 protein-peptide pairs. Top-1 accuracy is the likelihood that for a given protein, the model has chosen the correct binding peptide, and Top-10% accuracy is the likelihood that this peptide is in the top 10% of peptides when ranked for CLIP score to this protein. Metric values are presented for the “strict” held-out test set described in the Results and Methods sections. (C) Known peptide-target sequence pairs, from the PDB-validated dataset, were scored via the trained CLIP module. Mismatched pairs, each containing a peptide and a non-matching protein, both from the validation dataset, were also scored via the CLIP module. +1 indicates a definitive binding pair while −1 indicates that the peptide and target sequences do not bind (see Methods). Scores are represented as a histogram. The mean (μ) and variance (σ²) are provided for each distribution.
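The Top-1 and Top-10% metrics described in panel (B) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes a square score matrix in which entry (i, j) is the score of protein i against peptide j and the true partner of protein i sits on the diagonal. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def top1_accuracy(scores):
    """Fraction of proteins whose highest-scoring peptide is the true one."""
    n = scores.shape[0]
    return float(np.mean(np.argmax(scores, axis=1) == np.arange(n)))

def top_fraction_accuracy(scores, fraction=0.10):
    """Fraction of proteins whose true peptide ranks within the top fraction."""
    n_prot, n_pep = scores.shape
    cutoff = max(1, int(np.ceil(fraction * n_pep)))
    true_scores = scores[np.arange(n_prot), np.arange(n_prot)]
    # Rank = number of peptides scoring strictly higher than the true peptide.
    ranks = np.sum(scores > true_scores[:, None], axis=1)
    return float(np.mean(ranks < cutoff))

# Toy 3x3 matrix with illustrative values only.
scores = np.array([
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
])
acc_top1 = top1_accuracy(scores)
acc_top10 = top_fraction_accuracy(scores, fraction=0.10)
```

With only three peptides, the 10% cutoff rounds up to a single peptide, so the two metrics coincide on this toy input; they diverge on realistically sized test sets.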
Figure 2. PepPrCLIP generation and in silico benchmarking.
(A) Analysis of Hamming distance of generated peptides to source peptides vs. variance scaling factor (k) of embedding perturbation. We computed the mean Hamming distance between generated peptides and their source peptides as we varied the value of k. We did this for an ensemble of 100 source 18-mer peptides, creating 500 generated peptides for each. The plot of Hamming distance as we varied k is depicted in Figure 2A. We observed that at k = 5, we see mutations on the order of 1 amino acid, and mean Hamming distance increases until k = 22, at which point almost the entire sequence is mutated. Thus, we chose that k would range between 5 and 22, sampling the full range of sequence transformations. (B) In silico hit-rate assessment of PepPrCLIP vs. RFDiffusion. Utilizing AlphaFold-Multimer, ipTM scores were computed for both the generated and test peptides in conjunction with the target protein sequence. The entries are organized in accordance with the ipTM scores attributed to the test set peptides. The hit-rate is characterized by the generated peptides exhibiting ipTM scores ≥ those of the ground truth test peptides. (C) Post co-folding with input structured target, comparison of ipTM scores of PepPrCLIP-generated peptides to RFDiffusion-generated peptides. All ipTM scores, targets, and peptide sequences are provided in the Supplementary Data file.
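The two quantities in panels (A) and (B) reduce to simple computations, sketched below. The hit-rate reading used here, the fraction of generated peptides for one target whose ipTM is at least that of the test peptide, is an assumption based on the caption wording. The sequences and ipTM values are illustrative placeholders, not data from the paper, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def mean_hamming(source, generated):
    """Mean Hamming distance between a source peptide and its variants."""
    return sum(hamming(source, g) for g in generated) / len(generated)

def hit_rate(generated_iptm, reference_iptm):
    """Fraction of generated peptides with ipTM >= the reference ipTM."""
    hits = sum(score >= reference_iptm for score in generated_iptm)
    return hits / len(generated_iptm)

# Illustrative 18-mer source and three made-up variants (0, 1, 3 mutations).
source = "ACDEFGHIKLMNPQRSTV"
generated = [
    "ACDEFGHIKLMNPQRSTV",
    "ACDEFGHIKLMNPQRSTA",
    "AADEFGHIKLMNPQRSAA",
]
avg_dist = mean_hamming(source, generated)

# Made-up ipTM scores for one target; 0.55 plays the test-peptide score.
rate = hit_rate([0.62, 0.48, 0.71, 0.55], reference_iptm=0.55)
```

Repeating `mean_hamming` over a grid of k values would give a curve of the kind panel (A) describes.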
Figure 3. Characterization of PepPrCLIP/RFDiffusion-generated peptides for UltraID inhibition.
(A) AlphaFold2-predicted structure of UltraID. The magenta region refers to hotspot amino acids 35–51 and the yellow region refers to hotspot amino acids 102–124. (B) Schematic of IP-induced inhibition of UltraID catalytic activity. Briefly, each of the PepPrCLIP/RFDiffusion-generated peptides was fused to the C-terminus of UltraID via flexible linkers and the HA epitope in a pCAG vector. The resulting plasmids were separately transfected into HEK293T cells. At 48 hours post-transfection, the cells were treated with 50 μM biotin for 30 mins before fixation and immunostaining. The control plasmid is the vector expressing UltraID but not any IP. (C) Initial screening and quantification of UltraID inhibition efficiency of PepPrCLIP (blue) and RFDiffusion (orange)-generated IPs. The inhibition efficiency was evaluated by the ratio between biotinylation level (streptavidin fluorescence level) and transfection efficiency (HA fluorescence level), which was further normalized to the control samples. Each plasmid was transfected into two wells of HEK293T cells in 24-well plates, and two fluorescence images were captured for each well. Statistical significance was determined by an unpaired two-tailed Student’s t test. Calculated p values are represented as follows: *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001; ns, not significant. (D) Quantification of UltraID inhibition efficiency of 7 top candidate IPs in the independent biological replicates (n=3). The experiment and statistical analysis were performed similarly as described in 3B.
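The normalization in panel (C) and the p-value notation can be written out as a short sketch. It assumes one streptavidin and one HA fluorescence value per image and normalizes to the mean control ratio; whether the authors averaged per well or per image is not stated here, so that choice is an assumption. The numbers are illustrative only and the function names are hypothetical.

```python
def normalized_ratio(strep, ha, control_strep, control_ha):
    """Biotinylation/transfection ratio, normalized to the mean control ratio."""
    sample = [s / h for s, h in zip(strep, ha)]
    control = [s / h for s, h in zip(control_strep, control_ha)]
    control_mean = sum(control) / len(control)
    return [r / control_mean for r in sample]

def significance_label(p):
    """Map a p value to the star notation used in the figure captions."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# Illustrative fluorescence values (arbitrary units), not measured data.
ratios = normalized_ratio(
    strep=[50.0, 60.0], ha=[100.0, 100.0],
    control_strep=[100.0, 100.0], control_ha=[100.0, 100.0],
)
label = significance_label(0.003)
```

A normalized ratio below 1 corresponds to less biotinylation per unit of expressed construct than in the control.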
Figure 4. Characterization of PepPrCLIP-generated peptide-guided uAbs for β-catenin.
(A) AlphaFold2-predicted structure of β-catenin. (B) TOPFlash luciferase reporter assay of Wnt/β-catenin transcriptional activity. FOPFlash reporter served as a negative control. (C) Left: Degradation of endogenous β-catenin in cytosolic fractions of DLD1 cells analyzed via immunoblotting with anti-β-catenin and anti-GAPDH antibodies. Right: Densitometry analysis of immunoblots (n=3) using ImageJ software was performed to quantify β-catenin levels. Statistical significance was determined by an unpaired two-tailed Student’s t test. Calculated p values are represented as follows: **, p < 0.01; ***, p < 0.001; ns, not significant. Raw, uncropped blots are shown in Supplementary Figure 4. (D) β-catenin binding activity determined by ELISA with immobilized β-catenin. Binding to bovine serum albumin (BSA) served as a negative control. (E) Biolayer interferometry (BLI) analysis of β-cat-PpC3-CHIPΔTPR and β-cat-PpC4-CHIPΔTPR. Details on methodologies can be found in the Methods section.
Figure 5. Characterization of PepPrCLIP-generated peptide-guided uAbs for SS18-SSX1 fusion.
(A) AlphaFold2-predicted structure of SS18-SSX1. (B) Left: Schematic representation of the SYT-SSX1 fusion protein fused to mCherry. Middle: Violin plot of mCherry fluorescence scale values for SS18-SSX1 mCherry-fusion-expressing cells within the peptide-expressing BFP-positive fraction, shown for the polyalanine (PolyA) control (gray) and each of the 10 SYT-SSX1 peptides (red). Statistical significance was determined by an unpaired two-tailed Student’s t test. Calculated p values are represented as follows: **, p < 0.01; ***, p < 0.001; ****, p < 0.0001; ns, not significant. Right: Bar graph of median fluorescence intensity (MFI) for the PolyA control (gray) and the 10 peptides. Samples were treated in independent biological replicates (n=2). Gating strategy is described in Supplementary Figure 3. (C) Immunoblot to probe endogenous SS18-SSX1 fusion in HS-SYII synovial sarcoma cells treated with SS-PpC_4 3 days post transient transfection. β-actin was used as loading control. Experiments represent independent biological replicates (n=2). Raw, uncropped blots are shown in Supplementary Figure 4.
