De novo design of peptide binders to conformationally diverse targets with contrastive language modeling

Suhaas Bhat¹, Kalyan Palepu¹, Lauren Hong¹, Joey Mao², Tianzheng Ye³, Rema Iyer⁴, Lin Zhao¹, Tianlai Chen¹, Sophia Vincoff¹, Rio Watson¹, Tian Z Wang¹, Divya Srijay¹, Venkata Srikar Kavirayuni¹, Kseniia Kholina¹, Shrey Goel¹, Pranay Vure¹, Aniruddha J Deshpande⁴, Scott H Soderling², Matthew P DeLisa³ ⁵ ⁶, Pranam Chatterjee¹ ⁷ ⁸

Affiliations

¹ Department of Biomedical Engineering, Duke University, Durham, NC, USA.
² Department of Cell Biology, Duke University, Durham, NC, USA.
³ Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA.
⁴ Cancer Genome and Epigenetics Program, Sanford Burnham Prebys Institute, San Diego, CA, USA.
⁵ Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA.
⁶ Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA.
⁷ Department of Computer Science, Duke University, Durham, NC, USA.
⁸ Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.

PMID: 39841846
PMCID: PMC11753435
DOI: 10.1126/sciadv.adr8638

Suhaas Bhat et al. Sci Adv. 2025 Jan 24;11(4):eadr8638. doi: 10.1126/sciadv.adr8638. Epub 2025 Jan 22.

Abstract

Designing binders to target undruggable proteins presents a formidable challenge in drug discovery. In this work, we provide an algorithmic framework to design short, target-binding linear peptides, requiring only the amino acid sequence of the target protein. To do this, we propose a process to generate naturalistic peptide candidates through Gaussian perturbation of the peptidic latent space of the ESM-2 protein language model and subsequently screen these novel sequences for target-selective interaction activity via a contrastive language-image pretraining (CLIP)-based contrastive learning architecture. By integrating these generative and discriminative steps, we create a Peptide Prioritization via CLIP (PepPrCLIP) pipeline and validate highly ranked, target-specific peptides experimentally, both as inhibitory peptides and as fusions to E3 ubiquitin ligase domains. PepPrCLIP-derived constructs demonstrate functionally potent binding and degradation of conformationally diverse, disease-driving targets in vitro. In total, PepPrCLIP empowers the modulation of previously inaccessible proteins without reliance on stable and ordered tertiary structures.
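The generate-then-screen idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses short hand-written vectors in place of ESM-2 embeddings, skips decoding back to amino acid sequences, and substitutes plain cosine similarity for the trained CLIP module. The function names, the per-dimension `sigma`, and the choice to apply the scaling factor `k` to the variance (so the standard deviation scales by `sqrt(k)`) are all assumptions made for illustration.

```python
import math
import random


def perturb(embedding, k, sigma, rng):
    """Sample one point from a Gaussian centred on `embedding`.

    Assumption: `sigma` is a per-dimension standard deviation and `k`
    scales the variance, so the effective std is sqrt(k) * sigma.
    """
    scale = math.sqrt(k)
    return [x + rng.gauss(0.0, scale * s) for x, s in zip(embedding, sigma)]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a CLIP-style score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def generate_and_rank(source, target, k, sigma, n, rng):
    """Generate n perturbed candidates and sort them by score, best first."""
    candidates = [perturb(source, k, sigma, rng) for _ in range(n)]
    scored = [(cosine(c, target), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy usage with made-up 3-dimensional "embeddings".
rng = random.Random(0)
ranked = generate_and_rank(
    source=[1.0, 0.0, 0.5],
    target=[1.0, 0.0, 0.5],
    k=5,
    sigma=[0.1, 0.1, 0.1],
    n=20,
    rng=rng,
)
best_score, best_candidate = ranked[0]
```

In the actual pipeline the generative and discriminative steps operate on separate learned representations; the sketch only shows the control flow of sampling around a source embedding and then prioritizing candidates by a similarity score.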

Figures

**Fig. 1. PepPrCLIP model training and evaluation.**
(A) Gaussian distributions centered around the ESM-2 embeddings of naturally occurring peptides are sampled and then decoded back to amino acid sequences. A trained CLIP module, jointly encoding cognate peptide-protein embeddings, screens thousands of these peptides for specific binding activity to the target. (B) Evaluation metrics for the final CLIP module. Binary accuracy is the accuracy of the model in predicting the correct binding pairs when given two protein-peptide pairs. Top 1 accuracy is the likelihood that, for a given protein, the model has chosen the correct binding peptide, and top 10% accuracy is the likelihood that this peptide is in the top 10% of peptides when ranked by CLIP score for this protein. Metric values are presented for the “strict” held-out test set described in Results and Materials and Methods. (C) Known peptide-target sequence pairs, from the PDB-validated dataset, were scored via the trained CLIP module. Mismatched pairs, each containing a peptide and a nonmatching protein, both from the validation dataset, were also scored via the CLIP module. +1 indicates a definitive binding pair, while −1 indicates that the peptide and target sequences do not bind (see Materials and Methods). Scores are represented as a histogram. The mean (μ) and variance (σ²) are provided for each distribution.
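The three metrics in (B) can be made concrete with a small sketch over a score matrix. The layout is an assumption: `scores[i][j]` holds the score of protein `i` against peptide `j`, with the true partner on the diagonal. The binary-accuracy rule used here (the correct assignment of two peptides to two proteins must outscore the swapped assignment) is one plausible reading of the caption, not a confirmed definition from the paper.

```python
import math


def top1_accuracy(scores):
    """Fraction of proteins whose highest-scoring peptide is the true one."""
    hits = 0
    for i, row in enumerate(scores):
        best = max(range(len(row)), key=row.__getitem__)
        if best == i:
            hits += 1
    return hits / len(scores)


def top_fraction_accuracy(scores, fraction=0.10):
    """Fraction of proteins whose true peptide ranks in the top `fraction`."""
    hits = 0
    for i, row in enumerate(scores):
        cutoff = max(1, math.ceil(fraction * len(row)))
        # 0-based rank: number of peptides scoring strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < cutoff:
            hits += 1
    return hits / len(scores)


def binary_accuracy(scores):
    """Assumed rule: matched pairing must outscore the swapped pairing."""
    n = len(scores)
    correct = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            matched = scores[i][i] + scores[j][j]
            swapped = scores[i][j] + scores[j][i]
            correct += matched > swapped
            total += 1
    return correct / total


# Hypothetical 3 x 3 score matrix, values chosen by hand.
toy = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.0, 0.7],
]
print(top1_accuracy(toy), top_fraction_accuracy(toy, 0.5), binary_accuracy(toy))
```

In this toy matrix the second protein ranks its true peptide last, which lowers the top 1 and top-fraction figures while leaving the pairwise metric unaffected.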

**Fig. 2. PepPrCLIP generation and in silico benchmarking.**
(A) Analysis of Hamming distance of generated peptides to source peptides versus variance scaling factor (k) of embedding perturbation. We computed the mean Hamming distance between generated peptides and their source peptides as we varied the value of k, using an ensemble of 100 source 18-mer peptides and creating 500 generated peptides for each. At k = 5, mutations are on the order of one amino acid, and the mean Hamming distance increases until k = 22, at which point almost the entire sequence is mutated. Thus, we chose k to range between 5 and 22, sampling the full range of sequence transformations. (B) In silico hit-rate assessment of PepPrCLIP versus RFDiffusion. Using AlphaFold-Multimer, ipTM scores were computed for both the generated and test peptides in conjunction with the target protein sequence. The entries are organized in accordance with the ipTM scores attributed to the test set peptides. The hit rate is characterized by the generated peptides exhibiting ipTM scores greater than or equal to those of the ground truth test peptides. (C) Comparison of ipTM scores of PepPrCLIP-generated peptides to RFDiffusion-generated peptides after cofolding with the input structured target. All ipTM scores, targets, and peptide sequences are provided in data file S1.
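The two quantities in (A) and (B) reduce to simple arithmetic, sketched below under stated assumptions. The peptide strings and ipTM values are invented for illustration, and the hit-rate function follows the caption's wording (a generated peptide counts as a hit when its ipTM is greater than or equal to that of the ground-truth test peptide); it is not taken from the authors' code.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_hamming(source, generated):
    """Mean Hamming distance from one source peptide to its variants."""
    return sum(hamming(source, g) for g in generated) / len(generated)


def hit_rate(generated_iptm, reference_iptm):
    """Fraction of generated peptides scoring at least the reference ipTM."""
    hits = sum(1 for s in generated_iptm if s >= reference_iptm)
    return hits / len(generated_iptm)


# Hypothetical 4-residue peptides and made-up ipTM scores.
print(mean_hamming("ACDE", ["ACDE", "ACDF", "WCDF"]))  # distances 0, 1, 2
print(hit_rate([0.8, 0.5, 0.7, 0.3], reference_iptm=0.7))
```

Repeating `mean_hamming` over many source peptides at each value of k would give a curve of the kind described in (A).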

**Fig. 3. Characterization of PepPrCLIP/RFDiffusion–generated peptides for UltraID inhibition.**
(A) AlphaFold2-predicted structure of UltraID. The magenta region refers to hotspot amino acids 35 to 51, and the yellow region refers to hotspot amino acids 102 to 124. (B) Schematic of inhibitory peptide (IP)-induced inhibition of UltraID catalytic activity. Briefly, each of the PepPrCLIP/RFDiffusion–generated peptides was fused to the C terminus of UltraID via flexible linkers and the HA epitope in a pCAG vector. The resulting plasmids were separately transfected into HEK293T cells. At 48 hours after transfection, the cells were treated with 50 μM biotin for 30 min before fixation and fluorescence staining. The control plasmid is the vector expressing UltraID but not any IP. (C) Left: Initial screening and quantification of UltraID activity upon inhibition by PepPrCLIP (blue)– and RFDiffusion (orange)–generated IPs. The UltraID activity was evaluated by the ratio between the biotinylation level (streptavidin fluorescence level) and the transfection efficiency (HA fluorescence level), which was further normalized to the control samples. Each plasmid was transfected into two wells of HEK293T cells in 24-well plates, and two fluorescence images were captured for each well. Top right: Comparison between the averages of UltraID activity after treatment with PepPrCLIP- and RFDiffusion-generated IPs. Statistical significance was determined by an unpaired two-tailed Student’s t test with the Welch correction. Calculated P values are represented as follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; ns, not significant. (D) Left: Quantification of UltraID activity upon inhibition by the seven top candidate IPs in the independent biological replicates (n = 4). Top right: Comparison between the averages of UltraID activity after treatment with top PepPrCLIP- and RFDiffusion-generated candidate IPs. The experiment and statistical analysis were performed similarly as described in (C).
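The normalization and the Welch-corrected comparison described in (C) can be sketched as follows. The input numbers are invented, the function names are hypothetical, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed; obtaining a P value would additionally require a t-distribution CDF, which is left out to keep the example dependency-free.

```python
import math
import statistics


def normalized_activity(streptavidin, ha, control_ratio):
    """Biotinylation / transfection ratio, normalized to the control ratio."""
    return (streptavidin / ha) / control_ratio


def welch_t(a, b):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Made-up fluorescence readings: control ratio is 200 / 100 = 2.0.
control_ratio = 200.0 / 100.0
activity = normalized_activity(streptavidin=50.0, ha=100.0, control_ratio=control_ratio)

# Made-up normalized activities for two groups of peptides.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A normalized activity below 1 indicates less biotinylation per unit of expressed construct than the control, which is how inhibition would show up in this readout.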

**Fig. 4. Characterization of PepPrCLIP-generated peptide-guided uAbs for β-catenin.**
(A) AlphaFold2-predicted structure of β-catenin. (B) TOPFlash luciferase reporter assay of Wnt/β-catenin transcriptional activity. The FOPFlash reporter served as a negative control. (C) Left: Degradation of endogenous β-catenin in cytosolic fractions of DLD1 cells analyzed via immunoblotting with anti-β-catenin and anti-GAPDH antibodies. Right: Densitometry analysis of immunoblots (n = 3) using ImageJ software was performed to quantify β-catenin levels. Statistical significance was determined by an unpaired two-tailed Student’s t test. Calculated P values are represented as follows: **P < 0.01; ***P < 0.001; ns, not significant. Raw, uncropped blots are shown in fig. S4. (D) β-Catenin binding activity determined by ELISA with immobilized β-catenin. Binding to BSA served as a negative control. (E) BLI analysis of β-cat-PpC3-CHIPΔTPR and β-cat-PpC4-CHIPΔTPR. Details on methodologies can be found in Materials and Methods.

**Fig. 5. Characterization of PepPrCLIP-generated peptide-guided uAbs for SS18-SSX1 fusion.**
(A) AlphaFold2-predicted structure of SS18-SSX1. (B) Left: Schematic representation of the SS18-SSX1 fusion protein fused to mCherry is shown. Middle: Fluorescence scale values of mCherry for SS18-SSX1 mCherry-fusion expressing cells within the peptide-expressing BFP positive fraction are plotted as a violin plot. The polyA control (gray) or each of the 10 SS18-SSX1-targeting peptides (red) is shown (middle graph). Statistical significance was determined by an unpaired two-tailed Student’s t test with Bonferroni correction. Calculated P values are represented as follows: *P < 0.005; ***P < 0.0001; ns, not significant. Right: The bar graph of median fluorescence intensity for the polyA control (gray) or the 10 peptides is shown. Samples were treated in independent biological replicates (n = 2). The gating strategy is described in fig. S3. (C) Immunoblot to probe endogenous SS18-SSX1 fusion in HS-SYII synovial sarcoma cells treated with SS-PpC_4 3 days after transient transfection. β-Actin was used as the loading control. Right: Densitometry analysis of immunoblots (n = 2) using ImageJ software was performed to quantify SS18-SSX1 levels. Raw, uncropped blots are shown in fig. S4.

Update of

De Novo Design of Peptide Binders to Conformationally Diverse Targets with Contrastive Language Modeling.
Bhat S, Palepu K, Hong L, Mao J, Ye T, Iyer R, Zhao L, Chen T, Vincoff S, Watson R, Wang T, Srijay D, Kavirayuni VS, Kholina K, Goel S, Vure P, Deshpande AJ, Soderling SH, DeLisa MP, Chatterjee P. bioRxiv [Preprint]. 2024 Jul 22:2023.06.26.546591. doi: 10.1101/2023.06.26.546591. Update in: Sci Adv. 2025 Jan 24;11(4):eadr8638. doi: 10.1126/sciadv.adr8638. PMID: 39091799.
