Deep learning in GPCR drug discovery: benchmarking the path to accurate peptide binding

Luuk R Hoegen Dijkhof¹,², Teemu K E Rönkkö¹,², Hans C von Vegesack¹,², Jacob Lenzing¹,², Alexander S Hauser¹,²

Affiliations

¹ Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark.
² Center for Pharmaceutical Data Science, University of Copenhagen, Denmark.

PMID: 40285358
PMCID: PMC12031724
DOI: 10.1093/bib/bbaf186

Luuk R Hoegen Dijkhof et al. Brief Bioinform. 2025 Mar 4;26(2):bbaf186. doi: 10.1093/bib/bbaf186.

Abstract

Deep learning (DL) methods have drastically advanced structure-based drug discovery by directly predicting protein structures from sequences. Recently, these methods have become increasingly accurate in predicting complexes formed by multiple protein chains. We evaluated these advancements to predict and accurately model the largest receptor family and its cognate peptide hormones. We benchmarked DL tools, including AlphaFold 2.3 (AF2), AlphaFold 3 (AF3), Chai-1, NeuralPLexer, RoseTTAFold-AllAtom, Peptriever, ESMFold, and D-SCRIPT, to predict interactions between G protein-coupled receptors (GPCRs) and their endogenous peptide ligands. Our results showed that structure-aware models outperformed language models in peptide binding classification, with the top-performing model achieving an area under the curve of 0.86 on a benchmark set of 124 ligands and 1240 decoys. Rescoring predicted structures on local interactions further improved the principal ligand discovery among decoy peptides, whereas DL-based approaches did not. We explored a competitive tournament approach for modeling multiple peptides simultaneously on a single GPCR, which accelerates the performance but reduces true-positive recovery. When evaluating the binding poses of 67 recent complexes, AF2 reproduced the correct binding modes in nearly all cases (94%), surpassing those of both AF3 and Chai-1. Confidence scores correlate with structural binding mode accuracy, which provides a guide for interpreting interface predictions. These results demonstrated that DL models can reliably rediscover peptide binders, aid peptide drug discovery, and guide the selection of optimal tools for GPCR-targeted therapies. To this end, we provided a practical guide for selecting the best models for specific applications and an independent benchmarking set for future model evaluation.
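
The area under the curve reported above can be read as the probability that a randomly chosen true GPCR–peptide pair receives a higher score than a randomly chosen decoy pair. The following is a minimal sketch of that calculation, assuming one confidence score per pair. The function name `roc_auc` and the toy scores are illustrative and not taken from the study.

```python
from typing import Sequence


def roc_auc(binder_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC by pairwise comparison (Mann-Whitney U divided by n_pos * n_neg)."""
    if not binder_scores or not decoy_scores:
        raise ValueError("need at least one binder score and one decoy score")
    wins = 0.0
    for b in binder_scores:
        for d in decoy_scores:
            if b > d:
                wins += 1.0   # binder scored above decoy
            elif b == d:
                wins += 0.5   # ties count as half a win
    return wins / (len(binder_scores) * len(decoy_scores))


# Toy scores (hypothetical, for illustration only).
toy_binders = [0.9, 0.6]
toy_decoys = [0.8, 0.5, 0.4, 0.3]
auc = roc_auc(toy_binders, toy_decoys)
```

For a benchmark of 124 binders and 1240 decoys the double loop makes 153,760 comparisons, which is still cheap; a sort-based implementation would only matter at much larger scale.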

Keywords: AlphaFold; GPCR; RoseTTAFold; docking; peptide receptor; structure prediction.

Figures

**Figure 1**
Benchmarking classifier performance. (a) A benchmarking dataset was generated consisting of GPCR–peptide pairs, where decoys were selected based on binding pocket similarity between binding pocket generic residues for each decoy and GPCR endogenous peptide (orange) pair. The dataset includes 124 GPCR–principal ligand and 1240 GPCR–decoy pairs. The glucagon receptor is shown as an example, with 11 AF2 complex predictions superimposed. The ipTM + pTM score ranks the ligand (glucagon) first and separates it from the decoys. (b) We evaluated ligand recall when incrementally selecting more peptides from the ranked set of 11 peptides for each GPCR. AF2 (without templates) performs best, ranking the principal ligand first among 10 decoys for 58% of GPCRs. (c) Model performance as classifiers, evaluating 124 binders and 1240 decoys. Only the highest-ranked ligand among the 11 per receptor is considered positive. Supplementary Table 3 shows extended performance metrics for only similar and only dissimilar decoys.
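
The ligand recall in panel (b) can be sketched as a top-k calculation: for each receptor, sort its 11 peptides by confidence score and check whether the principal ligand falls within the first k. The code below is an illustration under assumptions: the data layout, the function names, and the toy scores are hypothetical, and the score is assumed to be "higher is better" (as with ipTM + pTM). With one ligand among 11 peptides, random ranking would place the ligand first about 9% of the time (1/11).

```python
from typing import Dict, List, Tuple

# receptor -> list of (peptide name, confidence score, is principal ligand)
Scores = Dict[str, List[Tuple[str, float, bool]]]


def ligand_rank(peptides: List[Tuple[str, float, bool]]) -> int:
    """1-based rank of the principal ligand after sorting by descending score."""
    ordered = sorted(peptides, key=lambda p: p[1], reverse=True)
    for rank, (_, _, is_ligand) in enumerate(ordered, start=1):
        if is_ligand:
            return rank
    raise ValueError("no principal ligand in peptide list")


def recall_at_k(scores: Scores, k: int) -> float:
    """Fraction of receptors whose principal ligand is ranked within the top k."""
    ranks = [ligand_rank(peptides) for peptides in scores.values()]
    return sum(r <= k for r in ranks) / len(ranks)


# Toy example with 3 peptides per receptor instead of 11 (hypothetical values).
toy_scores: Scores = {
    "receptor_A": [("ligand_A", 0.91, True), ("decoy_1", 0.55, False), ("decoy_2", 0.40, False)],
    "receptor_B": [("ligand_B", 0.48, True), ("decoy_3", 0.62, False), ("decoy_4", 0.30, False)],
}
top1 = recall_at_k(toy_scores, 1)
top2 = recall_at_k(toy_scores, 2)
```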

**Figure 2**
Classifier rescoring. (a) The cognate ligand retention of the best base models and the effect of rescoring on performance. The retention at random class assignment is shown as a dashed line. AF2 without templates (^†), AF3 (without templates), and Chai-1 were included as the best-performing base models. (b) The ranking comparison of AF2 and AFM-LIS for 124 principal ligands. After rescoring AF2 complexes, AFM-LIS correctly reassigns 23 misranked ligands to rank 1 but misranks 6 previously correct ligands. Overall, the improvement in ranking performance is statistically significant (Wilcoxon signed-rank test, P = .0018). (c) The largest improvement in rank after AFM-LIS rescoring was for lgr4. The default AF2 scoring metric ipTM + pTM (before) and AFM-LIS (after) are shown for the peptides, with principal ligand R-spondin-4 (GtP ID: 3700) moving from rank 7 to rank 1. The predicted complex is displayed, and the shorter decoy that was initially ranked first is superimposed onto the structure.
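
The bookkeeping behind panel (b), counting ligands that move into or out of rank 1 after rescoring, can be sketched as follows. This only tallies transitions; it is not the Wilcoxon signed-rank test named in the caption. The receptor names and ranks are hypothetical, apart from the 7-to-1 move, which mirrors the lgr4 example in panel (c).

```python
from typing import Dict, List, Tuple


def rank_one_changes(before: Dict[str, int], after: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (rescued, lost): receptors whose ligand moved into or out of rank 1."""
    if before.keys() != after.keys():
        raise ValueError("both rankings must cover the same receptors")
    rescued = sorted(r for r in before if before[r] > 1 and after[r] == 1)
    lost = sorted(r for r in before if before[r] == 1 and after[r] > 1)
    return rescued, lost


# Principal-ligand rank per receptor before and after rescoring (toy values).
rank_before = {"rec_A": 7, "rec_B": 1, "rec_C": 1, "rec_D": 3}
rank_after = {"rec_A": 1, "rec_B": 1, "rec_C": 2, "rec_D": 2}

rescued, lost = rank_one_changes(rank_before, rank_after)
net_gain = len(rescued) - len(lost)
```

Applied to the counts in the caption, 23 rescued and 6 lost ligands correspond to a net gain of 17 top-ranked ligands.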

**Figure 3**
Classifier sub-analysis. (a) Spearman’s rank correlations between complex attributes and ranking performance, with per-attribute Bonferroni-corrected P-values shown for significant correlations. Positive correlations indicate that ligands are ranked worse as the attribute increases, while negative correlations show that increasing the attribute improves rank performance. (b) A comparison of ligand rankings for different GPCR classifications, including class, presence of resolved complexes in training data (seen), and interface type based on receptor family. Significance is indicated by Wilcoxon signed-rank tests, with per-attribute Bonferroni-corrected P-values shown if P <.05.
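
To make the sign convention in panel (a) concrete, the sketch below computes Spearman's rank correlation between a complex attribute and the ligand rank, and applies a Bonferroni correction to a set of P-values. It is a plain-Python illustration under assumptions: the function names and the toy numbers are hypothetical, and the P-values are taken as given rather than computed.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data: the ligand rank worsens (number grows) as the attribute increases,
# so rho is positive, matching the convention described in the caption.
attribute = [5, 12, 20, 33]
ligand_ranks = [1, 2, 4, 9]
rho = spearman_rho(attribute, ligand_ranks)
corrected = bonferroni([0.01, 0.04, 0.5])
```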

**Figure 4**
Structural benchmark of GPCR–peptide interactions published after the training date cutoff for all included modeling tools. (a) DockQ score distributions of 67 predicted GPCR–peptide pairs for each model. AF2 (with and without structural templates) and AF3 outperform all other included models. (b) Performance of Chai-1, AF, and RF-AA on two sample structures from the structural benchmark set, according to the DockQ scores of AF2 predictions ranging from worst (0.02, PDB code, 7XOW) to best (0.95, PDB code, 7VFX). (c) AFM-LIS and DockQ scores displayed a significant Spearman’s rank correlation for AF2 predictions with and without templates (r(65) = .67, P < .0001, r(65) = .67, P < .0001), while this correlation was not significant for AF3 (server) predictions (r(65) = .20, P = .097).
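
DockQ scores are often binned into quality classes. The sketch below uses the commonly cited DockQ cutoffs (0.23, 0.49, and 0.80); whether the study applied exactly these cutoffs when counting correct binding modes is an assumption here. The function names are illustrative, and the two example scores are the worst and best AF2 values quoted in panel (b).

```python
from typing import Sequence


def dockq_category(score: float) -> str:
    """Map a DockQ score in [0, 1] to a quality class (assumed standard cutoffs)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ scores lie between 0 and 1")
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


def fraction_acceptable_or_better(scores: Sequence[float]) -> float:
    """Share of predictions with a DockQ score of at least 0.23."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(s >= 0.23 for s in scores) / len(scores)


worst = dockq_category(0.02)   # AF2 prediction for PDB 7XOW in panel (b)
best = dockq_category(0.95)    # AF2 prediction for PDB 7VFX in panel (b)
share = fraction_acceptable_or_better([0.02, 0.95])
```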

**Figure 5**
Recommended workflow for GPCR–peptide binding prediction. (a) Based on the results from the classifier benchmark, AF2, AF3, Chai-1, and Peptriever perform best in rediscovering principal ligands among decoys. These four models are recommended for different use cases as the structure-aware models require more computational resources compared to LLM-based methods. AF3 and Chai-1 (without input MSAs) offer viable alternatives to AF2, given their faster runtimes while delivering medium to acceptable DockQ-quality interfaces. Even though LLM-based Peptriever shows low initial recall, it can be applied at much larger scale for initial enrichment of positive interactions. (b) We caution users about RF-AA’s tendency to misfold GPCRs into the binding pocket. NeuralPLexer and ESMFold fail to dock peptides correctly, and D-SCRIPT predictions remain at chance level for GPCR–peptide interactions. (c) Based on the presented data, AF2 is recommended for accurate GPCR–peptide interface predictions. We provide an overview for interpreting confidence metrics (ipTM + pTM and AFM-LIS) for expected docking quality (DockQ score).

Grants and funding

CF20-0248/Carlsberg Foundation
