Brief Bioinform. 2023 Nov 22;25(1):bbad451.

doi: 10.1093/bib/bbad451.

Multi-task bioassay pre-training for protein-ligand binding affinity prediction

Jiaxian Yan¹, Zhaofeng Ye², Ziyi Yang², Chengqiang Lu¹, Shengyu Zhang², Qi Liu¹, Jiezhong Qiu²

Affiliations

¹ Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China.
² Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China.

PMID: 38084920
PMCID: PMC10783875
DOI: 10.1093/bib/bbad451

Jiaxian Yan et al. Brief Bioinform. 2023.

Abstract

Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional (3D) structure of protein-ligand complexes as input, and have achieved remarkable progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although a vast amount of affinity data is available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction, and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By using multi-task pre-training to treat the prediction of different affinity labels as different tasks, and by classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with its varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is freely available at https://huggingface.co/spaces/jiaxianustc/mbp.
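The two pre-training ideas named in the abstract, treating each affinity label type as its own task and comparing only samples from the same bioassay, can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the names `within_assay_pairs` and `pairwise_ranking_loss`, and the logistic form of the loss are assumptions made for illustration, and the affinity values are invented and assumed to lie on a scale where higher means stronger binding.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (assay_id, label_type, sample_id, affinity).
# Affinities are made up; higher is assumed to mean stronger binding.
RECORDS = [
    ("assay_1", "IC50", "a", 8.0),
    ("assay_1", "IC50", "b", 6.5),
    ("assay_1", "IC50", "c", 7.2),
    ("assay_2", "Ki", "d", 5.1),
    ("assay_2", "Ki", "e", 9.0),
    ("assay_3", "Kd", "f", 6.0),  # a lone sample yields no pair
]


def within_assay_pairs(records):
    """Build ranking pairs only between samples of the same assay and label type.

    Returns tuples (task, id_i, id_j, target), where target is 1.0 if
    sample i has the higher measured affinity and 0.0 otherwise.
    The task tag (label type) would select a task-specific head.
    """
    groups = defaultdict(list)
    for assay_id, label_type, sample_id, affinity in records:
        groups[(assay_id, label_type)].append((sample_id, affinity))

    pairs = []
    for (_assay_id, label_type), members in groups.items():
        for (id_i, y_i), (id_j, y_j) in combinations(members, 2):
            if y_i == y_j:
                continue  # ties carry no ranking signal
            pairs.append((label_type, id_i, id_j, 1.0 if y_i > y_j else 0.0))
    return pairs


def pairwise_ranking_loss(score_i, score_j, target):
    """Binary cross-entropy on sigmoid(score_i - score_j)."""
    prob = 1.0 / (1.0 + math.exp(-(score_i - score_j)))
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


pairs = within_assay_pairs(RECORDS)
```

Because a pair is never formed across assays, differences in experimental conditions between assays do not enter the ranking target; only the relative order inside one assay does.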

Keywords: bioassay; graph neural network; pre-training; protein–ligand binding affinity.

Figures

**Figure 1**
A real example of bioassay data in ChEMBL. (1) The top three panels show an example in which the same protein–ligand pair has different binding affinities across assays 1–3, in terms of both measurement type (IC50 versus Ki) and value (IC50=10 nM versus IC50=7600 nM). (2) The bottom two panels show an example of the binding of different ligands (Ligand 1 and 2) to a protein in the same assay (assay 3).
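To make the size of the discrepancy in this caption concrete, the two quoted IC50 values can be put on the usual negative log10 molar scale. The sketch below is only a worked conversion; the function name `nm_to_p_value` and the variable names are hypothetical, and the caption does not say which assay produced which value.

```python
import math


def nm_to_p_value(value_nm):
    """Convert a concentration in nM to -log10 of its molar value (e.g. IC50 -> pIC50)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


p_first = nm_to_p_value(10.0)      # IC50 = 10 nM
p_second = nm_to_p_value(7600.0)   # IC50 = 7600 nM, same protein-ligand pair
gap = p_first - p_second           # difference in log units
```

Here `p_first` is 8.0 and `p_second` is about 5.12, so the same pair differs by nearly three log units between measurements, which is why pooling raw values across assays gives noisy regression targets.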

**Figure 2**
The framework of MBP in pre-training and fine-tuning. The solid arrows indicate the flow path of the running examples of AssayID = CHEMBL1216983 during pre-training and PDB ID = 3g2y during fine-tuning.

**Figure 3**
Shared bottom encoder of MBP. It contains three modules: (A) encoding module, (B) interacting module and (C) read-out module. (D) The detailed GNN model of the ligand/protein encoder in the encoding module.

**Figure 4**
Comparison of ChEMBL-Dock with PDBbind and CrossDocked on label and structure.

**Figure 5**
Performance improvements of baselines and MBP on the PDBbind benchmark when trained on the general set.

**Figure 6**
Performance of MBP on 4366 data points that are unavailable in the PDBbind v2016 training set.

**Figure 7**
Test RMSE and MAE of MBP on the PDBbind core set with varying weight coefficients of the regression loss.
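For readers unfamiliar with the quantities in this caption, the sketch below shows the standard definitions of RMSE and MAE and one plausible way a regression loss could be weighted against another objective. The additive form in `combined_loss`, the parameter name `regression_weight`, and the numbers are assumptions for illustration; the caption states only that the regression loss has a weight coefficient.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predictions, targets):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(p - t) for p, t in zip(predictions, targets)]
    return sum(absolute) / len(absolute)


def combined_loss(ranking_loss, regression_loss, regression_weight):
    """Assumed additive objective: ranking term plus a weighted regression term."""
    return ranking_loss + regression_weight * regression_loss


# Made-up predictions and labels on a log-affinity scale.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_total = combined_loss(0.5, 2.0, regression_weight=0.1)
```

Sweeping `regression_weight` and recording the two metrics on a held-out set is the kind of experiment the figure summarizes.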

**Figure 8**
Interaction weight visualization and analysis for data 3zt2. (A) Protein–ligand residue–atom interaction weight before training. (B) Protein–ligand residue–atom interaction weight after training. (C) Ligand atom weight before training. (D) Ligand atom weight after training. (E) Protein–ligand detailed interaction visualized by Protein–Ligand Interaction Profiler. The gray dashed lines indicate the hydrophobic interactions and the blue solid lines indicate the hydrogen bonds.

**Figure B1**
Construction process of ChEMBL-Dock.

Cited by

EM-PLA: environment-aware heterogeneous graph-based multimodal protein-ligand binding affinity prediction.
Xie Z, Zhang P, Fan Z, Zhang Q, Lin Q. Bioinformatics. 2025 Jul 1;41(7):btaf298. doi: 10.1093/bioinformatics/btaf298. PMID: 40354612.
Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology.
Long Y, Donald BR. PLoS Comput Biol. 2025 Jun 27;21(6):e1013216. doi: 10.1371/journal.pcbi.1013216. PMID: 40577377.
Assay2Mol: Large Language Model-based Drug Design Using BioAssay Context.
Deng Y, Ericksen SS, Gitter A. ArXiv [Preprint]. 2025 Jul 16:arXiv:2507.12574v1. PMID: 40709303.
