Brief Bioinform. 2023 Nov 22;25(1):bbad451.
doi: 10.1093/bib/bbad451.

Multi-task bioassay pre-training for protein-ligand binding affinity prediction



Jiaxian Yan et al. Brief Bioinform.

Abstract

Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have made remarkable progress by taking the three-dimensional (3D) structures of protein-ligand complexes as input to predict binding affinity. However, owing to the scarcity of high-quality training data, the generalization ability of current models remains limited. Although large-scale databases such as ChEMBL contain vast amounts of affinity data, inconsistent affinity measurement labels (IC50, Ki, Kd), differing experimental conditions and the lack of available 3D binding structures complicate the use of these data for developing high-precision affinity prediction models. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training, which treats the prediction of each affinity label type as a separate task and classifies the relative ranking of samples from the same bioassay, MBP learns robust and transferable structural knowledge from the varied and noisy labels of ChEMBL-Dock. Experiments substantiate the effectiveness of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model, and it shows great potential for future development. The MBP web server is freely available at: https://huggingface.co/spaces/jiaxianustc/mbp.
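The two pre-training objectives described above can be sketched as a single loss function. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. It assumes that each sample carries an assay ID, a label type (IC50, Ki or Kd) and an affinity already converted to a -log10 molar scale, and that the model has one output head per label type. The regression term trains only the head that matches each sample's label type. The ranking term turns every pair of samples from the same assay (and, as an extra assumption, the same label type) into a binary "which binds more strongly?" classification, using a logistic loss on the difference of their predictions. The names `multitask_regression_loss`, `pairwise_ranking_loss`, `combined_loss` and the weight `reg_weight` are illustrative. Figure 7 shows that the paper varies a weight coefficient on the regression loss, but the exact weighting used here is an assumption.

```python
import math
from collections import defaultdict
from itertools import combinations


def multitask_regression_loss(records, preds):
    """Per-task MSE, averaged over the label types present in the batch.

    records[i]: {"assay_id", "label_type", "value"} (value on a -log10 molar scale)
    preds[i]:   {label_type: output of that task's head for sample i}
    """
    per_task = defaultdict(list)
    for rec, out in zip(records, preds):
        task = rec["label_type"]
        # Only the head matching this sample's label type gets a regression target.
        err = out[task] - rec["value"]
        per_task[task].append(err * err)
    if not per_task:
        return 0.0
    return sum(sum(errs) / len(errs) for errs in per_task.values()) / len(per_task)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(records, preds):
    """Logistic loss on 'is sample i more potent than sample j?' within each assay."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        # Assumption: pairs are formed only inside one assay and one label type.
        groups[(rec["assay_id"], rec["label_type"])].append(i)
    losses = []
    for (_, task), members in groups.items():
        for i, j in combinations(members, 2):
            vi, vj = records[i]["value"], records[j]["value"]
            if vi == vj:
                continue  # ties carry no ranking signal
            target = 1.0 if vi > vj else 0.0
            logit = preds[i][task] - preds[j][task]
            # Binary cross-entropy with logits: y*softplus(-z) + (1-y)*softplus(z)
            losses.append(target * softplus(-logit) + (1.0 - target) * softplus(logit))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(records, preds, reg_weight=1.0):
    """Weighted regression term plus in-assay ranking term (hypothetical weighting)."""
    return (reg_weight * multitask_regression_loss(records, preds)
            + pairwise_ranking_loss(records, preds))


# Toy batch: two IC50 measurements from assay A1 and one Ki measurement from A2.
records = [
    {"assay_id": "A1", "label_type": "IC50", "value": 8.0},
    {"assay_id": "A1", "label_type": "IC50", "value": 6.0},
    {"assay_id": "A2", "label_type": "Ki", "value": 7.0},
]
preds = [
    {"IC50": 7.5, "Ki": 0.0, "Kd": 0.0},
    {"IC50": 6.5, "Ki": 0.0, "Kd": 0.0},
    {"IC50": 0.0, "Ki": 7.0, "Kd": 0.0},
]
print(combined_loss(records, preds, reg_weight=1.0))
```

Because the ranking term compares only samples measured in the same assay, it never relies on absolute values obtained under different experimental conditions. The per-task regression term keeps IC50, Ki and Kd values from being forced onto a single output.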

Keywords: bioassay; graph neural network; pre-training; protein–ligand binding affinity.


Figures

Figure 1
A real example of bioassay data in ChEMBL. (1) The top three panels show an example in which the same protein–ligand pair has different binding affinities in assays 1–3, differing in both measurement type (IC50 versus Ki) and value (IC50 = 10 nM versus IC50 = 7600 nM). (2) The bottom two panels show an example of different ligands (Ligand 1 and Ligand 2) binding to a protein in the same assay (assay 3).
Figure 2
The framework of MBP in pre-training and fine-tuning. The solid arrows indicate the flow path of the running examples: AssayID = CHEMBL1216983 during pre-training and PDB ID = 3g2y during fine-tuning.
Figure 3
Shared bottom encoder of MBP. It contains three modules: (A) encoding module, (B) interacting module and (C) read-out module. (D) The detailed GNN model of the ligand/protein encoder in the encoding module.
Figure 4
Comparison of ChEMBL-Dock with PDBbind and CrossDocked on label and structure.
Figure 5
Performance improvements of the baselines and MBP on the PDBbind benchmark when trained on the general set.
Figure 6
Performance of MBP on 4366 samples that are not available in the PDBbind v2016 training set.
Figure 7
Test RMSE and MAE of MBP on the PDBbind core set with varying weight coefficients of the regression loss.
Figure 8
Interaction weight visualization and analysis for complex 3zt2. (A) Protein–ligand residue–atom interaction weights before training. (B) Protein–ligand residue–atom interaction weights after training. (C) Ligand atom weights before training. (D) Ligand atom weights after training. (E) Detailed protein–ligand interactions visualized with the Protein–Ligand Interaction Profiler. The gray dashed lines indicate hydrophobic interactions and the blue solid lines indicate hydrogen bonds.
Figure B1
Construction process of ChEMBL-Dock.
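Figure 1 reports IC50 = 10 nM and IC50 = 7600 nM for the same protein–ligand pair in different assays. A short worked conversion makes the size of that discrepancy concrete. It assumes the common convention of expressing affinities as the negative base-10 logarithm of the molar value (pIC50, pKi, pKd), as PDBbind-style benchmarks do. The abstract does not state which scale MBP uses, and `p_affinity` is a hypothetical helper.

```python
import math


def p_affinity(value_nm: float) -> float:
    """Convert an IC50, Ki or Kd given in nM to -log10 of its molar value."""
    if value_nm <= 0:
        raise ValueError("affinity must be a positive concentration")
    # -log10(value_nm * 1e-9) rewritten as 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


# Same protein-ligand pair, IC50 measured in two different assays (Figure 1).
strong = p_affinity(10)    # 8.0
weak = p_affinity(7600)    # about 5.12
print(f"gap between assays: {strong - weak:.2f} log units")
```

For a single pair, this amounts to a gap of about 2.9 log units, or a 760-fold difference in concentration. Label noise of this size motivates comparing samples only within the same bioassay.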


