Bioinformatics. 2010 May 1;26(9):1169-75.
doi: 10.1093/bioinformatics/btq112. Epub 2010 Mar 17.

A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking

Pedro J Ballester et al. Bioinformatics. 2010.

Abstract

Motivation: Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined, theory-inspired functional form for the relationship between the variables that characterize the complex (which also include parameters fitted to experimental or simulation data) and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.
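
As an illustration of the resampling strategies mentioned above, the following is a minimal sketch of how k-fold cross-validation partitions a set of complexes so that every complex is held out exactly once. It is not taken from the paper; the function name `k_fold_indices` and its parameters are hypothetical, and the model-fitting step is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, held_out) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only: it shuffles the sample
    indices once, deals them into k folds, and holds out one fold per round.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, held_out


if __name__ == "__main__":
    # Ten complexes, five folds: each round fits on 8 and validates on 2.
    for train, held_out in k_fold_indices(10, 5):
        print("fit on", sorted(train), "validate on", sorted(held_out))
```

Parameters would be fitted on `train` and assessed on `held_out` in each round, so the reported error never comes from data used for calibration.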

Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.

Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Figure 1. RF-Score reproduces its training data with very high accuracy (Pearson correlation coefficient R=0.952 and RMSE=0.74).
Figure 2. Estimation of feature importance based on internal validation data. Overall, it shows the importance of each type of protein-ligand contact across training complexes, which are by construction representative of the entire PDB.
Figure 3. RF-Score predicts the test data with high accuracy (Pearson correlation coefficient R=0.778 and RMSE=1.58).
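
The two statistics quoted in the figure captions, the Pearson correlation coefficient R and the root-mean-square error (RMSE), can be computed from paired measured and predicted affinities roughly as follows. This is a sketch using the standard textbook definitions; the function names and the sample values are hypothetical and are not data from the paper.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_m * var_p)


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


if __name__ == "__main__":
    # Made-up affinities for illustration only.
    measured = [4.2, 6.1, 7.8, 5.0]
    predicted = [4.8, 5.9, 7.1, 5.6]
    print("R =", round(pearson_r(measured, predicted), 3))
    print("RMSE =", round(rmse(measured, predicted), 3))
```

R measures how well the predictions track the ordering and linear trend of the measurements, while RMSE reports the typical size of the error in the units of the affinity scale, which is why the captions give both.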
