J Cheminform. 2011 Jun 27;3:22.
doi: 10.1186/1758-2946-3-22.

Predicting a small molecule-kinase interaction map: A machine learning approach

Fabian Buchwald et al. J Cheminform. 2011.

Abstract

Background: We present a machine learning approach to the problem of protein-ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. The data were obtained through ATP site-dependent binding competition assays and constitute the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features.

Results: A Support Vector Machine (SVM) and a decision tree algorithm (C5/See5) are used to learn models from the available features; these models can in turn be used to classify new kinase-inhibitor test pairs. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided.
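The evaluation idea in the Results can be illustrated with a small sketch of leave-one-out cross-validation (LOOCV) over kinase-inhibitor pairs. The abstract does not define the "soft" and "hard" cases named in the figure captions, so the definitions below are assumptions made for illustration only: in the soft case only the test pair itself is held out, while in the hard case every pair that shares the test kinase or the test inhibitor is also removed from training. The function name `loocv_splits` and the placeholder identifiers are hypothetical and not taken from the paper.

```python
from itertools import product


def loocv_splits(kinases, inhibitors, mode="soft"):
    """Yield (train_pairs, test_pair) for every kinase-inhibitor pair.

    ASSUMED definitions (not stated in the abstract):
      soft: hold out only the test pair itself.
      hard: additionally drop every training pair that shares the
            test kinase or the test inhibitor.
    """
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")

    pairs = list(product(kinases, inhibitors))
    for test_pair in pairs:
        test_kinase, test_inhibitor = test_pair
        if mode == "soft":
            train = [p for p in pairs if p != test_pair]
        else:
            train = [
                p for p in pairs
                if p[0] != test_kinase and p[1] != test_inhibitor
            ]
        yield train, test_pair


# Placeholder identifiers sized like the dataset in the abstract
# (113 kinases, 20 inhibitors); the names themselves are made up.
kinases = [f"kinase_{n}" for n in range(113)]
inhibitors = [f"inhibitor_{n}" for n in range(20)]

soft_train, soft_test = next(loocv_splits(kinases, inhibitors, "soft"))
hard_train, hard_test = next(loocv_splits(kinases, inhibitors, "hard"))
print(len(soft_train), len(hard_train))
```

Under these assumed definitions the hard case gives the classifier no training example involving either binding partner of the test pair, which is one way to vary how much information about the binding partners is available for training.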

Conclusions: In most cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful.

Figures

Figure 1. Training set inhibitors. Structures of the 20 inhibitors that were subject of our study [7].
Figure 2. Hard and soft case of LOOCV. Illustration of the hard (left) and the soft (right) case of LOOCV.
Figure 3. Mixed and mixed-mixed case of LOOCV. Illustration of the mixed (left) and the mixed-mixed (right) case of LOOCV.
Figure 4. Performance on different feature sets (soft case). Prediction accuracies, recall and precision for different feature sets from C5 and Support Vector Machines with different parameter settings (soft case).
Figure 5. Comparison of prediction accuracies with random features. Comparison of prediction accuracy for different feature sets including random features.
Figure 6. Performance comparison of the hard and the soft case. Comparison of the prediction accuracy and recall/precision in the hard and the soft case.
Figure 7. Performance using solely test kinase-inhibitor pairs. Comparison of prediction accuracy and recall/precision using solely test kinase-inhibitor pairs in the training set.
Figure 8. Performance comparison of different mixed cases (C5). Comparison of prediction accuracy and recall/precision for different mixed cases (C5 without global pruning).
Figure 9. Performance comparison of different mixed-mixed cases (C5). Comparison of prediction accuracy for the soft, hard, mixed and mixed-mixed cases (C5 without global pruning).
Figure 10. Performance on the external test set. Prediction accuracy and recall/precision on the external test set with feature set 7, for both C5 and SVMs.

References

    1. Engvall E, Perlman P. Enzyme-linked immunosorbent assay (ELISA). Quantitative assay of immunoglobulin G. Immunochemistry. 1971;8(9):871–4. doi: 10.1016/0019-2791(71)90454-X.
    2. LaValle SM, Finn PW, Kavraki LE, Latombe JC. Efficient database screening for rational drug design using pharmacophore-constrained conformational search. Proceedings of the third annual international conference on computational molecular biology, RECOMB'99, April 11-14, Lyon, France. 1999. pp. 250–260.
    3. Buzko OV, Bishop AC, Shokat KM. Modified AutoDock for accurate docking of protein kinase inhibitors. J Comput-Aided Mol Des. 2002;16(2):113–127. doi: 10.1023/A:1016366013656.
    4. Yap CW, Chen YZ. Prediction of cytochrome P450 3A4, 2D6, and 2C9 inhibitors and substrates by using support vector machines. J Chem Inf Model. 2005;45(4):982–992. doi: 10.1021/ci0500536.
    5. Helma C, Cramer T, Kramer S, De Raedt L. Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J Chem Inf Model. 2004;44(4):1402–1411. doi: 10.1021/ci034254q.