Bioinformatics. 2018 Mar 1;34(5):770-778.

doi: 10.1093/bioinformatics/btx638.

Machine learning accelerates MD-based binding pose prediction between ligands and proteins

Kei Terayama¹, Hiroaki Iwata², Mitsugu Araki³, Yasushi Okuno³,⁴, Koji Tsuda¹,⁵,⁶

Affiliations

¹ Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan.
² Foundation for Biomedical Research and Innovation, Hyogo 650-0047, Japan.
³ RIKEN Advanced Institute for Computational Science, Hyogo 650-0047, Japan.
⁴ Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan.
⁵ Center for Materials Research by Information Integration, NIMS, Ibaraki 305-0047, Japan.
⁶ RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.

PMID: 29040432
PMCID: PMC6030886
DOI: 10.1093/bioinformatics/btx638

Kei Terayama et al. Bioinformatics. 2018.

Abstract

Motivation: Fast and accurate prediction of protein-ligand binding structures is indispensable for structure-based drug design and for accurate estimation of the binding free energies of drug candidate molecules in drug discovery. Recently, accurate pose prediction methods have been used that select among generated docking poses using short molecular dynamics (MD) simulations combined with binding free energy estimates such as MM-PBSA and MM-GBSA. Since molecular structures obtained from MD simulations depend on the initial conditions, averaging over different initial conditions leads to better accuracy; that is, the prediction accuracy of protein-ligand binding poses can be improved with multiple runs at different initial velocities.
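Why averaging over initial velocities helps can be stated with elementary statistics. Assuming, as a simplification not stated in the abstract, that the k runs for one pose are independent and share a common standard deviation σ of $\Delta G_{\mathrm{bind}}$, the averaged estimate and its standard error are:

$$\overline{\Delta G}_{k\,\mathrm{vel.}} = \frac{1}{k}\sum_{i=1}^{k} \Delta G_{\mathrm{bind}}^{(i)}, \qquad \mathrm{SE}\left(\overline{\Delta G}_{k\,\mathrm{vel.}}\right) = \frac{\sigma}{\sqrt{k}}$$

For example, under this assumption averaging k = 20 runs shrinks the standard error to σ/√20 ≈ 0.22σ, which is why more runs per pose make the ranking of poses more reliable but also more expensive.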

Results: This paper shows that a machine learning method called Best Arm Identification (BAI) can optimally control the number of MD runs for each binding pose, allowing a correct binding pose to be identified with a minimal total number of runs. Our experiments using three proteins and eight inhibitors showed that the computational cost can be reduced substantially without sacrificing accuracy. This method can be applied to control all kinds of molecular simulations to obtain the best results under restricted computational resources.
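To illustrate how a BAI algorithm allocates a fixed budget of runs, here is a minimal sketch of Successive Rejects (SR), one of the BAI algorithms compared in the figures. It is not the authors' implementation (see the GitHub repository for that). The callable `pull(i)` is a hypothetical stand-in for one MD + MM-PBSA run on pose i that returns a $\Delta G_{\mathrm{bind}}$ sample; lower is better, so the pose with the worst (highest) mean is rejected at the end of each phase.

```python
import math
import random
from typing import Callable


def successive_rejects(pull: Callable[[int], float], n_arms: int, budget: int) -> int:
    """Successive Rejects over `n_arms` poses with a total budget of runs.

    `pull(i)` is a hypothetical stand-in for one MD + MM-PBSA run on pose i;
    it returns a Delta G_bind sample, and lower values are better.
    """
    if budget <= n_arms:
        raise ValueError("budget must exceed the number of arms")
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    n_prev = 0
    # K - 1 phases; each phase ends by rejecting one pose.
    for k in range(1, n_arms):
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k)))
        for arm in active:
            for _ in range(n_k - n_prev):  # top up every surviving pose to n_k runs
                sums[arm] += pull(arm)
                counts[arm] += 1
        n_prev = n_k
        # Reject the surviving pose with the highest (worst) mean Delta G.
        worst = max(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
    return active[0]


if __name__ == "__main__":
    rng = random.Random(0)
    true_dg = [-30.0, -25.0, -20.0, -15.0, -10.0]  # hypothetical mean Delta G per pose
    best = successive_rejects(lambda i: rng.gauss(true_dg[i], 1.0), len(true_dg), budget=100)
    print("selected pose:", best)
```

The Gaussian `pull` in the usage example is purely synthetic; in practice each call would launch a real simulation, which is exactly why poses that look poor early are dropped and receive fewer runs.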

Availability and implementation: Code and data are available on GitHub at https://github.com/tsudalab/bpbi.

Contact: terayama@cbms.k.u-tokyo.ac.jp or tsuda@k.u-tokyo.ac.jp.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Pose prediction using uniform sampling (A) and BAI (B) algorithms. The purpose of pose prediction is to select the best pose (minimum $\overline{\Delta G}_{20\,\mathrm{vel.}}$) among N prepared docking poses. With uniform sampling (A), the same number (k) of MD and MM-PBSA runs with different initial velocities is performed for every pose, resulting in a total of k × N runs. In contrast, the total number of runs can be reduced by optimally controlling the runs with a BAI algorithm (B). (Color version of this figure is available at *Bioinformatics* online.)
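The uniform-sampling baseline of panel A can be written in a few lines, which makes its fixed k × N cost explicit. This is a minimal sketch; `pull(pose)` is a hypothetical stand-in for one MD + MM-PBSA run with a fresh initial velocity and is not part of the authors' code.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple


def uniform_sampling(pull: Callable[[int], float], n_poses: int, k: int) -> Tuple[int, int]:
    """Uniform-sampling baseline: k runs for every pose.

    Returns (index of the pose with the lowest mean Delta G, total runs used).
    `pull(pose)` is a hypothetical stand-in for one MD + MM-PBSA run.
    """
    means: List[float] = []
    for pose in range(n_poses):
        samples = [pull(pose) for _ in range(k)]  # k different initial velocities
        means.append(mean(samples))
    best = min(range(n_poses), key=lambda p: means[p])
    return best, k * n_poses


if __name__ == "__main__":
    rng = random.Random(0)
    true_dg = [-22.0, -35.0, -28.0]  # hypothetical mean Delta G per pose
    print(uniform_sampling(lambda p: rng.gauss(true_dg[p], 3.0), len(true_dg), k=10))
```

The cost is independent of how quickly a pose can be ruled out; a BAI algorithm keeps the same selection rule (lowest mean) but spends fewer runs on clearly unfavorable poses.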

**Fig. 2.**
The basic idea of our framework. (A) A flowchart of the BAI algorithm in the general setting. The aim is to find the best arm (slot machine) by repeating arm selection and reward acquisition within a limited budget. (B) The BAI algorithm applied to the binding pose prediction problem. Efficient BAI algorithms reduce the total number of MD and MM-PBSA runs needed to find the correct binding pose. (Color version of this figure is available at *Bioinformatics* online.)
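The select, acquire-reward, update loop of panel A can be sketched with a plain UCB-E rule, one of the algorithm families named in the later figures. This sketch assumes a fixed exploration parameter `a`, whereas the "UCB-E auto" variant in the figures adjusts it automatically; `pull(i)` is again a hypothetical stand-in for one MD + MM-PBSA run, and the reward is taken as $-\Delta G_{\mathrm{bind}}$.

```python
import math
import random
from typing import Callable


def ucb_e(pull: Callable[[int], float], n_arms: int, budget: int, a: float) -> int:
    """UCB-E with a fixed exploration parameter `a` (no automatic tuning).

    `pull(i)` is a hypothetical stand-in for one MD + MM-PBSA run on pose i.
    Rewards are taken as -Delta G, so the best arm has the lowest mean Delta G.
    """
    if budget < n_arms:
        raise ValueError("budget must allow at least one run per arm")
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    # Initialization: one run per arm so that every index below is defined.
    for arm in range(n_arms):
        sums[arm] -= pull(arm)
        counts[arm] += 1
    for _ in range(budget - n_arms):
        # Select: empirical mean reward plus exploration bonus sqrt(a / T_i).
        arm = max(range(n_arms),
                  key=lambda i: sums[i] / counts[i] + math.sqrt(a / counts[i]))
        # Acquire the reward of one more run and update the statistics.
        sums[arm] -= pull(arm)
        counts[arm] += 1
    # Recommend the arm with the highest empirical mean reward.
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])


if __name__ == "__main__":
    rng = random.Random(0)
    true_dg = [-30.0, -25.0, -20.0, -15.0, -10.0]  # hypothetical mean Delta G per pose
    print(ucb_e(lambda i: rng.gauss(true_dg[i], 1.0), len(true_dg), budget=100, a=10.0))
```

A larger `a` spreads runs more evenly across poses, and a smaller one concentrates them on the current leader; this sensitivity is the reason automatic tuning of the exploration parameter matters.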

**Fig. 4.**
The distributions of calculated $\Delta G_{\mathrm{bind}}$ values for all poses of the eight compounds. Twenty $\Delta G_{\mathrm{bind}}$ values are calculated for each pose. Red poses are correct binding poses (RMSD < 2.0 Å) and blue ones are incorrect. Horizontal lines within the boxes represent the mean values, and the boxes span the first to third quartiles; the ends of the whiskers show the maximum and minimum values within 1.5 IQR (interquartile range: the distance between the first and third quartiles) of the first and third quartiles. (Color version of this figure is available at *Bioinformatics* online.)
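The box-plot quantities described in this caption, and the RMSD criterion for a correct pose, can be computed as follows. This is a sketch; the quartile method (`statistics.quantiles` with its default "exclusive" method) is an assumption and may differ slightly from the plotting software used for the figure.

```python
from statistics import mean, quantiles
from typing import Dict, List


def box_stats(values: List[float]) -> Dict[str, float]:
    """Mean, quartiles and 1.5-IQR whisker ends for one pose's Delta G_bind values."""
    # statistics.quantiles uses the "exclusive" method by default; the figure's
    # plotting software may compute quartiles slightly differently.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "mean": mean(values),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value within 1.5 IQR of Q1
        "whisker_high": max(inside),  # largest value within 1.5 IQR of Q3
    }


def is_correct_pose(rmsd_angstrom: float, cutoff: float = 2.0) -> bool:
    """A pose counts as correct when its RMSD to the reference is below 2.0 Å."""
    return rmsd_angstrom < cutoff
```

Values beyond the fences are outliers that the whiskers do not reach, so a single unusually unstable MD run does not stretch the plotted range.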

**Fig. 5.**
Reductions of MD and MM-PBSA runs per pose by BAI algorithms in a pose prediction trial. Green bars show the number of runs per pose (k = 10) with uniform sampling. Blue, purple, red and orange bars show the number of runs per pose using UGapE auto, UCB-E auto, SR and UCB(p) (p = 4), respectively. Black lines are the averaged binding free energies ($\overline{\Delta G}_{20\,\mathrm{vel.}}$). The BAI algorithms reduce the total number of runs from 200 (10 × 20 poses) with uniform sampling to 50 and 75, without reducing the number of runs for promising poses, i.e. those with small $\overline{\Delta G}_{20\,\mathrm{vel.}}$ values. (Color version of this figure is available at *Bioinformatics* online.)
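The relative savings implied by the run counts in this caption follow directly from the totals:

$$1 - \frac{50}{200} = 0.75, \qquad 1 - \frac{75}{200} = 0.625$$

that is, 75% and 62.5% fewer MD and MM-PBSA runs than uniform sampling in this trial.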

**Fig. 6.**
The probabilities of correct pose prediction by the proposed methods and by uniform sampling (baseline) at different numbers of MD and MM-PBSA runs for the eight complexes. The BAI algorithms [UGapE auto, UCB-E auto, UCB(p) with p = 4, and SR] reduced the total number of MD and MM-PBSA runs (computational cost) without sacrificing accuracy compared with uniform sampling (green). The probabilities for UGapE auto (blue) and UCB-E auto (purple), whose exploration parameters are adjusted automatically, are higher than those for uniform sampling. Although UCB(p) with p = 4 performed almost as well as UGapE auto and UCB-E auto, its results depend on the exploration parameter, as shown in Supplementary Figure S3. SR (red) performed slightly worse than the other BAI algorithms. (Color version of this figure is available at *Bioinformatics* online.)
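One way to estimate a probability of correct prediction at a given number of runs is Monte Carlo resampling from precomputed $\Delta G_{\mathrm{bind}}$ values (such as the twenty per pose described for Fig. 4). The sketch below does this for the uniform-sampling baseline; the resampling protocol (drawing k values per pose without replacement) is an assumption for illustration and is not necessarily the evaluation procedure used in the paper.

```python
import random
from statistics import mean
from typing import Sequence, Set


def prob_correct_uniform(pools: Sequence[Sequence[float]], correct: Set[int],
                         k: int, trials: int = 1000, seed: int = 0) -> float:
    """Estimate P(uniform sampling with k runs/pose selects a correct pose).

    pools[p] holds precomputed Delta G_bind values for pose p (e.g. 20 per pose);
    each trial emulates k runs per pose by sampling k of them without replacement.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        means = [mean(rng.sample(list(pool), k)) for pool in pools]
        chosen = min(range(len(pools)), key=lambda p: means[p])  # lowest mean Delta G
        if chosen in correct:
            hits += 1
    return hits / trials
```

Plotting this estimate against the total number of runs (k × N for uniform sampling) gives a baseline curve of the kind compared here; a BAI selector could be plugged into the same loop in place of the per-pose averaging.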

Cited by

In-Silico Approaches for the Screening and Discovery of Broad-Spectrum Marine Natural Product Antiviral Agents Against Coronaviruses.
Boswell Z, Verga JU, Mackle J, Guerrero-Vazquez K, Thomas OP, Cray J, Wolf BJ, Choo YM, Croot P, Hamann MT, Hardiman G. Infect Drug Resist. 2023 Apr 19;16:2321-2338. doi: 10.2147/IDR.S395203. PMID: 37155475. Review.
Fragment-centric topographic mapping method guides the understanding of ABCG2-inhibitor interactions.
Wu Y, Gao XY, Chen XH, Zhang SL, Wang WJ, Sheng XH, Chen DZ. RSC Adv. 2019 Mar 8;9(14):7757-7766. doi: 10.1039/c8ra09789e. PMID: 35521159.
DROIDS 3.0-Detecting Genetic and Drug Class Variant Impact on Conserved Protein Binding Dynamics.
Babbitt GA, Fokoue EP, Evans JR, Diller KI, Adams LE. Biophys J. 2020 Feb 4;118(3):541-551. doi: 10.1016/j.bpj.2019.12.008. PMID: 31928763.
From Byte to Bench to Bedside: Molecular Dynamics Simulations and Drug Discovery.
Ahmed M, Maldonado AM, Durrant JD. ArXiv [Preprint]. 2023 Nov 28:arXiv:2311.16946v1. Update in: BMC Biol. 2023 Dec 29;21(1):299. doi: 10.1186/s12915-023-01791-z. PMID: 38076508. Preprint.
In silico Prediction, Characterization, Molecular Docking, and Dynamic Studies on Fungal SDRs as Novel Targets for Searching Potential Fungicides Against Fusarium Wilt in Tomato.
Aamir M, Singh VK, Dubey MK, Meena M, Kashyap SP, Katari SK, Upadhyay RS, Umamaheswari A, Singh S. Front Pharmacol. 2018 Oct 22;9:1038. doi: 10.3389/fphar.2018.01038. PMID: 30405403.
