J Cheminform. 2021 Jun 9;13(1):43.

doi: 10.1186/s13321-021-00522-2.

GNINA 1.0: molecular docking with deep learning

Andrew T McNutt¹, Paul Francoeur¹, Rishal Aggarwal², Tomohide Masuda¹, Rocco Meli³, Matthew Ragoza¹, Jocelyn Sunseri¹, David Ryan Koes⁴

Affiliations

¹ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.
² Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032, India.
³ Department of Biochemistry, University of Oxford, Oxford, United Kingdom.
⁴ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. dkoes@pitt.edu.

PMID: 34108002
PMCID: PMC8191141
DOI: 10.1186/s13321-021-00522-2

Andrew T McNutt et al. J Cheminform. 2021.

Abstract

Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the GNINA docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for GNINA 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose has a root mean square deviation (RMSD) below 2 Å (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. GNINA, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the RMSD to the known binding pose. We provide the 1.0 version of GNINA under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
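As a rough illustration of the Top1 metric described in the abstract (and the TopN generalization used in the figure captions), the following minimal Python sketch computes the percentage of targets that have a pose under the 2 Å RMSD cutoff among the N best-scored poses. The function name `top_n`, the input layout, and the example RMSD values are all hypothetical and are not taken from the paper or from the GNINA code base.

```python
from typing import Mapping, Sequence


def top_n(rmsds_by_target: Mapping[str, Sequence[float]],
          n: int = 1, cutoff: float = 2.0) -> float:
    """Percentage of targets with a pose under `cutoff` (in Å) among the
    n top-ranked poses. Each sequence must be ordered best score first."""
    if not rmsds_by_target:
        raise ValueError("no targets given")
    hits = sum(
        1
        for rmsds in rmsds_by_target.values()
        if any(r < cutoff for r in rmsds[:n])
    )
    return 100.0 * hits / len(rmsds_by_target)


# Hypothetical RMSDs (Å) to the known pose, ordered from best-scored
# to worst-scored pose for each target. Not data from the paper.
example = {
    "target_a": [1.2, 3.5, 0.9],
    "target_b": [4.1, 1.8, 2.6],
    "target_c": [5.0, 6.2, 7.3],
    "target_d": [0.7, 0.8, 4.4],
}

top1 = top_n(example, n=1)  # only the best-scored pose counts
top3 = top_n(example, n=3)  # any of the three best-scored poses counts
print(f"Top1 = {top1:.1f}%, Top3 = {top3:.1f}%")
```

In this toy data set two of the four targets have a top-ranked pose under 2 Å, so Top1 is 50%, and widening the window to three poses recovers one more target.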

Keywords: Deep learning; Molecular docking; Structure-based drug design.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
The GNINA sampling and scoring algorithm, shown with the relevant command-line parameters and the scope of CNN scoring

**Fig. 3**
Docking using the single CNN models and the newly selected Default Ensemble for rescoring the output poses. The binding pocket is defined by the known binding ligand. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 4**
Docking using the ensemble of each type of CNN model, the full ensemble of CNN models, and the newly selected Default Ensemble for rescoring the output poses. The binding pocket is defined by the known binding ligand. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 5**
Evaluation of the average time to dock one protein-ligand system from the PDBbind core set v.2016. Top1 is the percentage of targets whose top-ranked pose has an RMSD below 2 Å

**Fig. 6**
Comparing use of the Default CNN Ensemble only to rescore the poses output by the Monte Carlo chains with its use to refine those poses and then rescore them. The “refine” option has nearly the same docking performance as the “rescore” option when cross-docking. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 7**
Evaluating the role of exhaustiveness in the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 8**
Evaluation of the role of the Number of Monte Carlo Saved in the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 9**
Evaluating the effect of a much greater number of modes on the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 10**
Increasing the exhaustiveness when using the whole protein as the binding box. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 11**
Comparison between rigid and flexible docking with the default GNINA parameters: (a) ligand RMSD differences between rigid and flexible docking versus target-cognate side chain RMSDs, (b) average ligand RMSD difference for different 1 Å intervals of target-cognate side chain RMSD

**Fig. 12**
CNN model ensembles evaluated on the subset of proteins and ligands not present in their training datasets. The ensemble models are used with the default arguments defined above. TopN is the percentage of targets for which a pose with an RMSD below 2 Å is ranked at or above N

**Fig. 13**
Thresholding the top pose by the score determined by the CNN. Top1 is the percentage of targets whose top-ranked pose has an RMSD below 2 Å
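The thresholding idea in the Fig. 13 caption can be sketched as follows: keep only the targets whose top-ranked pose reaches a minimum CNN score, then report Top1 on that subset together with the share of targets retained. This is an illustrative Python sketch only. The function name `thresholded_top1`, the assumption that scores lie between 0 and 1, and the example values are hypothetical and do not reproduce the paper's analysis.

```python
from typing import Sequence, Tuple


def thresholded_top1(top_poses: Sequence[Tuple[float, float]],
                     min_score: float,
                     cutoff: float = 2.0) -> Tuple[float, float]:
    """Return (Top1 %, % of targets kept) after score thresholding.

    Each item is (cnn_score, rmsd_in_angstrom) for one target's
    top-ranked pose. Targets scoring below `min_score` are dropped.
    """
    if not top_poses:
        raise ValueError("no targets given")
    kept = [(score, rmsd) for score, rmsd in top_poses if score >= min_score]
    coverage = 100.0 * len(kept) / len(top_poses)
    if not kept:
        return 0.0, coverage
    good = sum(1 for _, rmsd in kept if rmsd < cutoff)
    return 100.0 * good / len(kept), coverage


# Hypothetical (score, RMSD) pairs, one per target. Scores are assumed
# to lie in [0, 1]; these numbers are not results from the paper.
top_poses = [(0.95, 0.8), (0.90, 1.5), (0.60, 3.2), (0.40, 1.9), (0.20, 6.0)]

for threshold in (0.0, 0.5, 0.8):
    top1, kept_pct = thresholded_top1(top_poses, threshold)
    print(f"score >= {threshold:.1f}: Top1 = {top1:.1f}%, kept = {kept_pct:.1f}%")
```

Raising the threshold trades coverage for accuracy: in this toy data, requiring a score of at least 0.8 keeps two of the five targets, both with poses under 2 Å.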
