J Cheminform. 2021 Jun 9;13(1):43.
doi: 10.1186/s13321-021-00522-2.

GNINA 1.0: molecular docking with deep learning

Andrew T McNutt et al. J Cheminform. 2021.

Abstract

Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose has a root mean square deviation (RMSD) of less than 2 Å (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. Gnina, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the RMSD to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
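The Top1 metric used above (and the TopN variant in the figure captions) can be illustrated with a minimal sketch. This is not code from Gnina or from the paper's evaluation scripts: the function name `top_n_success` and the example RMSD values are hypothetical, and the sketch assumes each target's poses are already sorted from best to worst score.

```python
from typing import Sequence


def top_n_success(
    rmsds_by_target: Sequence[Sequence[float]],
    n: int = 1,
    threshold: float = 2.0,
) -> float:
    """Percentage of targets with a pose ranked at or above n whose RMSD is below threshold.

    Each inner sequence holds the RMSDs (in Å) of one target's poses,
    ordered by rank (index 0 is the top-scored pose).
    """
    if not rmsds_by_target:
        raise ValueError("need at least one target")
    hits = sum(
        1
        for ranked in rmsds_by_target
        if any(rmsd < threshold for rmsd in ranked[:n])
    )
    return 100.0 * hits / len(rmsds_by_target)


# Hypothetical RMSD values for four targets, three ranked poses each.
example = [
    [0.8, 3.1, 5.0],  # top pose is already below 2 Å
    [4.2, 1.5, 6.3],  # only the second-ranked pose is below 2 Å
    [7.0, 6.1, 5.5],  # no pose is below 2 Å
    [1.9, 2.4, 0.7],  # top pose is below 2 Å
]

top1 = top_n_success(example, n=1)
top3 = top_n_success(example, n=3)
print(f"Top1 = {top1:.1f}%, Top3 = {top3:.1f}%")
```

With these made-up values, two of the four targets succeed at rank 1 and three succeed within the top three poses, so TopN can only stay the same or rise as N increases.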

Keywords: Deep learning; Molecular docking; Structure-based drug design.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The Gnina sampling and scoring algorithm shown with relevant command-line parameters and the scope of CNN scoring
Fig. 2
Example Gnina usage
Fig. 3
Docking using the single CNN models and the newly selected Default Ensemble for rescoring the output poses. The binding pocket is defined by the known binding ligand. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 4
Docking using the ensemble of each type of CNN model, the full ensemble of CNN models, and the newly selected Default Ensemble for rescoring the output poses. The binding pocket is defined by the known binding ligand. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 5
Evaluation of the average time to dock one protein-ligand system from the PDBbind core set v.2016. Top1 is the percentage of targets whose top-ranked pose has an RMSD less than 2 Å
Fig. 6
Comparing the Default CNN Ensemble when used only to rescore the poses output by the Monte Carlo chains versus when used to refine the poses and then rescore them. The “refine” option has nearly the same docking performance as the “rescore” option when cross-docking. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 7
Evaluating the role of exhaustiveness in the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 8
Evaluating the role of the Number of Monte Carlo Saved parameter in the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 9
Evaluating the effect of a much greater number of modes on the performance of docking with the Default CNN Ensemble by analyzing TopN. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 10
Increasing the exhaustiveness when using the whole protein as the binding box. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 11
Comparison between rigid and flexible docking with the default Gnina parameters: (a) ligand RMSD differences between rigid and flexible docking versus target-cognate side chain RMSDs, (b) average ligand RMSD difference for different 1 Å intervals of target-cognate side chain RMSD
Fig. 12
CNN model ensembles evaluated on the subset of proteins and ligands not present in their training datasets. Ensemble models are used with the default arguments defined above. TopN is the percentage of targets with a pose ranked at or above N that has an RMSD less than 2 Å
Fig. 13
Thresholding the top pose by the score determined by the CNN. Top1 is the percentage of targets whose top-ranked pose has an RMSD less than 2 Å
