Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 May 24;11(1):34.
doi: 10.1186/s13321-019-0358-3.

Gypsum-DL: an open-source program for preparing small-molecule libraries for structure-based virtual screening

Affiliations

Gypsum-DL: an open-source program for preparing small-molecule libraries for structure-based virtual screening

Patrick J Ropp et al. J Cheminform. .

Abstract

Computational techniques such as structure-based virtual screening require carefully prepared 3D models of potential small-molecule ligands. Though powerful, existing commercial programs for virtual-library preparation have restrictive and/or expensive licenses. Freely available alternatives, though often effective, do not fully account for all possible ionization, tautomeric, and ring-conformational variants. We here present Gypsum-DL, a free, robust open-source program that addresses these challenges. As input, Gypsum-DL accepts virtual compound libraries in SMILES or flat SDF formats. For each molecule in the virtual library, it enumerates appropriate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms. As output, Gypsum-DL produces an SDF file containing each molecular form, with 3D coordinates assigned. To demonstrate its utility, we processed 1558 molecules taken from the NCI Diversity Set VI and 56,608 molecules taken from a Distributed Drug Discovery (D3) combinatorial virtual library. We also used 4463 high-quality protein-ligand complexes from the PDBBind database to show that Gypsum-DL processing can improve virtual-screening pose prediction. Gypsum-DL is available free of charge under the terms of the Apache License, Version 2.0.

Keywords: 3D structure generation; Chemical biology; Computational biology; Computer-aided drug discovery; Small-molecule libraries; Virtual screening.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
The Gypsum workflow. a Gypsum prepares a virtual small-molecule library by desalting the input compounds and considering alternate ionization, tautomeric, chiral, and cis/trans isomeric states. It then converts all variants to 3D, accounting for alternate ring conformations where appropriate. b Illustrative examples of each Gypsum step
Fig. 2
Fig. 2
A schematic of the Gypsum-DL algorithm for generating ring-conformational forms. a Create multiple 3D variants using ETKDG and UFF optimization. b Extract the rings. c Collect the coordinates of the ring atoms. d Construct ring fingerprints by calculating the RMSD between each ring and the corresponding ring of the first model. e, f Use k-means clustering to identify unique ring fingerprints. The small circles on the graphs represent fingerprints, the larger dashed circles represent clusters, and the black circles represent the most central fingerprint of each cluster. g The central fingerprints correspond to geometrically unique models
Fig. 3
Fig. 3
Simplified representations of the reaction schemes used to enumerate the D3 library. The spheres represent the bound polymer. a Alkylation of the polymer-bound benzophenone-imine glycine using the alkyl-halide building blocks. b Alkylation using the Michael-acceptor building blocks. c Deprotecting the benzophenone protecting group. d Acylation using the carboxylic-acid building blocks. e Cleaving the polymer
Fig. 4
Fig. 4
Gypsum-DL benchmarks. a Run times on a single compute node, using multiprocessing mode (1000 input SMILES strings). b Run times on multiple compute nodes, using mpi mode (20,000 input SMILES strings). All benchmarks were performed in triplicate. Errors bars represent standard deviations
Fig. 5
Fig. 5
An illustration of the crystallographic and docked poses of folic acid bound to the human folate receptor beta. The carbon atoms of the protein, the crystallographic ligand, the docked Gypsum-DL compound, and the docked Open-Babel compound are shown in green, yellow, gray, and pink, respectively. a The region of the pocket near R152. b The Gypsum-DL-prepared compound forms additional interactions with other protein residues (in green). Possible hydrogen bonds are shown as dotted lines. c The crystallographic pose, shown for reference. Image generated using BlendMol [33]
Fig. 6
Fig. 6
Example program output. Gypsum-DL outputs a deprotonated oseltamivir carboxylate, b both the ketone and enol forms of butan-2-one, c both the (R) and (S) enantiomers of bromochlorofluoroiodomethane, d both the E and Z isomers of 1-bromo-2-chloro-2-fluoro-1-iodoethene, e the twist-boat conformation of cis-1,4-di-tert-butylcyclohexane, and f only one rotomer of propan-1-ol. Image generated using BlendMol [33]

References

    1. Lionta E, Spyrou G, Vassilatis DK, Cournia Z. Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr Top Med Chem. 2014;14:1923–1938. doi: 10.2174/1568026614666140929124445. - DOI - PMC - PubMed
    1. Tanrikulu Y, Kruger B, Proschak E. The holistic integration of virtual screening in drug discovery. Drug Discov Today. 2013;18:358–364. doi: 10.1016/j.drudis.2013.01.007. - DOI - PubMed
    1. Hawkins PC, Nicholls A. Conformer generation with OMEGA: learning from the data set and the analysis of failures. J Chem Inf Model. 2012;52:2919–2936. doi: 10.1021/ci300314k. - DOI - PubMed
    1. Hawkins PC, Skillman AG, Warren GL, Ellingson BA, Stahl MT. Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. J Chem Inf Model. 2010;50:572–584. doi: 10.1021/ci100031x. - DOI - PMC - PubMed
    1. Miteva MA, Guyon F, Tuffery P. Frog2: efficient 3D conformation ensemble generator for small compounds. Nucleic Acids Res. 2010;38:W622–W627. doi: 10.1093/nar/gkq325. - DOI - PMC - PubMed