J Cheminform. 2020 Oct 27;12(1):64.
doi: 10.1186/s13321-020-00466-z.

Chemoinformatics-based enumeration of chemical libraries: a tutorial


Fernanda I Saldívar-González et al. J Cheminform. 2020;12(1):64.

Abstract

Virtual compound libraries are increasingly used in computer-assisted drug discovery and have led to numerous successful applications. This paper examines the fundamental concepts of library design and describes how to enumerate virtual libraries using open-source tools. To exemplify the enumeration of chemical libraries, we emphasize the use of pre-validated or reported reactions and accessible chemical reagents. This tutorial presents a step-by-step procedure for anyone interested in designing and building chemical libraries, with or without chemoinformatics experience. The aim is to survey various methodologies proposed by synthetic organic chemists and to explore affordable chemical space using open-access chemoinformatics tools. As part of the tutorial, we discuss three design examples: a Diversity-Oriented Synthesis (DOS) library based on lactams, a bis-heterocyclic combinatorial library, and a set of target-oriented molecules (isoindolinone-based compounds as potential acetylcholinesterase inhibitors). This manuscript also seeks to contribute to the critical task of teaching and learning chemoinformatics.

Keywords: Chemical enumeration; Chemoinformatics; Combinatorial libraries; DOS synthesis; Drug design; Education; KNIME; Python.
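For a one-step reaction that gives one product per reagent combination, every reagent at one position can in principle be combined with every reagent at the other positions, so the library size is the product of the building-block counts. The Python sketch below shows only this bookkeeping with the standard library. The building-block identifiers are hypothetical placeholders, and the chemical step of turning each combination into a product structure (for example, by applying a reaction SMARTS with a toolkit such as RDKit) is intentionally left out.

```python
from itertools import product
from math import prod


def enumerate_combinations(reagent_lists):
    """Yield every tuple that takes one reagent ID from each reagent list.

    Only the combinatorial bookkeeping is done here; building the product
    structure for each tuple would require a chemoinformatics toolkit.
    """
    return product(*reagent_lists)


def library_size(reagent_lists):
    """Number of products, assuming one product per reagent combination."""
    return prod(len(reagents) for reagents in reagent_lists)


# Hypothetical building-block IDs (placeholders, not real reagents).
acids = ["acid_1", "acid_2", "acid_3"]
amines = ["amine_1", "amine_2"]

combos = list(enumerate_combinations([acids, amines]))
print(library_size([acids, amines]))  # 6
print(combos[0])                      # ('acid_1', 'amine_1')
```

Adding a third reagent list of, say, 100 members would multiply the size by 100, which is why reagent filtering before enumeration matters.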


Conflict of interest statement

The authors have declared no competing interest.

Figures

Fig. 1
SMILES, SMARTS, InChI, and InChIKey concepts. Examples illustrating basic SMILES, SMARTS, InChI, and InChIKey syntax rules are provided. SMARTS depictions were generated with SMARTSviewer [35]. InChI and InChIKey identifiers are displayed for caffeine and 1-[(E)-2-fluorovinyl]-3-nitrobenzene.
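A standard InChIKey has a fixed layout: a 14-character block hashing the main (connectivity) layer, a hyphen, an 8-character block hashing the remaining layers followed by a standard/non-standard flag and a version character, a hyphen, and a single protonation character. The sketch below is an illustrative format check, not an official validator. It splits a key into these blocks with a regular expression, using caffeine's InChIKey as input; it does not compute keys from structures, which requires the InChI software or a toolkit that wraps it.

```python
import re

# Standard InChIKey layout (27 characters):
#   14-char skeleton hash - 8-char hash of other layers + flag (S/N) + version - protonation
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into named blocks; raise ValueError if the layout is wrong."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,          # hash of the main (connectivity) layer
        "other_layers": other_layers,  # hash of stereo, isotope, and other layers
        "standard": flag == "S",       # S = standard InChI, N = non-standard
        "version": version,            # A = InChI version 1
        "protonation": protonation,    # N = neutral
    }


caffeine = parse_inchikey("RYYVLZVUVIJVGH-UHFFFAOYSA-N")
print(caffeine["skeleton"], caffeine["standard"])  # RYYVLZVUVIJVGH True
```

Because the first block depends only on connectivity, stereoisomers typically share it, which makes it a convenient key for grouping compounds by skeleton.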
Fig. 2
Searching for the reductive amination involved in the synthesis of fentanyl in WebReactions. a Reaction input and fine-tuning. b–d Example results.
Fig. 3
A strategy used to build bis-heterocycles
Fig. 4
Workflow for the design of lactams. a Reading of building-block structures; b Building-block filter: the structures were curated, filtered according to the ‘rule of three’, and selected for the presence of appropriate functional groups; c Coupling phase: application of the amide bond formation reaction between carboxylic acids and primary or secondary amines; d Pairing phase: application of the reactions described in Table 8. Finally, the compounds were separated into macrocycles and non-macrocycles.
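Step b of this workflow combines a descriptor filter with a functional-group check. The sketch below assumes that descriptors and functional-group tags have already been computed (in practice with a chemoinformatics toolkit, for example by SMARTS matching) and applies commonly quoted rule-of-three limits: MW ≤ 300, cLogP ≤ 3, HBD ≤ 3, and HBA ≤ 3. The building blocks, their descriptor values, and the tag names are made up for illustration. Some implementations also cap rotatable bonds and polar surface area, or use strict inequalities.

```python
from dataclasses import dataclass, field


@dataclass
class BuildingBlock:
    name: str
    mw: float
    clogp: float
    hbd: int
    hba: int
    groups: set = field(default_factory=set)  # precomputed functional-group tags (hypothetical)


def passes_ro3(bb, max_mw=300.0, max_clogp=3.0, max_hbd=3, max_hba=3):
    """Fragment-like 'rule of three' check on precomputed descriptors."""
    return (bb.mw <= max_mw and bb.clogp <= max_clogp
            and bb.hbd <= max_hbd and bb.hba <= max_hba)


def select(blocks, accepted_groups):
    """Keep rule-of-three-compliant blocks carrying at least one accepted group."""
    return [bb.name for bb in blocks if passes_ro3(bb) and bb.groups & accepted_groups]


# Hypothetical building blocks with made-up descriptor values (illustration only).
blocks = [
    BuildingBlock("bb_1", 165.2, 1.4, 1, 2, {"carboxylic_acid"}),
    BuildingBlock("bb_2", 342.4, 2.1, 1, 3, {"carboxylic_acid"}),  # fails MW
    BuildingBlock("bb_3", 121.2, 1.8, 1, 1, {"primary_amine"}),
    BuildingBlock("bb_4", 150.6, 3.9, 1, 1, {"secondary_amine"}),  # fails cLogP
]

acids = select(blocks, {"carboxylic_acid"})
amines = select(blocks, {"primary_amine", "secondary_amine"})
print(acids, amines)  # ['bb_1'] ['bb_3']
```

The surviving acid and amine lists would then feed the coupling phase (step c), where each acid/amine pair is a candidate amide.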
Fig. 5
a Reaction input tab in Enumeration of Combinatorial Library; b Reactants input tab in Enumeration of Combinatorial Library; c View of the library generated
Fig. 6
Post-processing plots. a PCA plot generated from six structural and physicochemical descriptors (MW, HBA, HBD, SlogP, TPSA, and RBs). b PMI plot: compounds are placed in a triangle whose vertices represent rod-, disc-, and sphere-shaped compounds. c Consensus Diversity Plot (CDP) for (1) approved drugs, (2) DOS, (3) bis-heterocycles, and (4) isoindolinones. Scaffold diversity is measured on the vertical axis as the area under the curve (AUC), and fingerprint-based diversity is measured on the horizontal axis using MACCS keys and the Tanimoto coefficient. Diversity based on physicochemical properties is represented by the Euclidean distance computed from the six physicochemical properties and shown on a continuous color scale. The relative size of each data set is represented by the size of its data point. d ADME/Tox profile of the three databases calculated with the free FAF-Drugs server. *Based on Lipinski’s Rule of Five
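Two of the quantities behind panels b and c can be sketched in a few lines of Python. Here, fingerprints are represented as sets of "on" bit indices (real MACCS keys would be generated by a toolkit), and the principal moments of inertia are assumed to have been computed already from a 3D conformer. The mean pairwise Tanimoto similarity is one common summary of fingerprint-based diversity. The caption does not say which statistic this CDP uses, so treat that choice as an assumption. The fingerprints below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    return len(a & b) / union


def mean_pairwise_similarity(fps):
    """Mean Tanimoto similarity over all unique pairs (lower means more diverse)."""
    return mean(tanimoto(x, y) for x, y in combinations(fps, 2))


def npr(i1, i2, i3):
    """Normalized PMI ratios (I1/I3, I2/I3), with moments sorted so I1 <= I2 <= I3.

    In the PMI triangle a rod lies near (0, 1), a disc near (0.5, 0.5), and a
    sphere near (1, 1). Assumes the largest moment is non-zero.
    """
    i1, i2, i3 = sorted((i1, i2, i3))
    return i1 / i3, i2 / i3


# Hypothetical fingerprints (sets of on-bits), not real MACCS keys of any molecule.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7}]
print(tanimoto(fps[0], fps[1]))                 # 0.6
print(round(mean_pairwise_similarity(fps), 3))  # 0.2
print(npr(2.0, 1.0, 1.0))                       # (0.5, 0.5)
```

Sorting the moments inside `npr` keeps the ratios valid regardless of the order in which a toolkit reports them.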

References

    1. Yan XC, Sanders JM, Gao Y-D, Tudor M, Haidle AM, Klein DJ, et al. Augmenting hit identification by virtual screening techniques in small molecule drug discovery. J Chem Inf Model. 2020. doi: 10.1021/acs.jcim.0c00113.
    2. Walters WP, Patrick WW. Virtual chemical libraries. J Med Chem. 2019. doi: 10.1021/acs.jmedchem.8b01048.
    3. Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model. 2012;52:2864–2875. doi: 10.1021/ci300415d.
    4. Humbeck L, Weigang S, Schäfer T, Mutzel P, Koch O. CHIPMUNK: A virtual synthesizable small-molecule library for medicinal chemistry, exploitable for protein-protein interaction modulators. ChemMedChem. 2018;13:532–539. doi: 10.1002/cmdc.201700689.
    5. Lessel U, Wellenzohn B, Lilienthal M, Claussen H. Searching fragment spaces with feature trees. J Chem Inf Model. 2009;49:270–279. doi: 10.1021/ci800272a.