J Am Chem Soc. 2013 May 15;135(19):7296-303.
doi: 10.1021/ja401184g. Epub 2013 May 2.

Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds



Aaron M Virshup et al. J Am Chem Soc.

Abstract

The "small molecule universe" (SMU), the set of all synthetically feasible organic molecules of 500 Da molecular weight or less, is estimated to contain over 10^60 structures, making exhaustive searches for structures of interest impractical. Here, we describe the construction of a "representative universal library" spanning the SMU that samples the full extent of feasible small molecule chemistries. This library was generated using the newly developed Algorithm for Chemical Space Exploration with Stochastic Search (ACSESS). ACSESS makes two important contributions to chemical space exploration: it allows the systematic search of the unexplored regions of the small molecule universe, and it facilitates the mining of chemical libraries that do not yet exist, providing a near-infinite source of diverse novel compounds.


Figures

Figure 1. The ACSESS procedure allows the construction of a representative universal library in an arbitrary chemical space. A, a library of initial molecules is expanded using chemical mutations and crossover; compounds outside the target chemical space are discarded; and a maximally diverse subset of the remaining molecules is selected. This process is repeated until the diversity of the set converges. B, chemical structure modifications, which include addition or deletion of terminal atoms (1), bond order modifications (2), addition or deletion of in-chain atoms (3), removal or addition of cyclic bonds (4), and modifications of atom type (5). C, an example of a chemical space trajectory. The final compound occupies unexplored chemical space in the SMU.
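The expand / filter / select loop in panel A can be illustrated with a small stdlib-only sketch. It is not the authors' implementation: a "molecule" here is a hypothetical linear chain of atom symbols, only operators (1) and (5) from panel B are modeled, a size cap plus a "no O-O bond" rule stand in for the real target-space filter, and greedy MaxMin selection over a position-wise mismatch distance is used as one common way to pick a maximally diverse subset. All names (`acsess_sketch`, `maxmin_select`, `MAX_ATOMS`, and so on) are illustrative, and the seeds must already lie inside the toy target space.

```python
import random

# Toy assumptions (not from the paper): a "molecule" is a linear chain of
# atom symbols, and a size cap stands in for the 500 Da limit.
ELEMENTS = ("C", "N", "O", "S")
MAX_ATOMS = 8


def mutate(mol, rng):
    """Apply one random toy mutation: add/delete a terminal atom or retype an atom."""
    atoms = list(mol)
    op = rng.choice(("add", "delete", "retype"))
    if op == "add":  # operator (1): add a terminal atom at either end
        atom = rng.choice(ELEMENTS)
        if rng.random() < 0.5:
            atoms.insert(0, atom)
        else:
            atoms.append(atom)
    elif op == "delete" and len(atoms) > 1:  # operator (1): delete a terminal atom
        atoms.pop(0 if rng.random() < 0.5 else -1)
    else:  # operator (5): retype an atom (also used when a lone atom cannot be deleted)
        atoms[rng.randrange(len(atoms))] = rng.choice(ELEMENTS)
    return tuple(atoms)


def crossover(a, b, rng):
    """Join a non-empty prefix of parent a to a non-empty suffix of parent b."""
    return a[: rng.randint(1, len(a))] + b[rng.randrange(len(b)):]


def in_target_space(mol):
    """Toy stand-in for the target-space filter: size cap and no O-O bonds."""
    if not 1 <= len(mol) <= MAX_ATOMS:
        return False
    return not any(x == "O" and y == "O" for x, y in zip(mol, mol[1:]))


def distance(a, b):
    """Position-wise mismatches after padding the shorter chain with '-'."""
    n = max(len(a), len(b))
    pa, pb = a + ("-",) * (n - len(a)), b + ("-",) * (n - len(b))
    return sum(x != y for x, y in zip(pa, pb))


def maxmin_select(pool, k):
    """Greedy MaxMin: repeatedly add the candidate farthest from the chosen set."""
    pool = sorted(set(pool))
    if len(pool) <= k:
        return pool
    selected = [pool[0]]
    min_d = [distance(m, pool[0]) for m in pool]
    while len(selected) < k:
        i = max(range(len(pool)), key=min_d.__getitem__)
        selected.append(pool[i])
        min_d = [min(d, distance(m, pool[i])) for d, m in zip(min_d, pool)]
    return selected


def diversity(lib):
    """Mean nearest-neighbour distance within the library."""
    if len(lib) < 2:
        return 0.0
    return sum(
        min(distance(a, b) for j, b in enumerate(lib) if j != i)
        for i, a in enumerate(lib)
    ) / len(lib)


def acsess_sketch(seeds, size=20, max_iters=50, tol=1e-3, seed=0):
    """Expand, filter, and diversity-select until library diversity stops changing."""
    rng = random.Random(seed)
    library = [m for m in seeds if in_target_space(m)]
    prev = diversity(library)
    for _ in range(max_iters):
        # Expand: mutations plus crossover between random library members.
        children = [mutate(rng.choice(library), rng) for _ in range(size)]
        children += [crossover(rng.choice(library), rng.choice(library), rng)
                     for _ in range(size // 2)]
        # Filter: discard children outside the target space.
        pool = set(library) | {c for c in children if in_target_space(c)}
        # Select: keep a maximally diverse subset.
        library = maxmin_select(pool, size)
        cur = diversity(library)
        if abs(cur - prev) < tol:  # diversity has converged
            break
        prev = cur
    return library
```

For example, `acsess_sketch([("C",), ("C", "O")], size=10)` grows two seed chains into a ten-member toy library. Because the parents are kept in the pool, the selected set can only be replaced by children that are more spread out, which mirrors the convergence criterion in the caption.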
Figure 2. Comparison to existing libraries
The SMU-RUL (black), ZINC natural product library (green), ZINC drug library (orange), and druglike compounds in PubChem (purple) are shown. A, compound locations along the first two principal components of the SMU-RUL library. B–I, histograms of physicochemical properties for the four libraries; y-axes correspond to normalized compound counts within each library. The properties include B, estimated LogP (XLogP); C, molecular weight (MW); D, topologically estimated polar surface area (TPSA); E, synthetic accessibility score (SAScore); F and G, number of hydrogen-bond donors and acceptors; H, ratio of non-carbon heavy atoms to carbon atoms; and I, number of rotatable bonds. Compared to the PubChem database, molecules in the SMU-RUL are, on average, more polar and have a larger molecular weight. Lower synthetic accessibility scores (E) for SMU-RUL compounds are expected because of their novelty and dissimilarity to known compounds.
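Two of the quantities in this caption are simple enough to sketch directly: the panel H descriptor (non-carbon heavy atoms per carbon) and the per-library normalization of the histogram y-axes. The sketch below assumes a molecule is given as a plain list of element symbols and that normalization means dividing each bin count by the library size; the function names, the half-open bin convention, and the `inf` value for carbon-free molecules are choices made here, not details taken from the paper.

```python
from collections import Counter


def heteroatom_ratio(atoms):
    """Non-carbon heavy atoms per carbon atom (Figure 2H); hydrogens are ignored."""
    heavy = [a for a in atoms if a != "H"]
    carbons = Counter(heavy)["C"]
    if carbons == 0:
        return float("inf")  # convention chosen here for carbon-free molecules
    return (len(heavy) - carbons) / carbons


def normalized_histogram(values, edges):
    """Fraction of a library's compounds in each half-open bin [edges[i], edges[i+1]).

    Counts are divided by the library size, so libraries of very different
    sizes can share one y-axis; out-of-range values count toward the total
    but fall in no bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * len(counts)
```

Normalizing within each library is what makes the comparison meaningful here: PubChem contains far more compounds than the ZINC subsets, so raw counts would hide the shape of the smaller distributions.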
Figure 3. Map of the small molecule universe
A 300×300 toroidal self-organizing map (SOM) was created using normalized autocorrelation descriptors of SMU-RUL compounds. For clarity, the map is divided into 36 labeled sections (AI, BII, etc.), each containing a 50×50 grid of neurons. A, number of PubChem compounds assigned to each neuron; white indicates neurons that are not occupied by any PubChem compound (84% of the total). The PubChem compounds are highly clustered in a relatively small region of chemical space: 98% are assigned to only 2% of the neurons. The black circle in region EI encompasses the positions of all GDB13 compounds. B–D, molecular properties; each neuron is colored by the median value of its SMU-RUL compounds.
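A toroidal SOM differs from a flat one only in how grid distance is measured: the map wraps at its edges, so no neuron sits on a border. The sketch below shows online SOM training with a Gaussian neighbourhood on such a torus, plus the two occupancy figures quoted in panel A (share of empty neurons, share of compounds in the most populated 2% of neurons). The learning-rate and radius schedules, random initialization, and all function names are assumptions for illustration; the caption does not state the authors' training settings, and input vectors are assumed to be normalized to [0, 1].

```python
import math
import random
from collections import Counter


def toroidal_dist2(a, b, rows, cols):
    """Squared grid distance between neurons a and b with wrap-around edges."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr, dc = min(dr, rows - dr), min(dc, cols - dc)
    return dr * dr + dc * dc


def best_matching_unit(weights, x):
    """(row, col) of the neuron whose weight vector is nearest to x."""
    return min(weights, key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighbourhood on a torus."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for rc, w in weights.items():
                h = math.exp(-toroidal_dist2(rc, bmu, rows, cols) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            t += 1
    return weights


def occupancy_stats(assigned, rows, cols, top_frac=0.02):
    """Return (fraction of empty neurons, share of compounds in the most
    populated top_frac of neurons) for a list of assigned neurons."""
    counts = Counter(assigned)
    n_neurons = rows * cols
    empty = 1 - len(counts) / n_neurons
    k = max(1, round(top_frac * n_neurons))
    return empty, sum(c for _, c in counts.most_common(k)) / len(assigned)
```

In use, the map would be trained on SMU-RUL descriptors, each PubChem compound mapped with `best_matching_unit`, and the resulting list of neurons passed to `occupancy_stats`; those two numbers correspond to the kind of statistics reported in panel A, although this toy code is not expected to reproduce them.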
Chart 1. SMU-RUL compounds from unexplored chemical space
Each compound shown here was selected from a SOM neuron unoccupied by any PubChem compound and was among the most synthetically accessible compounds assigned to that neuron. Letters/numerals refer to the regions shown in Figure 3. The stereochemical assignments shown reflect the generated 3D conformations, which are shown as ball-and-stick models in the SI.
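The selection rule for Chart 1 can be sketched as a filter over neuron assignments: skip every neuron that holds a PubChem compound, then keep the best-scoring SMU-RUL compound in each remaining neuron. This is a simplification (the chart picks from among the most accessible compounds rather than strictly the top one), and the inputs are hypothetical mappings from compound IDs to neurons and scores. Because the caption does not say whether a higher or lower score means "more accessible", the direction is passed in explicitly.

```python
def pick_unexplored(smu_assign, pubchem_neurons, sa_score, *, higher_is_easier):
    """For each neuron holding SMU-RUL compounds but no PubChem compound,
    return the ID of its most synthetically accessible compound.

    smu_assign: {compound_id: neuron}; pubchem_neurons: iterable of neurons
    occupied by PubChem; sa_score: {compound_id: score}. Ties keep the first
    compound encountered.
    """
    occupied = set(pubchem_neurons)
    best = {}  # neuron -> (oriented score, compound_id)
    for cid, neuron in smu_assign.items():
        if neuron in occupied:
            continue
        s = sa_score[cid] if higher_is_easier else -sa_score[cid]
        if neuron not in best or s > best[neuron][0]:
            best[neuron] = (s, cid)
    return {neuron: cid for neuron, (_, cid) in best.items()}
```

Orienting the score once, instead of writing two comparison branches, keeps the "unoccupied neuron" filter and the "best compound" rule independent of the scoring convention.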
