J Chem Inf Model. 2008 Jul;48(7):1311-24.
doi: 10.1021/ci700342h. Epub 2008 Jul 8.

Scaffold topologies. 2. Analysis of chemical databases

Michael J Wester et al. J Chem Inf Model. 2008 Jul.

Abstract

We have systematically enumerated graph representations of scaffold topologies for molecules with up to eight rings and atoms of valence at most four, thus covering the lower portion of the chemical space of small molecules (Pollock et al. J. Chem. Inf. Model., this issue). Here, we examine scaffold topology distributions for several databases: ChemNavigator and PubChem, as sources of commercially available chemicals; the Dictionary of Natural Products; a set of 2742 launched drugs; WOMBAT, a database of medicinal chemistry compounds; and two subsets of PubChem, the "actives" and DSSTox, which comprises toxic substances. We also examine GDB (Fink et al. Angew. Chem., Int. Ed. 2005, 44, 1504-1508), a virtual database of exhaustively enumerated small organic molecules, and we contrast the scaffold topology distributions of these collections with the complete coverage of molecules with up to eight rings. Scaffolds with six or more rings are poorly represented, perhaps for reasons related to synthetic accessibility and complexity. Among all collections examined, PubChem has the greatest scaffold topological diversity, whereas GDB is the most limited. More than 50% of all entries (13 000 000+ actual and 13 000 000+ virtual compounds) are covered by only eight distinct topologies, one of which is the nonscaffold topology that represents all treelike structures. However, most topologies are represented by only one or a very small number of examples. Within topologies, three-way scaffold connections (3-nodes) are much more frequent than four-way connections (4-nodes). Fused rings are slightly more frequent in biologically oriented databases. Scaffold topologies can serve as the first step toward an efficient coarse-grained classification scheme for the molecules found in chemical databases.
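The "nonscaffold topology that represents all treelike structures" can be made concrete with a small graph-pruning sketch. It assumes a molecule is given as a hydrogen-suppressed, undirected graph (atom index mapped to its neighbours) and that side chains are removed by repeatedly deleting atoms of degree at most one; this is consistent with the scaffold shown in Figure 1b, but it is an illustration, not the authors' code, and `prune_to_scaffold` is a hypothetical name.

```python
def prune_to_scaffold(adjacency):
    """Strip side chains from a hydrogen-suppressed molecular graph.

    `adjacency` maps atom -> iterable of neighbour atoms (undirected, symmetric).
    Atoms of degree <= 1 are removed repeatedly; the returned dict holds what
    survives (rings plus the linkers between them). An empty result means the
    molecule is treelike, i.e. it falls under the nonscaffold topology.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:  # already removed via an earlier queue entry
            continue
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:  # neighbour has become a new leaf
                leaves.append(nbr)
    return graph

# Methylcyclopropane: the methyl carbon (3) is pruned, the ring survives.
methylcyclopropane = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(prune_to_scaffold(methylcyclopropane))  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

# Propane is treelike: every atom is pruned.
print(prune_to_scaffold({0: [1], 1: [0, 2], 2: [1]}))  # {}
```

Because pruning works on graph connectivity alone, exocyclic double-bonded atoms such as a carbonyl oxygen are removed as well, which matches the Figure 1 example.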

Figures

Figure 1
a. (5-methyl-2-propan-2-yl-phenyl) 3,3-dimethyl-2-methylidene-bicyclo[2.2.1]heptane-1-carboxylate [SMILES: CC(C)c1ccc(C)cc1OC(=O)C2(CCC3C2)C(=C)C3(C)C]. b. The scaffold corresponding to this molecule [C1CC2CCC1(C2)COc3ccccc3]. c. The topology corresponding to this scaffold (nodes are numbered as shown). d. A minimal representative of this topology [C1CC1C23CC2C3].
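The scaffold-to-topology step of Figure 1 (b to c) can be sketched by collapsing every chain of degree-2 atoms into a single edge, so that only branch atoms (3-nodes and 4-nodes) remain. This is a minimal sketch assuming the scaffold is a simple graph in which every atom has degree at least two; `contract_to_topology` and `node_degrees` are hypothetical names, the lone-ring convention is an assumption, and the code does not produce canonical labels, which comparing topologies across databases would additionally require.

```python
from collections import Counter

def contract_to_topology(scaffold):
    """Collapse chains of degree-2 atoms in a scaffold graph into single edges.

    `scaffold` maps atom -> set of neighbour atoms (simple graph, degrees >= 2).
    Returns (nodes, edges): nodes are the branch atoms (degree >= 3) and edges
    holds one (u, v) pair per chain; u == v marks a ring hanging off one node.
    """
    if not scaffold:
        return [], []  # treelike molecule: nothing survived pruning
    nodes = sorted(a for a, nbrs in scaffold.items() if len(nbrs) >= 3)
    if not nodes:
        # A lone ring has no branch atoms; by convention here, keep one
        # atom as the only node and give it a self-loop.
        anchor = min(scaffold)
        return [anchor], [(anchor, anchor)]
    node_set = set(nodes)
    walked = set()  # atom-level bonds already traversed, as frozensets
    edges = []
    for start in nodes:
        for first in sorted(scaffold[start]):
            if frozenset((start, first)) in walked:
                continue  # this chain was already walked from its other end
            walked.add(frozenset((start, first)))
            prev, cur = start, first
            # Step through degree-2 atoms until the chain reaches a branch atom.
            while cur not in node_set:
                (nxt,) = scaffold[cur] - {prev}
                walked.add(frozenset((cur, nxt)))
                prev, cur = cur, nxt
            edges.append((start, cur))
    return nodes, edges

def node_degrees(edges):
    """Degree of each topology node; a self-loop contributes 2."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

# Scaffold of Figure 1b, C1CC2CCC1(C2)COc3ccccc3, atoms numbered in SMILES order.
FIG1B_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 2),
               (5, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 13),
               (13, 14), (14, 9)]
scaffold = {}
for a, b in FIG1B_BONDS:
    scaffold.setdefault(a, set()).add(b)
    scaffold.setdefault(b, set()).add(a)

nodes, edges = contract_to_topology(scaffold)
degree = node_degrees(edges)
print(nodes)          # [2, 5, 9]
print(dict(degree))   # {2: 3, 5: 4, 9: 3}
```

For this scaffold the sketch finds the two bicycle bridgeheads (atoms 2 and 5) joined by three parallel edges, an edge from atom 5 through the C-O linker to the benzene carbon 9, and a self-loop on atom 9: two 3-nodes and one 4-node, the same pattern as the minimal representative in Figure 1d.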
Figure 2
Figure 2 a. All 1–3-ring scaffold topologies and all 4-ring topologies possessing only 3-nodes or only 4-nodes. See Table 2 for further identification. Figure 2 b. Examples from the databases examined of molecules that exhibit each 1–3-ring topology and each 4-ring topology possessing only 3-nodes or only 4-nodes, corresponding to the topologies in Figure 2a. Note that none of the databases examined contains an example of topology number 17. See Table 2 for further identification.
Figure 3
Population percentages in the indicated databases, relative to each database's total population, as a function of the number of rings per scaffold.
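A distribution like the one in Figure 3 can be tallied once each molecule's ring count is known. The sketch below assumes "rings" means independent rings, i.e. the cyclomatic number r = E - V + C of the hydrogen-suppressed graph (E bonds, V atoms, C connected components); `ring_count` and `ring_distribution` are hypothetical helpers, and the three small molecules are only illustrations.

```python
from collections import Counter

def ring_count(adjacency):
    """Independent rings (cyclomatic number) r = E - V + C of an undirected
    simple graph given as atom -> iterable of neighbour atoms."""
    n_atoms = len(adjacency)
    n_bonds = sum(len(set(nbrs)) for nbrs in adjacency.values()) // 2
    # Count connected components with an iterative depth-first search.
    seen, components = set(), 0
    for root in adjacency:
        if root in seen:
            continue
        components += 1
        seen.add(root)
        stack = [root]
        while stack:
            atom = stack.pop()
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return n_bonds - n_atoms + components

def ring_distribution(molecules):
    """Percentage of molecules having each ring count, keyed by ring count."""
    counts = Counter(ring_count(m) for m in molecules)
    total = sum(counts.values())
    return {r: 100.0 * n / total for r, n in sorted(counts.items())}

propane = {0: [1], 1: [0, 2], 2: [1]}
cyclopropane = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
bicyclobutane = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(ring_distribution([propane, cyclopropane, cyclopropane, bicyclobutane]))
# {0: 25.0, 1: 50.0, 2: 25.0}
```

The cyclomatic number counts fused and bridged systems correctly (bicyclobutane gives 2) without having to enumerate individual rings.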
Figure 4
Populations of scaffolds in the ChemNavigator database as a function of the number of 3- and 4-nodes, N3 and N4, ordered by the number of independent rings r, with the stems for each value of r connected and drawn in the same color. Five outliers (scaffolds with N3 > 50) have been excluded to make the main population trends easier to see.
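The three quantities plotted in Figure 4 are tied together by two standard graph identities. Assuming a connected scaffold topology with at least one branch node, all nodes of degree 3 or 4 (a self-loop counting twice toward its node's degree), V = N3 + N4 nodes and E edges, the handshake lemma and the cyclomatic number give:

```latex
\begin{aligned}
2E &= 3N_3 + 4N_4 && \text{(handshake lemma)}\\
r  &= E - V + 1 = E - (N_3 + N_4) + 1 && \text{(cyclomatic number)}\\
   &= \frac{3N_3 + 4N_4}{2} - N_3 - N_4 + 1 = \frac{N_3}{2} + N_4 + 1
\end{aligned}
```

Under these assumptions N3 is always even, and at fixed r each additional 4-node replaces two 3-nodes. For the scaffold of Figure 1b (two 3-nodes, one 4-node) this gives r = 1 + 1 + 1 = 3, matching its bicyclic system plus the benzene ring. This is a derivation from the stated assumptions rather than a result quoted from the paper, and it does not apply to the treelike nonscaffold topology (r = 0).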
Figure 5
The percentage frequencies of the first 33 scaffold topologies of Figure 2 in the indicated databases. The entry labeled zero indicates the database percentages of structures that do not contain rings. The dashed lines in the top graph divide the results into sets of topologies possessing 0, 1, 2 or 3 rings, respectively. The bottom graph displays the frequencies for 4-ring topologies containing only 3-nodes. Note that the vertical scales in the two graphs are different.
Figure 6
Figure 6 a. The most frequent topologies present in the databases examined, numbered (in boldface) by their rank in the merged database. The second value for each entry is the topology number; topologies 1–33 and 86–89 are shown in Figure 2a. Figure 6 b. Examples from the databases examined of the most frequent topologies, numbered by their rank in the merged database (compare with Figure 6a).
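Ranking topologies in a merged database, and checking how few top-ranked topologies cover more than half of all entries, is simple frequency bookkeeping once every entry carries a topology identifier. The sketch assumes per-database `Counter`s keyed by hypothetical integer topology IDs (assigning those IDs needs canonical graph comparison, not shown); the function names are hypothetical and the toy counts are invented for illustration, not data from the paper.

```python
from collections import Counter

def merge_counts(per_database):
    """Sum topology-frequency Counters from several databases."""
    merged = Counter()
    for counts in per_database.values():
        merged.update(counts)
    return merged

def rank_topologies(merged):
    """Rank topologies by merged frequency (1 = most frequent); ties by ID."""
    ordered = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return {topology: rank for rank, (topology, _) in enumerate(ordered, start=1)}

def topologies_covering(merged, fraction=0.5):
    """Smallest number of top-ranked topologies whose entries exceed
    `fraction` of all entries."""
    total = sum(merged.values())
    covered = 0
    for k, (_, n) in enumerate(merged.most_common(), start=1):
        covered += n
        if covered > fraction * total:
            return k
    return len(merged)

# Toy counts (invented): topology ID -> number of entries.
merged = merge_counts({"dbA": Counter({0: 50, 1: 30, 2: 5}),
                       "dbB": Counter({1: 40, 3: 10})})
print(rank_topologies(merged))      # {1: 1, 0: 2, 3: 3, 2: 4}
print(topologies_covering(merged))  # 1
```

Merging raw counts rather than averaging per-database percentages means large collections dominate the merged ranking; which weighting the authors used is not stated here.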
Figure 7
The average number of atoms comprising the scaffolds in the indicated databases that are members of the given ranked topologies (see Figure 6a). Minimum refers to the number of nodes needed to produce a minimal representative of the topology (see Figure 1d). The values for the merged database are the total bar heights.

References

    1. Pollock S, Coutsias EA, Wester MJ, Oprea TI. Scaffold Topologies I: Exhaustive Enumeration up to 8 Rings. J. Chem. Inf. Model., submitted (accompanying this paper).
    2. Fink T, Bruggesser H, Reymond J-L. Virtual Exploration of the Small-Molecule Chemical Universe below 160 Daltons. Angew. Chem. Int. Ed. 2005;44:1504–1508.
    3. de Laet A, Hehenkamp JJJ, Wife RL. Finding Drug Candidates in Virtual and Lost/Emerging Chemistry. J. Heterocyclic Chem. 2000;37:669–674.
    4. Hehenkamp JJJ, de Laet RC, Parlevliet FJ, Verheij HJ, Wife RL. Navigating the real and virtual chemical worlds. In: Collier H, editor. Proceedings of the 2000 Chemical Information Conference. Annecy, France: Infonortics; 2000.
    5. Oprea TI, Gottfries J. Chemography: The Art of Chemical Space Navigation. J. Comb. Chem. 2001;3:157–166.