Sci Adv. 2021 Sep 24;7(39):eabj2465.
doi: 10.1126/sciadv.abj2465. Epub 2021 Sep 24.

Exploring and mapping chemical space with molecular assembly trees


Yu Liu et al. Sci Adv.

Abstract

Rule-based searches of chemical space can generate an almost infinite number of molecules, but exploring known molecules as a function of the minimum number of steps needed to build them up promises to uncover new motifs and transformations. Assembly theory is an approach that compares the intrinsic complexity and properties of molecules by the minimum number of steps needed to build up their molecular graphs. Here, we apply this approach to prebiotic chemistry, gene sequences, plasticizers, and opiates. This allows us to explore the molecules connected to an assembly tree rather than the entire space of possible molecules. Finally, by developing a reassembly method based on assembly trees, we found that, in the case of the opiates, a new set of drug candidates could be generated that would not be accessible via conventional fragment-based drug design, demonstrating how this approach might find application in drug discovery.


Figures

Fig. 1. Representations of an assembly pathway, taking adenine as an example.
(A) One of the many assembly pathways of adenine (according to our MC algorithm, explained later, it is the shortest one). The assembly pool (shown inside the dashed boxes) evolves with each assembly step. The colors denote which two assembly building blocks are used to make the new one (the color scheme is independent for each step). (B) The key-step representation of the assembly pathway. (C) The joint process for each key assembly step, which is used to work out the multiset representation. (D) The multiset representation of this assembly pathway. Strictly speaking, it should be written as {[1]¹, [2]¹}, where the superscript "1" is the multiplicity of the assembly building block, that is, after canceling out, the block appears once on the left-hand side of (C); for simplicity, we write the multiplicity explicitly only when it is not 1.
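The pool update described in panel (A) can be illustrated with a minimal Python sketch. It assumes that strings stand in for molecular graphs, with a join modeled as concatenation (the setting used for gene sequences in Fig. 4) rather than a bond-level graph join; the function name `replay_pathway` is hypothetical. The sketch replays a given pathway, checks that every step joins two objects already in the pool, and records how the pool evolves.

```python
from typing import Iterable, List, Set, Tuple


def replay_pathway(basic_units: Iterable[str],
                   steps: List[Tuple[str, str]]) -> List[Set[str]]:
    """Replay a string assembly pathway and return the pool after each step.

    Hypothetical toy model: objects are strings and a join is concatenation.
    Any object already in the pool may be reused, including joined with itself.
    """
    pool = set(basic_units)
    history = [set(pool)]  # history[0] is the initial pool of basic building blocks
    for left, right in steps:
        if left not in pool or right not in pool:
            raise ValueError(f"step ({left!r}, {right!r}) uses an object not yet in the pool")
        pool.add(left + right)  # the new object enters the pool and can be reused later
        history.append(set(pool))
    return history


if __name__ == "__main__":
    # Build "ATAT" from the basic units A and T: A + T -> AT, then AT + AT -> ATAT.
    pools = replay_pathway({"A", "T"}, [("A", "T"), ("AT", "AT")])
    for i, p in enumerate(pools):
        print(i, sorted(p))
```

The pathway length is `len(pools) - 1`; reusing `AT` in the second step is what keeps the route to `ATAT` at two joins rather than three.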
Fig. 2. Two example molecular assembly trees.
(A) The assembly space of adenine and thymine. The blue dashed arrows indicate the shortest assembly pathway for adenine alone, and the red dashed arrows the shortest assembly pathway for thymine alone. The black dashed arrows indicate the shortest assembly pathway that makes adenine and thymine together. (B) A molecular assembly tree for A, G, T, U, and C, which can also be written as {[2, 10, 11, 12, 13]} and whose index is calculated to be 16. In both (A) and (B), the colors only help the reader recognize the building blocks, and the color schemes are independent; arrows starting from the basic building blocks are omitted for better visualization.
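Panel (A) rests on the fact that intermediates shared between targets need to be built only once. A brute-force Python sketch makes this concrete under the same string stand-in assumption (characters as basic building blocks, concatenation as the join). The name `joint_assembly_index` is hypothetical, and exhaustive breadth-first search is used here only because it is exact and short; it is not the paper's MC algorithm and is practical only for very short strings.

```python
from collections import deque
from typing import FrozenSet, Iterable, Set


def joint_assembly_index(targets: Iterable[str]) -> int:
    """Fewest joins after which every target string is in the assembly pool.

    Toy model: the distinct characters are the basic building blocks and a join
    concatenates two pooled strings. Breadth-first search over pools is exact
    but exponential, so keep the targets short.
    """
    goal: Set[str] = set(targets)
    start: FrozenSet[str] = frozenset(ch for t in goal for ch in t)
    if goal <= start:
        return 0
    # In a shortest pathway every object built is either a target or used later,
    # so it must be a substring of some target; other joins can be pruned safely.
    useful = {t[i:j] for t in goal for i in range(len(t)) for j in range(i + 1, len(t) + 1)}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        pool, steps = queue.popleft()
        for a in pool:
            for b in pool:
                new = a + b
                if new in pool or new not in useful:
                    continue
                nxt = pool | {new}  # frozenset | set -> frozenset (hashable state)
                if goal <= nxt:
                    return steps + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    raise ValueError("targets must be non-empty strings")


if __name__ == "__main__":
    print(joint_assembly_index(["ATAT"]))            # 2
    print(joint_assembly_index(["ATATAT"]))          # 3
    print(joint_assembly_index(["ATAT", "ATATAT"]))  # 3, not 2 + 3
```

The last call mirrors the black dashed path in panel (A): building both targets together costs less than the sum of their individual indices because `AT` and `ATAT` are shared.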
Fig. 3. The assembly tree of a dozen vital biomolecules, including the five nucleobases (A, G, T, C, and U), pyruvate, citrate, D-ribose, NAD+, ADP, ATP, and a symbolic RNA molecule.
Fig. 4. The assembly tree of one hypothetical gene sequence X (because only one sequence is considered here, the tree reduces to its shortest assembly pathway in the key-step representation).
Here, nucleobases rather than chemical bonds (as in the molecular cases) serve as the basic building blocks, so these nucleobases are drawn explicitly at the bottom for clarity.
Fig. 5. The assembly tree of 10 commonly used plasticizers, including BBP, DEHP, and DEHA.
For clearer visualization, all plasticizers are shown dimmer than the other parts of the tree. The most central structures are highlighted in green.
Fig. 6. The assembly tree of nine opiates and one κ-opioid receptor agonist (salvinorin A).
Some of these opiates are natural (morphine, codeine, thebaine, and papaverine), while others are synthetic (fentanyl, remifentanil, methadone, pethidine, and diamorphine, also known as heroin). For clearer visualization, all opioids are shown dimmer than the other parts of the tree.
Fig. 7. Comparison between natural opiates and opiate-like molecules generated using Reassembler.
(A) The six opiates used to generate the assembly pools. (B) Six new opiate-like molecules generated from those assembly pools. See section S7.3 for details on further new compounds.
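The idea behind reassembly, drawing new objects from building blocks that already occur in the parents' assembly pools rather than from bare units, can be caricatured with strings. The following Python sketch is not the authors' Reassembler: the fragment pool, the function `reassemble`, and the uniform random choice of fragments are all illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def reassemble(pool: Sequence[str], n_fragments: int,
               rng: random.Random) -> Tuple[str, List[str]]:
    """Join randomly chosen pool fragments into one new string (toy model).

    Returns the product together with the fragments used, so its provenance
    from the pool stays explicit.
    """
    if n_fragments < 1:
        raise ValueError("need at least one fragment")
    parts = [rng.choice(pool) for _ in range(n_fragments)]
    return "".join(parts), parts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    # Hypothetical pool: fragments assumed to come from the parents' assembly pathways.
    pool_fragments = ["ATG", "GCA", "TTAG"]
    basic_units = ["A", "C", "G", "T"]
    for _ in range(3):
        print("from pool: ", reassemble(pool_fragments, 3, rng)[0])
        print("from units:", reassemble(basic_units, 9, rng)[0])
```

Products drawn from the pool inherit whole motifs of the parents, whereas products drawn from basic units only share their alphabet, which is the contrast that the blue and red sets in Fig. 8 probe for real molecules.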
Fig. 8. Comparison of 1000-molecule sets generated from the opiate assembly pool (blue) and from individual bonds (red).
(A) According to the Tanimoto similarity measure, products of the assembly pools were significantly more similar to the parent molecules (opiates) than randomly generated products. (B) The quantitative estimate of drug-likeness (QED) shows that the assembly products, unlike their random counterparts, had a drug-likeness similar to that of the opiates (gray dotted line). (C) On the basis of logP estimates, assembly products usually had higher logP than the opiates (gray dotted line), whereas random molecules usually had lower logP.
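Panel (A) scores products with the Tanimoto coefficient, which for binary fingerprints is the number of on-bits shared by both fingerprints divided by the number of on-bits set in either. A minimal Python sketch follows. It represents a fingerprint as a set of features, uses character n-grams of strings as a hypothetical stand-in for real molecular fingerprints (the fingerprint type is not given here), and scores a candidate by its maximum similarity to any parent, which is an assumed aggregation rather than the paper's stated one.

```python
from typing import AbstractSet, Hashable, Iterable, Set


def tanimoto(fp_a: AbstractSet[Hashable], fp_b: AbstractSet[Hashable]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as feature sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def ngram_fingerprint(s: str, n: int = 2) -> Set[str]:
    """Hypothetical stand-in fingerprint: the set of length-n substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def max_similarity(candidate: str, parents: Iterable[str], n: int = 2) -> float:
    """Highest Tanimoto similarity between a candidate and any parent."""
    fp = ngram_fingerprint(candidate, n)
    return max((tanimoto(fp, ngram_fingerprint(p, n)) for p in parents), default=0.0)


if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))            # 0.5
    print(max_similarity("ABBA", ["ABAB", "CCCC"]))  # 2/3, about 0.667
```

With real molecules, the feature sets would be replaced by fingerprint bit vectors from a cheminformatics toolkit; the coefficient itself is unchanged.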
