J Chem Inf Model. 2024 Mar 25;64(6):2008-2020.
doi: 10.1021/acs.jcim.4c00147. Epub 2024 Mar 11.

Enhanced Calculation of Property Distributions in Chemical Fragment Spaces

Justin Lübbers et al. J Chem Inf Model.

Abstract

Chemical fragment spaces exceed traditional virtual compound libraries in size by orders of magnitude, making them ideal search spaces for drug design projects. However, due to their immense size, they are not compatible with traditional analysis and search algorithms that rely on the enumeration of molecules. In this paper, we present SpaceProp2, an evolution of the SpaceProp algorithm, which enables the calculation of exact property distributions for chemical fragment spaces without enumerating them. We extend the original algorithm with the ability to compute distributions for the topological polar surface area (TPSA), the number of rotatable bonds, and the occurrence of user-defined molecular structures in the form of SMARTS patterns. Furthermore, SpaceProp2 produces example molecules for every property bin, enabling a detailed interpretation of the distributions. We demonstrate SpaceProp2 on six established make-on-demand chemical fragment spaces as well as BICLAIM, the in-house fragment space of Boehringer Ingelheim. The ability to search multiple SMARTS patterns simultaneously, together with the example molecules produced, offers previously impossible insights into the composition of these vast combinatorial molecule collections, making SpaceProp2 an ideal tool for the analysis and design of chemical fragment spaces.

Conflict of interest statement

The authors declare the following competing financial interest(s): M.R., as a shareholder of BioSolveIT GmbH, declares a potential financial interest in the event that the SpaceProp software is licensed for a fee to nonacademic institutions in the future.

Figures

Figure 1
Visualization of a topological fragment space. (a) An example reaction consisting of a triazole ring closure and an amide coupling. The pink bonds mark the newly formed connections between the reactants. The reaction was taken from the eXplore cookbook (rxn509), the official documentation for the chemical fragment space eXplore, made by eMolecules and BioSolveIT. (b) The corresponding topology graph. The boxes represent topology nodes that contain fragments, including the fragments from the shown reaction. Again, the bonds where the fragments will be connected are marked in pink. The two dashed connections between the left and the middle topology node represent two aromatic bonds that form the triazole ring structure. The connection between the other nodes represents the single amide bond formed between the reactants.
Figure 2
SpaceProp concept. (a) Topological fragment space. Each fragment is drawn as two connected parts: the colored circle represents the IPC of the fragment, and the gray shape represents the part of the fragment for which no property value can be computed. Note that in node B, two distinct fragments share the same IPC value, colored in yellow. (b) The fragments grouped by their boundary information, which contains all information necessary to characterize the uncalculable parts of the fragments (gray shapes). (c) The calculation of EPC values by combining fragments from each group in each node. The EPC values can now be calculated, as indicated by their coloring. (d) The final property distribution, showing all occurring property values for all six possible product molecules. Each value consists of two IPC values and the corresponding EPC value.
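The grouping idea in this caption can be illustrated with a toy sketch. It is not the SpaceProp implementation: the fragment data, the boundary labels, the `epc` lookup table, and the assumption that a product's value is simply the sum of two IPC values and one EPC value are all hypothetical. The point it shows is that the boundary-dependent term is evaluated once per pair of groups, and fragment counts are multiplied instead of products being enumerated.

```python
from collections import Counter
from itertools import product

# Hypothetical toy space with two topology nodes. Each fragment is reduced to
# (ipc, boundary): the value computable from the fragment alone, and a label
# summarizing the part that depends on the connection partner.
NODE_A = [(1, "x"), (2, "y")]
NODE_B = [(3, "p"), (3, "p"), (5, "q")]  # two distinct fragments share ipc=3


def epc(boundary_a, boundary_b):
    """Assumed toy rule for the contribution that arises at the connection."""
    table = {("x", "p"): 0, ("x", "q"): 1, ("y", "p"): 2, ("y", "q"): 3}
    return table[(boundary_a, boundary_b)]


def group(node):
    """Map each boundary label to a Counter of ipc values (fragment counts)."""
    groups = {}
    for ipc, boundary in node:
        groups.setdefault(boundary, Counter())[ipc] += 1
    return groups


def distribution(node_a, node_b):
    """Histogram over all products, calling epc once per pair of groups."""
    hist = Counter()
    for (ba, ipcs_a), (bb, ipcs_b) in product(group(node_a).items(),
                                              group(node_b).items()):
        correction = epc(ba, bb)
        for (va, ca), (vb, cb) in product(ipcs_a.items(), ipcs_b.items()):
            # ca * cb products share this value; none of them is built.
            hist[va + vb + correction] += ca * cb
    return hist


def brute_force(node_a, node_b):
    """Reference result obtained by enumerating every product explicitly."""
    return Counter(ia + ib + epc(ba, bb)
                   for (ia, ba), (ib, bb) in product(node_a, node_b))
```

On this toy input, `distribution(NODE_A, NODE_B)` agrees with `brute_force(NODE_A, NODE_B)` for all six products. The saving only becomes relevant when each node holds many fragments but few distinct (boundary, IPC) combinations.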
Figure 3
(a) Two electron localizations of a 1,2,4-triazole ring. (b) Two fragments that make up the 1,2,4-triazole ring.
Figure 4
(a) We depict the enumeration of all subpatterns of a query SMARTS pattern. All open bonds are saturated with linker-matching nodes displayed as triangles. The outgoing bonds are marked in pink to emphasize that they have to be matched to the outgoing linker bonds of the fragments. (b) The depicted topological fragment space models a triazole ring closure reaction from the eXplore cookbook (rxn301). The fragments are grouped by their boundary information, which contains the query subpatterns that match the fragments. (c) Groups A.1 and B.2 contain two subpatterns that together make up the complete query structure. Therefore, all product molecules from the two groups must contain the query structure as a crossing pattern, as demonstrated.
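Counting products that contain a crossing pattern, as described in panels (b) and (c), can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the subpattern labels `s1` and `s2`, the `COMPLEMENTARY` set, and the fragment data are hypothetical. It also assumes that the matching of subpatterns against fragments has already been done by a cheminformatics toolkit, and that the table of which subpattern pairs complete the query is given.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: subpatterns "s1" and "s2" together form the full query.
COMPLEMENTARY = {("s1", "s2")}

# Each fragment is reduced to its boundary information: the set of query
# subpatterns that match at its linker (empty set means no match).
NODE_A = [frozenset({"s1"}), frozenset({"s1"}), frozenset()]
NODE_B = [frozenset({"s2"}), frozenset(), frozenset({"s2"}), frozenset()]


def count_crossing(node_a, node_b, complementary):
    """Count products containing the query as a crossing pattern."""
    groups_a, groups_b = Counter(node_a), Counter(node_b)
    total = 0
    for (subs_a, n_a), (subs_b, n_b) in product(groups_a.items(),
                                                groups_b.items()):
        # One check per pair of groups decides for all n_a * n_b products.
        if any((x, y) in complementary for x in subs_a for y in subs_b):
            total += n_a * n_b
    return total
```

Here `count_crossing(NODE_A, NODE_B, COMPLEMENTARY)` counts the four products formed from the two `s1` fragments and the two `s2` fragments, without building any of them.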
Figure 5
Four diagrams show the distribution of the TPSA (left) and number of rotatable bonds (right) for all considered fragment spaces on a logarithmic scale. The two diagrams on the top show the full value ranges for both properties. For a better interpretation of the logarithmic scale, the dot on each line indicates the 99% mark, meaning that less than 1% of products lie to the right of the marked value. The two diagrams on the bottom show the distributions within the value ranges proposed by Veber et al. in more detail.
Figure 6
Comparison of the fragment spaces with regard to the number of molecules within the thresholds for potential oral bioavailability of compounds as proposed by Veber et al. The bars display the absolute number of products on a logarithmic scale while the given percentages show the relative number of products with regard to the total sizes of the fragment spaces.
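The two summary statistics used in Figures 5 and 6, the 99% mark and the share of products within a threshold, can be read directly from a property histogram. The sketch below uses a made-up rotatable-bond histogram. The threshold of 10 rotatable bonds is the value commonly cited from Veber et al. (alongside a polar surface area of at most 140 Å²); it is not stated on this page, so treat it as an assumption. The function names are illustrative.

```python
def mark_99(hist):
    """Smallest value with strictly less than 1% of products to its right."""
    total = sum(hist.values())
    running = 0
    for value in sorted(hist):
        running += hist[value]
        # Integer arithmetic avoids rounding issues with very large counts.
        if (total - running) * 100 < total:
            return value
    raise ValueError("empty histogram")


def share_within(hist, limit):
    """Absolute and relative (percent) number of products with value <= limit."""
    total = sum(hist.values())
    count = sum(c for value, c in hist.items() if value <= limit)
    return count, 100 * count / total


# Hypothetical histogram: rotatable-bond count -> number of products.
ROTATABLE_BONDS = {2: 300, 5: 500, 9: 195, 12: 4, 20: 1}
MAX_ROTATABLE_BONDS = 10  # assumed threshold, see the note above
```

With this toy histogram, `mark_99(ROTATABLE_BONDS)` gives 9 and `share_within(ROTATABLE_BONDS, MAX_ROTATABLE_BONDS)` gives 995 products, or 99.5% of the space.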
Figure 7
The most commonly occurring potential electrophilic warhead structures in the regarded fragment spaces, presented in relative (top) and absolute counts (bottom).
Figure 8
Histogram counting how many products of the regarded fragment spaces contain none, one, or multiple query patterns. Product counts are given on a logarithmic scale. The dot on each line again indicates the 99% mark, such that a maximum of 1% of compounds in a fragment space contain more than the marked number of query patterns.
Figure 9
Example molecule from REALSpace that contains seven potential electrophilic warhead structures. The structures are shown on the right and their occurrence is highlighted in the example molecule.
Figure 10
α,β-unsaturated amide pattern matching in a lactam structure.
Figure 11
Example compounds from REALSpace containing cyclic and noncyclic α,β-unsaturated amide structures as well as both at the same time.
