Review
Mol Inform. 2022 Nov;41(11):e2200116.
doi: 10.1002/minf.202200116. Epub 2022 Aug 23.

Chemical Multiverse: An Expanded View of Chemical Space

José L Medina-Franco et al. Mol Inform. 2022 Nov.

Abstract

Technological advances and practical applications of the chemical space concept in drug discovery, natural product research, and other research areas have attracted the scientific community's attention. Large and ultra-large chemical spaces are associated with a significant increase in the number of compounds that can potentially be made, and with a growing number of experimental and calculated descriptors that encode the structure and/or properties of the molecules. Given the importance and continued evolution of compound libraries, we discuss definitions of chemical space proposed in the literature and emphasize the value, also noted in the literature, of using complementary descriptors to obtain a comprehensive view of the chemical space of compound data sets. In this regard, we introduce the term chemical multiverse to refer to the comprehensive analysis of compound data sets through several chemical spaces, each defined by a different set of chemical representations. The chemical multiverse is contrasted with a related idea: consensus chemical space.

Keywords: chemical multiverse; chemical space; drug discovery; machine learning; molecular representation; structure-property relationships; ultra-large chemical library; visualization.


Conflict of interest statement

None declared.

Figures

Figure 1
Schematic view of the chemical space. Each molecule in the compound data set is represented by M descriptors, which define a feature vector space. The group of n molecules represented with M descriptors forms a "chemical space table" that can be represented using different visualization methods. Mapping a property (e.g., biological activity) onto the "chemical space table" or its visual representation gives rise to a chemogenomic space, which is the basis for structure-property (activity) relationships.
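The "chemical space table" in the caption can be sketched as a small n x M data structure. The snippet below is only an illustration under stated assumptions: the molecule names, descriptor names, descriptor values, and activity values are all hypothetical and are not taken from the article.

```python
# A minimal sketch of a "chemical space table": n molecules, each described
# by the same M descriptors. All names and numbers here are hypothetical.
descriptor_names = ["mol_weight", "logp", "h_bond_donors"]  # M = 3

chemical_space_table = {
    "mol_A": [180.2, 1.2, 1.0],
    "mol_B": [151.2, 0.5, 2.0],
    "mol_C": [206.3, 3.5, 1.0],
}

# A property to map onto the table (e.g., a biological activity value).
activity = {"mol_A": 5.1, "mol_B": 4.3, "mol_C": 6.0}


def table_shape(table):
    """Return (n, M) and check that every molecule has M descriptors."""
    rows = list(table.values())
    n = len(rows)
    m = len(rows[0]) if rows else 0
    if any(len(row) != m for row in rows):
        raise ValueError("all molecules must share the same descriptors")
    return n, m


def map_property(table, prop):
    """Pair each feature vector with its property value."""
    return {name: (vector, prop[name]) for name, vector in table.items()}


shape = table_shape(chemical_space_table)
annotated = map_property(chemical_space_table, activity)
```

Pairing each feature vector with a property value is one simple way to picture the property mapping the caption describes; a real analysis would typically use a dedicated cheminformatics toolkit to compute the descriptors.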
Figure 2
Schematic and general representation of the concept of the chemical multiverse. The chemical multiverse of a data set with n molecules is composed of several alternative chemical spaces (three are shown in the figure for illustrative purposes), each one defined by a different set of descriptors. The geometric figures represent the encoding of the structures using different descriptors: e.g., drug-likeness properties (blue triangles), fingerprints (orange cylinders), and constitutional descriptors such as ring counts and counts of carbon, nitrogen, oxygen, and bridgehead atoms (pink cubes). Depending on the study's goals, a chemical multiverse could contain as many chemical spaces as needed. Each chemical space (in the middle section: "chemical space tables") could be subject to different 2D/3D visual representations.
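One way to picture the multiverse idea in code is as a mapping from a representation label to its own chemical space table, all covering the same molecules. The sketch below is hypothetical throughout: the labels, vectors, and the use of Euclidean distance are assumptions made for illustration (similarity coefficients such as Tanimoto are more common for binary fingerprints). It shows that the nearest neighbor of a molecule can change from one space to another, which is why looking at several spaces can be informative.

```python
import math

# A toy "chemical multiverse": the same three molecules encoded by two
# different descriptor sets. All values are made up for illustration.
multiverse = {
    "drug_likeness": {
        "mol_A": [0.0, 0.0],
        "mol_B": [1.0, 0.0],
        "mol_C": [5.0, 5.0],
    },
    "fingerprint": {
        "mol_A": [1, 0, 1, 0],
        "mol_B": [0, 1, 0, 1],
        "mol_C": [1, 0, 1, 1],
    },
}


def nearest_neighbor(space, query):
    """Return the molecule closest to `query` in one chemical space.

    Euclidean distance is an assumption chosen for simplicity.
    """
    others = (name for name in space if name != query)
    return min(others, key=lambda name: math.dist(space[query], space[name]))


# The neighbor of mol_A is computed independently in each space.
neighbors = {
    label: nearest_neighbor(space, "mol_A")
    for label, space in multiverse.items()
}
```

With these toy vectors, `mol_A` is closest to `mol_B` in the drug-likeness space but closest to `mol_C` in the fingerprint space.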
Figure 3
General comparison of a chemical multiverse with a consensus chemical space. While the chemical multiverse of a compound data set is a set of alternative chemical spaces (only two are shown in the figure, for illustration), a consensus chemical space combines the alternative chemical spaces into one single chemical space through data fusion or combination of the descriptors. The grids with blue triangles and orange cylinders encode two different chemical spaces described by, for instance, drug-likeness and ECFP descriptors, respectively. In this schematic example, the green hexagons represent the fusion or combination of the two descriptor sets into a new, single consensus chemical space.
Figure 4
Example of the visual representation of the chemical multiverse of five compound data sets: A) drug-like (2,403 compounds); B) protein-protein interaction inhibitors (2,227 compounds); C) anti-MRSA peptides (165 compounds); D) natural products (285 compounds from the BIOFACQUIM database); E) food chemicals (21,319 compounds from FooDB). The visual representations were obtained with the t-SNE module implemented in KNIME. The chemical multiverse of each set is compared with a consensus representation of the chemical space obtained by averaging the coordinates of the data points (i.e., the average fusion rule). The upper part of the figure shows representative chemical structures of each data set; each structure is represented by a yellow shape.
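The average fusion rule mentioned in the caption can be sketched as a per-molecule mean of the 2D coordinates obtained in each chemical space. This is a minimal sketch, not the workflow used in the article: the function name `average_fusion`, the space labels, and the coordinates are hypothetical, and the coordinates are assumed to have been computed beforehand (the snippet does not run t-SNE).

```python
def average_fusion(projections):
    """Average each molecule's coordinates across several 2D projections.

    `projections` maps a space label to {molecule: (x, y)}. Every space
    must contain the same molecules.
    """
    spaces = list(projections.values())
    if not spaces:
        raise ValueError("at least one projection is required")
    names = set(spaces[0])
    if any(set(space) != names for space in spaces):
        raise ValueError("all projections must cover the same molecules")

    consensus = {}
    for name in spaces[0]:
        points = [space[name] for space in spaces]
        # zip(*points) groups the x values together and the y values together.
        consensus[name] = tuple(sum(axis) / len(points) for axis in zip(*points))
    return consensus


# Hypothetical 2D coordinates of two molecules in two chemical spaces.
projections = {
    "drug_likeness": {"mol_A": (0.0, 2.0), "mol_B": (4.0, -2.0)},
    "fingerprint": {"mol_A": (2.0, 0.0), "mol_B": (0.0, 2.0)},
}

consensus = average_fusion(projections)
```

Here each molecule ends up at the midpoint of its two projected positions, giving one consensus map instead of two separate ones.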
