ACS Cent Sci. 2024 Apr 18;10(5):1054-1064.
doi: 10.1021/acscentsci.4c00120. eCollection 2024 May 22.

Investigating and Quantifying Molecular Complexity Using Assembly Theory and Spectroscopy

Michael Jirasek et al. ACS Cent Sci.

Abstract

Current approaches to evaluate molecular complexity use algorithmic complexity, rooted in computer science, and thus are not experimentally measurable. Directly evaluating molecular complexity could be used to study directed vs undirected processes in the creation of molecules, with potential applications in drug discovery, the origin of life, and artificial life. Assembly theory has been developed to quantify the complexity of a molecule by finding the shortest path to construct the molecule from building blocks, revealing its molecular assembly index (MA). In this study, we present an approach to rapidly infer the MA of molecules from spectroscopic measurements. We demonstrate that the MA can be experimentally measured by using three independent techniques: nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS), and infrared spectroscopy (IR). By identifying and analyzing the number of absorbances in IR spectra, carbon resonances in NMR, or molecular fragments in tandem MS, the MA of an unknown molecule can be reliably estimated. This represents the first experimentally quantifiable approach to determining molecular assembly. This paves the way to use experimental techniques to explore the evolution of complex molecules as well as a unique marker of where an evolutionary process has been operating.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Molecular assembly of 5-aminoisophthalic acid. (A) Molecular assembly pathway of 5-aminoisophthalic acid with a total of 7 steps. The various chemical bonds are considered as fundamental building blocks (shown in red), and the substructures (shown in blue) along the pathways constitute the assembly pool. (B–D) Experimental NMR, IR, and MS2 spectra of 5-aminoisophthalic acid highlighting different features of the molecule from which the molecular constraints and the MA can be inferred (see Figures 3, 4, and 6).
Figure 2
(a) The general structure of the Go assembly algorithm with a pool of workers extending pathways by queuing the pathways to be checked as jobs. Some features are omitted for brevity, such as branch and bound methods to improve efficiency. (b) A sequence of assembly pathways as processed by the Go algorithm. The top pathway is the starting pathway for the molecule shown, and each subsequent pathway is extended from the pathway above. Pathways are generally extended in multiple ways, and only one such sequence of extensions is shown here. (c) An example of MA values found over time for primisulfuron-methyl, run to completion, and approximated by stopping early at various stages prior. The new algorithm found pathways at the correct MA of 22 by 10 s, significantly before completion at ∼2064 s. The red circle shows split branch algorithm performance on the same molecule. The naïve MA (blue hexagon) is calculated trivially for pathways in which one bond is added at a time (placed illustratively at 10⁻³ s, as 0 s cannot be represented on the logarithmic scale).
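
The naïve MA in panel (c) can be illustrated with a short sketch. It assumes that bonds are the building blocks (as in the Figure 1 caption) and that joining N bonds one at a time takes N − 1 steps. The function name and the hand-counted bond list (heavy atoms only, hydrogens omitted) are illustrative assumptions, not part of the authors' Go implementation.

```python
def naive_ma(bonds):
    """Upper bound on MA for a pathway that adds one bond per step.

    Assumption: the first bond is a free building block, and each
    remaining bond costs one joining step.
    """
    if not bonds:
        return 0
    return len(bonds) - 1


# Hand-counted heavy-atom bonds of 5-aminoisophthalic acid (illustrative labels).
RING = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"),
        ("C4", "C5"), ("C5", "C6"), ("C6", "C1")]
CARBOXYL_A = [("C1", "C7"), ("C7", "O1"), ("C7", "O2")]
CARBOXYL_B = [("C3", "C8"), ("C8", "O3"), ("C8", "O4")]
AMINE = [("C5", "N1")]
BONDS = RING + CARBOXYL_A + CARBOXYL_B + AMINE

if __name__ == "__main__":
    print("bonds:", len(BONDS), "naive MA:", naive_ma(BONDS))
```

Under these assumptions the one-bond-at-a-time pathway needs more steps than the 7-step pathway quoted for the same molecule in Figure 1, which is the gap the pathway search is meant to close.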
Figure 3
Inferring molecular assembly from infrared spectroscopy. (A) xTB-calculated IR spectrum of 5-aminoisophthalic acid with highlighted fingerprint region (400–1500 cm⁻¹). (B) Example of the six most intense vibrational bands in the fingerprint region, demonstrating its collective-motion nature. (C) Molecular assembly vs IR-inferred molecular assembly estimated from the number of IR peaks in the fingerprint region (400–1500 cm⁻¹) based on xTB calculation on 10,000 molecules (see eq 1). Correlation between the predicted and expected molecular assemblies is 0.86. (D) Molecular assembly vs IR-inferred molecular assembly estimated from the number of IR peaks in the fingerprint region (400–1500 cm⁻¹) based on the experimental measurement on 99 molecules (see eq 2). Correlation between the predicted and expected molecular assemblies is 0.75.
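
The peak-counting step behind panels (C) and (D) might look like the following minimal sketch. The local-maximum rule, the 5% relative-intensity threshold, and the function name are assumptions chosen for illustration; the mapping from peak count to MA (eq 1 and eq 2) is not reproduced on this page, so the sketch stops at the count.

```python
def count_fingerprint_peaks(wavenumbers, intensities,
                            lo=400.0, hi=1500.0, min_rel_intensity=0.05):
    """Count local maxima whose position lies in [lo, hi] cm^-1.

    Assumptions: the spectrum is sampled on an ordered grid, a peak is a
    point higher than its left neighbour and not lower than its right
    neighbour, and peaks weaker than min_rel_intensity times the
    strongest point are ignored as noise.
    """
    if len(wavenumbers) != len(intensities):
        raise ValueError("wavenumbers and intensities must have equal length")
    if not intensities:
        return 0
    threshold = min_rel_intensity * max(intensities)
    count = 0
    for i in range(1, len(intensities) - 1):
        in_window = lo <= wavenumbers[i] <= hi
        is_peak = (intensities[i] > intensities[i - 1]
                   and intensities[i] >= intensities[i + 1])
        if in_window and is_peak and intensities[i] >= threshold:
            count += 1
    return count


# Synthetic spectrum for illustration only (not measured data).
WN = [350, 450, 550, 650, 750, 850, 950, 1050, 1600, 1700, 1800]
IT = [0.0, 0.1, 0.8, 0.1, 0.5, 0.01, 0.03, 0.01, 0.1, 0.9, 0.1]

if __name__ == "__main__":
    print(count_fingerprint_peaks(WN, IT))
```

In the synthetic spectrum, the strong band at 1700 cm⁻¹ falls outside the window and the weak bump at 950 cm⁻¹ falls below the threshold, so neither is counted.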
Figure 4
Inferring molecular complexity from 13C NMR spectra. (A and B) Predicted 13C NMR spectra of 5-aminoisophthalic acid and quinine, with the different types of carbons highlighted. (C) Molecular assembly vs NMR-inferred molecular assembly estimated from the number of different types of carbons (see eq 3) based on NMRshiftDB calculation on 10,000 molecules. The correlation between the predicted and expected molecular assemblies is 0.87. (D) Molecular assembly vs NMR-inferred molecular assembly estimated from the number of different types of carbons measured experimentally on 101 molecules, using the same model as in the theoretical set. The correlation between the predicted and expected molecular assemblies is 0.81.
Figure 5
(a) Distribution of MA against molecular mass, based on 16.5M molecules sampled from the PubChem database. The upper limit is linear and the lower limit is approximately logarithmic in nature. The theoretical MA values were calculated with a 10 s cutoff. (b) Sample illustrating features of molecules in the MA/MW ranges. The characteristic features of lower MA molecules at a given molecular weight include the presence of periodic units, heavy elements, or both. The high MA molecules usually comprise higher heterogeneity with atoms of similar atomic weights.
Figure 6
(a) The recursive algorithm for estimating molecular assembly from tandem mass spectrometry data. (b) Application of the recursive MA algorithm to resolve MA differences between molecules with similar molecular mass. Example of reduced MA based on the presence of heavy elements (bromine and iodine); example of reduced MA based on the presence of repeated structural features. (c) Molecular assembly vs recursive-MSn-inferred molecular assembly. Blue points represent the data set of 71 molecules sampled across the MA values. The orange points are 30 molecules, specifically chosen to cover a large range of MA, but within a very narrow molecular weight (300 ± 5 g/mol). The correlation coefficient is 0.73.
Figure 7
(A) Molecular assembly vs combined IR- and NMR-inferred molecular assembly (using weights of 0.55 and 0.45 from NMR and IR, respectively) based on 10,000 calculated spectra showing an increased correlation of 0.90. (B) Molecular assembly vs combined IR- and NMR-inferred molecular assembly (using weights of 0.7 and 0.3 from NMR and IR, respectively) based on 54 experimental spectra showing an increased (relative to the individual components) correlation of 0.89. (C) Molecular assembly vs individual and combined IR-, NMR-, and MS-inferred molecular assembly based on the 54 molecules. Note that due to experimental limitations, multilevel MS fragmentation data were available for only 10 compounds; for the rest the MS part of the MA inference was performed based solely on the exact ion mass approximation. The correlation coefficient for the combined techniques is 0.88.
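
The weighted combination in panels (A) and (B) can be sketched as follows. This assumes the combined estimate is a plain weighted sum of the NMR- and IR-inferred values and that the reported correlation is a Pearson coefficient; neither detail is stated in the caption. The function names and sample numbers are hypothetical.

```python
import math


def combine_ma(ma_nmr, ma_ir, w_nmr=0.55, w_ir=0.45):
    """Weighted sum of two MA estimates (defaults: panel A weights)."""
    return w_nmr * ma_nmr + w_ir * ma_ir


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    true_ma = [5, 9, 12, 16, 21]
    nmr_ma = [6.0, 8.0, 13.0, 15.0, 20.0]
    ir_ma = [4.0, 11.0, 10.0, 18.0, 19.0]
    combined = [combine_ma(n, i) for n, i in zip(nmr_ma, ir_ma)]
    print(pearson(true_ma, combined))
```

Passing `w_nmr=0.7, w_ir=0.3` would apply the experimental weights quoted in panel (B).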
