Review
Chem Rev. 2021 May 26;121(10):5633-5670.
doi: 10.1021/acs.chemrev.0c00901. Epub 2021 May 12.

Quantum Chemistry Calculations for Metabolomics

Ricardo M Borges et al. Chem Rev.

Abstract

A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable because of the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials ("standards"), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate the chemical properties that analytical platforms measure, offer an alternative route for building reference libraries, i.e., in silico libraries for "standards-free" identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices and sources of error, and we provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Systems biology paradigm. Systems biology studies employ omics approaches to comprehensively identify and quantify the functional units of the system under study. One or more omics approaches are used to measure genes, transcripts, proteins, and metabolites; the data are analyzed and integrated; and computational models are used to interpret the results, often with the goal of obtaining a predictive understanding of the system so that it can be manipulated in a directed fashion. Reproduced with permission from ref (8). Copyright 2007 AAAS.
Figure 2
Omics. The approaches (and philosophies) for comprehensively identifying and quantifying genes, transcripts, proteins, and metabolites are termed genomics, transcriptomics, proteomics, and metabolomics, respectively. Lipidomics is the subdiscipline of metabolomics that addresses the measurement of polar and nonpolar lipids. Glycomics is the omics devoted to the comprehensive measurement of free and protein-bound glycans (as well as glycolipids, i.e., lipid-bound glycans). The exposome includes all endogenous and exogenous exposures and unites transcriptomics, proteomics, metabolomics, lipidomics, and glycomics and includes measurement of anthropogenic molecules. Modified with permission from ref (28) under the Attribution-NonCommercial-No Derivatives 4.0 Unported License (http://creativecommons.org/licenses/by-nc-nd/4.0). Modified with permission from ref (29). Copyright 2016 Springer Nature.
Figure 3
Omics publication trends 1999–2019. The numbers of publications including genomics, transcriptomics, proteomics, metabolomics, lipidomics, and glycomics approaches have steadily increased in the last two decades, linearly from 1999 to 2009 and exponentially thereafter. Results were compiled from PubMed keyword searches of “genomics,” “transcriptomics,” “proteomics,” “metabolomics” (and “metabonomics”), “lipidomics,” and “glycomics,” limited to appearance in the publication title or abstract. Genomics and transcriptomics publications are combined because these approaches rely on sequencing technologies. Metabolomics, lipidomics, and glycomics publications are also combined because comprehensive analysis of these molecular types typically relies on mass spectrometry (MS) approaches but often involves other technologies (such as nuclear magnetic resonance spectroscopy).
Figure 4
Quantum chemistry methods (upper right) are considered highly accurate but also highly expensive compared to empirical potential-based simulation methods (lower left). Methods that combine physics-based principles with empirical knowledge, such as semiempirical models, density functional theory, and future deep-learning-based methods are promising for improving accuracy without increasing computational cost.
Figure 5
NMR chemical shift calculation protocol developed by Willoughby et al. (204)
Figure 6
NMR chemical shift calculation protocol developed by Xin et al. (239) Reproduced with permission from ref (239). Copyright 2017 American Chemical Society.
Figure 7
Das and Merz protocol for calculating NMR chemical shifts. This protocol combines FF, ML-QM, and standard QM methods to improve the efficiency of NMR chemical shift computation at low computational cost.
Figure 8
BMRB ID, no. of atoms, no. of rotatable bonds, FF-generated conformations, ANI-optimized conformations, and no. of clusters are reported for the folate molecule.
Figure 9
Plots of the differences between the calculated and experimental 1H and 13C NMR chemical shifts of folate. Shielding constants were computed at the B3LYP/6-311+G(2d,p) level of theory and converted to linearly scaled reference chemical shifts. Values of chemical shift differences are given in ppm.
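The conversion from shielding constants to scaled chemical shifts mentioned in this caption can be illustrated with a minimal sketch. It assumes the commonly used linear-regression form, in which computed shieldings are fit against experimental shifts as sigma = slope * delta + intercept, so that delta = (intercept - sigma) / (-slope). The function names and all numeric values below are hypothetical placeholders, not values from the review; real slopes and intercepts depend on the level of theory, solvent model, and nucleus.

```python
def shielding_to_shift(sigma, slope, intercept):
    """Convert a computed isotropic shielding constant (ppm) into a
    linearly scaled chemical shift (ppm).

    Assumes the regression sigma = slope * delta + intercept, where the
    slope is typically close to -1 (assumption, not from the source).
    """
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (intercept - sigma) / (-slope)


def shift_differences(calculated, experimental):
    """Per-nucleus differences (calculated - experimental), in ppm."""
    if len(calculated) != len(experimental):
        raise ValueError("shift lists must have the same length")
    return [c - e for c, e in zip(calculated, experimental)]


# Hypothetical scaling factors and shieldings, for illustration only.
SLOPE, INTERCEPT = -1.05, 31.5
shieldings = [21.0, 26.25]
calc_shifts = [shielding_to_shift(s, SLOPE, INTERCEPT) for s in shieldings]
print(shift_differences(calc_shifts, [9.9, 5.1]))
```

Plotting the returned differences per nucleus gives the kind of error plot the caption describes.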
Figure 10
The drug ondansetron and the three hydroxylated metabolites identified by Dear et al. (2010). This study was a first demonstration of the use of quantum chemical-based theoretical calculations of CCS for small-molecule identification. The four isomers had indistinguishable MS/MS fragmentation spectra because the hydroxyl moiety is located on the unfragmented benzene ring; thus, matching the calculated CCS to the experimental CCS distributions was the sole distinguishing dimension of data for identification. Reproduced with permission from ref (262). Copyright 2010 Wiley and Sons, Ltd.
Figure 11
Computed IR spectra (colored traces, a–d) of potential candidate structures resulting from a database search for an unknown feature at m/z 100.0757 compared to the IRIS spectrum of the unknown feature (black trace) from the patient sample. (e) Compares the IR spectrum of the reference compound N-methyl-2-pyrrolidinone identified by the match found for the predicted spectrum. Reproduced with permission from ref (269). Copyright 2020 Elsevier.
Figure 12
Ion mobility data (left) and cryogenic infrared vibrational spectra of individual conformer families of bradykinin [M + 2H]2+, consistent with the trans-Pro2/trans-Pro3 isomer geometry. Reproduced with permission from ref (261). Copyright 2018 American Chemical Society.
Figure 13
MS-based compound identification paradigm. Mass spectrometrists mostly deal with unknown mass spectra that need structural assignments (spectrum-to-structure). New algorithms to understand fragmentations need to be developed with the help of the quantum chemical community. Since the development of Grimme’s QCEIMS method in 2013, it has been possible to predict 70 eV mass spectra directly from structures using Born–Oppenheimer ab initio molecular dynamics (structure-to-spectrum). Large chemical databases with millions of compounds can be used to predict high-quality theoretical mass spectra.
Figure 14
The 70 eV mass spectra of anisole calculated with QCEIMS. The left panel shows the in silico spectrum calculated with the OM2 semiempirical method, and the right panel shows the spectrum calculated with the GFN1-xTB Hamiltonian. Both methods underestimate the peak at m/z 78 and overestimate the peak at m/z 93. This leads to a similarity score of 569 for OM2 and a somewhat higher match score of 660 for the newer GFN1-xTB method. Further methodological improvements are needed to increase the quality of the simulated spectra.
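The caption does not state how the similarity scores were computed. As a rough illustration of how a calculated and an experimental spectrum can be compared on a 0 to 999 scale, here is a minimal sketch that assumes a plain cosine (dot-product) similarity between peak lists. This is an assumption for illustration, not the scoring used in the review, and library search software usually applies additional m/z and intensity weighting. The function name and peak intensities are hypothetical.

```python
import math


def cosine_match_score(spec_a, spec_b):
    """Cosine similarity between two centroided spectra, scaled to 0..999.

    Each spectrum is a dict mapping nominal m/z (int) to intensity.
    Plain, unweighted cosine similarity is an assumption here.
    """
    mz_values = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(mz, 0.0) * spec_b.get(mz, 0.0) for mz in mz_values)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0  # an empty spectrum cannot match anything
    return round(999 * dot / (norm_a * norm_b))


# Hypothetical intensities for illustration only (not data from the figure).
experimental = {108: 100.0, 93: 15.0, 78: 60.0, 65: 70.0}
simulated = {108: 100.0, 93: 55.0, 78: 20.0, 65: 50.0}
print(cosine_match_score(experimental, simulated))
```

Under this scheme, over- or underestimated peaks such as those at m/z 78 and m/z 93 lower the score even when every fragment is present in both spectra.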
Figure 15
Collision-induced dissociation process (CID-MS/MS) in a collision cell. Ions formed by, e.g., electrospray ionization enter the mass spectrometer and pass through a neutral curtain gas (He, Ne, Ar). Once energy is transferred by the collision, molecules can dissociate or rearrange. Ions are further accelerated toward the detectors and registered as specific m/z signals. The collision gas and neutral reaction products are removed by the vacuum pumps of the instrument.
Figure 16
Effects of a labile carboxylic acid proton on 1H and 13C chemical shifts. Reproduced with permission from MDPI (2017, CC-BY 4.0 license).
Figure 17
Structural reassignment of the natural product tristichone C by Kutateladze and Reddy using their DU8+ method. Reprinted with permission from ref (439). Copyright 2017 American Chemical Society.

References

    1. Aebersold R.; Mann M. Mass spectrometry-based proteomics. Nature 2003, 422, 198–207. 10.1038/nature01511.
    2. Giani A. M.; Gallo G. R.; Gianfranceschi L.; Formenti G. Long walk to genomics: History and current approaches to genome sequencing and assembly. Comput. Struct. Biotechnol. J. 2020, 18, 9–19. 10.1016/j.csbj.2019.11.002.
    3. Heather J. M.; Chain B. The sequence of sequencers: The history of sequencing DNA. Genomics 2016, 107, 1–8. 10.1016/j.ygeno.2015.11.003.
    4. Hashimoto Y.; Greco T. M.; Cristea I. M. Contribution of mass spectrometry-based proteomics to discoveries in developmental biology. Adv. Exp. Med. Biol. 2019, 1140, 143–154. 10.1007/978-3-030-15950-4_8.
    5. Song Y.; Xu X.; Wang W.; Tian T.; Zhu Z.; Yang C. Single cell transcriptomics: moving towards multi-omics. Analyst 2019, 144, 3172–3189. 10.1039/C8AN01852A.
