Nat Methods. 2018 Jan;15(1):53-56.

doi: 10.1038/nmeth.4512. Epub 2017 Nov 27.

Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics

Zijuan Lai^{1,2}, Hiroshi Tsugawa^{3,4}, Gert Wohlgemuth¹, Sajjan Mehta¹, Matthew Mueller¹, Yuxuan Zheng², Atsushi Ogiwara⁵, John Meissen¹, Megan Showalter¹, Kohei Takeuchi⁶, Tobias Kind¹, Peter Beal², Masanori Arita^{3,7}, Oliver Fiehn^{1,8}

Affiliations

¹ West Coast Metabolomics Center, UC Davis, Davis, California, USA.
² Department of Chemistry, UC Davis, Davis, California, USA.
³ RIKEN Center for Sustainable Resource Science, Yokohama, Japan.
⁴ RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
⁵ Reifycs Inc., Tokyo, Japan.
⁶ Perfume Development Research Laboratory, Kao Corporation, Tokyo, Japan.
⁷ National Institute of Genetics, Mishima, Japan.
⁸ Department of Biochemistry, King Abdulaziz University, Jeddah, Saudi Arabia.

PMID: 29176591
PMCID: PMC6358022
DOI: 10.1038/nmeth.4512

Zijuan Lai et al. Nat Methods. 2018 Jan.

Abstract

Novel metabolites distinct from canonical pathways can be identified through the integration of three cheminformatics tools: BinVestigate, which queries the BinBase gas chromatography-mass spectrometry (GC-MS) metabolome database to match unknowns with biological metadata across more than 110,000 samples; MS-DIAL 2.0, a software tool for chromatographic deconvolution of high-resolution GC-MS or liquid chromatography-mass spectrometry (LC-MS) data; and MS-FINDER 2.0, a structure-elucidation program that uses a combination of 14 metabolome databases in addition to an enzyme promiscuity library. We showcase our workflow by annotating N-methyl-uridine monophosphate (UMP), lysomonogalactosyl-monopalmitin, N-methylalanine, and two propofol derivatives.

Conflict of interest statement

Competing Financial Interests

The authors declare competing financial interests. Atsushi Ogiwara is a developer at Reifycs Inc., which provides the ABF converter for mass spectral data free of charge at http://www.reifycs.com/AbfConverter/.

Figures

**Figure 1. Summary for functional and structural identification of unknown metabolites**
(a) BinVestigate searches BinBase for the metabolomics study metadata and (nominal-mass) EI-MS spectra of unknown compounds and shows the results as sunburst diagrams that illustrate the biological origin (species, organs, cell types) of the unknowns. (b) MS-DIAL 2.0 performs universal GC-MS or LC-MS/MS deconvolution of high-resolution (HR) mass spectrometry data to obtain the deconvoluted HR-MS spectra of unknowns needed for compound identification. (c) MS-FINDER 2.0 performs universal GC-EI-MS and LC-ESI-MS/MS spectral interpretation to annotate unknowns in combination with the enzyme promiscuity structure database (MINE), which resulted in the discovery of biologically significant chemical structures. The tools are fully connected in MS-DIAL, and each tool is also available as a standalone program.

**Figure 2. Metabolomic meta-analysis for origin exploration by BinVestigate**
Bin IDs were queried across more than 114,000 samples to show the cross-study specificity and relevance of unknown BinBase ID 160842 (left) and unknown BinBase ID 106699 (right). In the sunburst diagrams, the area of the circular sector for each organ (inner ring) or species (outer ring) was determined by the average signal intensity of the unknown compound in the samples of that origin in which it was present. The table summarizes the Bin ID, Fiehn RI, Kovats RI, number of annotation records, and conclusion about biological significance for the five unknowns discussed in this paper.
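The sector sizing described in this caption can be sketched in Python. This is a simplified, hypothetical illustration, not BinVestigate's code: it assumes each sample contributes one (species, organ, intensity) record for the queried Bin ID, that an intensity of 0 means the unknown was not detected, and that each sector's angle is proportional to the mean intensity over the samples in which the unknown was present. `Observation` and `ring_angles` are invented names, and the sketch sizes one ring at a time rather than nesting species inside organs as a full sunburst would.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Observation(NamedTuple):
    """One sample's measurement of a single unknown (hypothetical layout)."""
    species: str
    organ: str
    intensity: float  # 0.0 means the unknown was not detected in this sample


def ring_angles(observations: list[Observation], ring: str) -> dict[str, float]:
    """Sector angle (degrees) per origin for one ring ('species' or 'organ').

    Assumption: the angle is proportional to the mean intensity of the
    unknown over the samples of that origin in which it was detected.
    """
    detected: dict[str, list[float]] = defaultdict(list)
    for obs in observations:
        if obs.intensity > 0:  # average only where the unknown is present
            detected[getattr(obs, ring)].append(obs.intensity)
    if not detected:
        return {}
    means = {origin: mean(values) for origin, values in detected.items()}
    total = sum(means.values())
    return {origin: 360.0 * m / total for origin, m in means.items()}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from BinBase.
    obs = [
        Observation("human", "plasma", 300.0),
        Observation("human", "plasma", 100.0),
        Observation("mouse", "liver", 600.0),
        Observation("mouse", "plasma", 0.0),  # not detected: ignored
    ]
    print(ring_angles(obs, "organ"))  # {'plasma': 90.0, 'liver': 270.0}
```

Averaging only over samples where the unknown was detected keeps a compound that is abundant in a few liver samples from being diluted by the many samples in which it is absent; whether BinVestigate uses exactly this rule is an assumption.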

**Figure 3. Identification of N-methyl-UMP by MS-DIAL 2.0 and MS-FINDER 2.0**
High-resolution GC-MS analytics was first used for structure elucidation (left), and LC-MS/MS was then applied as an additional line of evidence to validate the discovery (right). (a) Spectral deconvolution: fragment ions and molecular adduct ions of BinBase ID 106699 were deconvoluted and confirmed with MS-DIAL 2.0. (b) Formula prediction and validation: C₁₀H₁₅N₂O₉P was scored and ranked first in MS-FINDER 2.0 based on mass errors, isotope ratio errors, and subformula assignments. For the GC-MS flow, chemical ionization data were acquired with two derivatization methods (MSTFA vs. MSTFA-d9) to verify the formula and to determine the number of acidic protons; for the LC-MS flow, the mass errors between theoretical and experimental values were only 1 mDa, and the isotopic ratio errors were within 1%. (c) Structure prediction, validation, and identification: structure candidates were retrieved from MINE DB in addition to an internal metabolome database and fragmented *in silico* based on hydrogen rearrangement rules, bond dissociation energies, and a comprehensive fragmentation rule library (covering GC-EI-MS and LC-ESI-MS/MS). MS-FINDER 2.0 ranked N-methyl-UMP as the most likely structure, with computationally assigned substructures. For final validation, the mass spectra and retention times in GC-MS (left) and LC-MS/MS (right) of BinBase ID 106699 in a cancer cell sample were matched with those of a chemically synthesized N-methyl-UMP standard.
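The mass-accuracy and derivatization checks in panel (b) reduce to simple arithmetic, sketched below in Python. This is a hypothetical illustration, not MS-FINDER's scoring code: the helper names are invented, the element masses are standard monoisotopic values, the formula is written in ASCII (C10H15N2O9P), and [M−H]⁻ is used only as an example ion because the caption does not name the ion species. The acidic-proton count assumes each acidic proton is replaced by exactly one trimethylsilyl (TMS) group and that the same ion is compared in the MSTFA and MSTFA-d9 runs, so each TMS-d9 group adds 9 × (m(²H) − m(¹H)) ≈ 9.056 Da.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.973761998,
    "Si": 27.976926535,
}
DEUTERIUM = 2.01410177785  # mass of a 2H (D) atom
PROTON = 1.00727646688     # mass of a proton (H atom minus one electron)

# Net mass added when one acidic H is replaced by Si(CH3)3, i.e. + C3H8Si.
TMS_SHIFT = 3 * MONO["C"] + 8 * MONO["H"] + MONO["Si"]  # ~72.0395 Da
# Extra mass of one TMS-d9 group (MSTFA-d9) over one TMS group (MSTFA).
D9_SHIFT = 9 * (DEUTERIUM - MONO["H"])                  # ~9.0565 Da


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of an ASCII formula such as 'C10H15N2O9P'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def derivatized_mass(formula: str, n_tms: int) -> float:
    """Neutral mass after n acidic protons are each replaced by one TMS group."""
    return monoisotopic_mass(formula) + n_tms * TMS_SHIFT


def mass_error_mda(observed: float, theoretical: float) -> float:
    """Signed mass error in millidaltons (mDa)."""
    return (observed - theoretical) * 1e3


def mass_error_ppm(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed - theoretical) / theoretical * 1e6


def acidic_proton_count(mz_mstfa: float, mz_mstfa_d9: float) -> int:
    """Number of TMS groups, read from the same ion in both derivatives."""
    return round((mz_mstfa_d9 - mz_mstfa) / D9_SHIFT)


if __name__ == "__main__":
    neutral = monoisotopic_mass("C10H15N2O9P")  # N-methyl-UMP
    print(f"M      = {neutral:.4f} Da")          # 338.0515
    print(f"[M-H]- = {neutral - PROTON:.4f}")     # 337.0442 (example ion only)
```

For example, an m/z shift of about 36.2 Da between the MSTFA and MSTFA-d9 forms of the same ion would correspond to four TMS groups (4 × 9.056 Da), that is, four acidic protons under the one-TMS-per-proton assumption.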

