Nat Methods. 2018 Jan;15(1):53-56.
doi: 10.1038/nmeth.4512. Epub 2017 Nov 27.

Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics

Zijuan Lai et al. Nat Methods. 2018 Jan.

Abstract

Novel metabolites distinct from canonical pathways can be identified through the integration of three cheminformatics tools: BinVestigate, which queries the BinBase gas chromatography-mass spectrometry (GC-MS) metabolome database to match unknowns with biological metadata across over 110,000 samples; MS-DIAL 2.0, a software tool for chromatographic deconvolution of high-resolution GC-MS or liquid chromatography-mass spectrometry (LC-MS); and MS-FINDER 2.0, a structure-elucidation program that uses a combination of 14 metabolome databases in addition to an enzyme promiscuity library. We showcase our workflow by annotating N-methyl-uridine monophosphate (UMP), lysomonogalactosyl-monopalmitin, N-methylalanine, and two propofol derivatives.

Conflict of interest statement

Competing Financial Interests

The authors declare competing financial interests. Atsushi Ogiwara is a developer in Reifycs Inc., which provides the ABF converter of mass spectral data for free at http://www.reifycs.com/AbfConverter/.

Figures

Figure 1. Summary for functional and structural identification of unknown metabolites
(a) BinVestigate searches BinBase for the metabolomics study metadata and (nominal) EI-MS spectra of unknown compounds, with results shown as sunburst diagrams that illustrate the biological origin (species, organs, cell types) of the unknowns. (b) MS-DIAL 2.0 performs universal GC-MS or LC-MS/MS deconvolution with high-resolution (HR) mass spectrometry analytics to obtain the deconvoluted HR-MS spectra of unknowns needed for compound identification. (c) MS-FINDER 2.0 performs universal GC-EI-MS and LC-ESI-MS/MS spectral interpretation to annotate unknowns in combination with the enzyme promiscuity structure database (MINE), resulting in the discovery of biologically significant chemical structures. The tools are fully connected in MS-DIAL. Each tool is also available as a standalone program.
Figure 2. Metabolomic meta-analysis for origin exploration by BinVestigate
Bin IDs were queried in over 114,000 samples to show the cross-study specificity and relevance of unknown BinBase ID 160842 (left) and unknown BinBase ID 106699 (right). In the sunburst diagrams, the area of the circular sector for each organ (inner ring) or species (outer ring) is determined by the average signal intensity of the unknown compound in the samples of that origin in which it was detected. The table summarizes the Bin ID, Fiehn RI, Kovats RI, number of annotation records, and conclusion of biological significance for the five unknowns discussed in this paper.
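The sector sizing described in this caption can be illustrated with a short sketch. It assumes that a sector's angle is directly proportional to the mean intensity over the samples in which the compound was detected; the caption does not give the exact formula. The function name `sector_angles` and the input layout are hypothetical and are not part of BinVestigate.

```python
from collections import defaultdict


def sector_angles(records):
    """Return a sunburst angle (degrees) per origin.

    records: iterable of (origin, intensity) pairs, one per sample.
    An intensity of None or <= 0 is treated as "not detected" and skipped,
    so the average is taken only over samples where the compound is present.
    ASSUMPTION: angle is proportional to that average intensity.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for origin, intensity in records:
        if intensity is None or intensity <= 0:
            continue
        totals[origin] += intensity
        counts[origin] += 1

    means = {origin: totals[origin] / counts[origin] for origin in totals}
    grand_total = sum(means.values())
    if grand_total == 0:
        return {}
    return {origin: 360.0 * mean / grand_total for origin, mean in means.items()}


# Hypothetical intensities, for illustration only.
example = [
    ("liver", 100.0),
    ("liver", 300.0),
    ("plasma", 0.0),    # not detected, ignored
    ("plasma", 200.0),
    ("leaf", None),     # not detected, ignored
]
print(sector_angles(example))
```

Because undetected samples are skipped rather than counted as zero, an origin with a few strong detections can receive a sector as large as one with many moderate detections.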
Figure 3. Identification of N-methyl-UMP by MS-DIAL 2.0 and MS-FINDER 2.0
High-resolution GC-MS analytics were first used for structure elucidation (left); LC-MS/MS was then applied as an additional line of evidence to validate the discovery (right). (a) Spectral deconvolution: fragment ions and molecular adduct ions of BinBase ID 106699 were deconvoluted and confirmed with MS-DIAL 2.0. (b) Formula prediction and validation: C10H15N2O9P was scored and ranked first in MS-FINDER 2.0 based on mass errors, isotope ratio errors, and subformula assignments. For the GC-MS flow, chemical ionization data with different derivatization methods (MSTFA vs. MSTFA-d9) were obtained to verify the formula and to yield the number of acidic protons; for the LC-MS flow, the mass errors between theoretical and experimental values were only 1 mDa, and the isotopic ratio errors were within 1%. (c) Structure prediction, validation, and identification: structure candidates were retrieved from MINE DB in addition to the internal metabolome database and fragmented in silico based on hydrogen rearrangement rules, bond dissociation energy, and a comprehensive fragmentation rule library (including GC-EI-MS and LC-ESI-MS/MS). N-methyl-UMP was ranked as the most likely structure in MS-FINDER 2.0, with computationally assigned substructures. For final validation, the mass spectra and retention times in GC-MS (left) and LC-MS/MS (right) of BinBase ID 106699 in a cancer cell sample were matched against those of a chemically synthesized N-methyl-UMP standard.
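The mass-error criterion in panel (b) can be made concrete with a small calculation. The sketch below computes the neutral monoisotopic mass of C10H15N2O9P from standard isotope masses and expresses the difference between an observed and a theoretical mass in mDa. It is not MS-FINDER's scoring code: the helper names are hypothetical, the observed value is invented for illustration, and adduct and derivatization masses are deliberately left out.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes of the elements
# that occur in C10H15N2O9P.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C10H15N2O9P'.

    Handles element symbols followed by an optional count; it does not
    handle parentheses, charges, or isotope labels.
    """
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mass_error_mda(observed, theoretical):
    """Signed mass error in millidaltons (observed minus theoretical)."""
    return (observed - theoretical) * 1000.0


theoretical = monoisotopic_mass("C10H15N2O9P")
observed = theoretical + 0.0008  # HYPOTHETICAL measurement, not from the paper
print(f"theoretical neutral mass: {theoretical:.4f} Da")
print(f"mass error: {mass_error_mda(observed, theoretical):.2f} mDa")
```

In practice the comparison is made on the m/z of a specific ion, so the theoretical value would first be adjusted for the adduct or derivative that was actually measured.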
