BMC Genomics. 2011 Jun 15;12 Suppl 1(Suppl 1):S6.
doi: 10.1186/1471-2164-12-S1-S6.

Improved genome annotation through untargeted detection of pathway-specific metabolites

Benjamin P Bowen et al. BMC Genomics. 2011.

Abstract

Background: Mass spectrometry-based metabolomics analyses have the potential to complement sequence-based methods of genome annotation, but only if raw mass spectral data can be linked to specific metabolic pathways. In untargeted metabolomics, the measured mass of a detected compound is used to define the location of the compound in chemical space, but uncertainties in mass measurements lead to "degeneracies" in chemical space, since multiple chemical formulae correspond to the same measured mass. We compare two methods to eliminate these degeneracies. One method relies on natural isotopic abundances, and the other relies on stable-isotope labeling (SIL) to directly determine C and N atom counts. Both depend on combinatorial exploration of the "chemical space" consisting of all possible chemical formulae composed of biologically relevant chemical elements.
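The notion of mass degeneracy described above can be illustrated with a minimal brute-force sketch. It is not the authors' formula generator: it assumes only the elements C, H, N and O, neutral monoisotopic masses, illustrative upper limits on N and O (`max_n`, `max_o`), and no valence or heuristic filtering. The function names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes; rounded values.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def formula_mass(c, h, n, o):
    """Monoisotopic mass of a neutral CcHhNnOo formula."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]

def candidate_formulae(target_mass, ppm=5.0, max_n=15, max_o=15):
    """Enumerate (C, H, N, O) counts whose mass lies within `ppm` of the target.

    Assumption: CHNO only, no chemical plausibility checks, so the result
    includes formulae that may not correspond to any real compound.
    """
    tol = target_mass * ppm * 1e-6
    hits = []
    for c in range(int(target_mass // MONO["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target_mass - formula_mass(c, 0, n, o)
                if rest < -tol:
                    break  # adding more O atoms only overshoots further
                # The remaining mass must be made up of hydrogen atoms.
                h = round(rest / MONO["H"])
                if h >= 0 and abs(formula_mass(c, h, n, o) - target_mass) <= tol:
                    hits.append((c, h, n, o))
    return hits

folate = (19, 19, 7, 6)  # C19H19N7O6
hits = candidate_formulae(formula_mass(*folate))
print(len(hits), "candidate formulae; folate among them:", folate in hits)
```

Under these assumptions, the true formula is recovered together with other formulae such as C20H15N11O2, whose mass differs from that of folate by less than 5 ppm. This is the degeneracy the abstract refers to.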

Results: Of 1532 metabolic pathways curated in the MetaCyc database, 412 contain a metabolite having a chemical formula unique to that metabolic pathway. Thus, chemical formulae alone can suffice to infer the presence of some metabolic pathways. Of 248,928 unique chemical formulae selected from the PubChem database, more than 95% had at least one degeneracy on the basis of accurate mass information alone. Consideration of natural isotopic abundance reduced degeneracy to 64%, but mainly for formulae below 500 Da in molecular weight, and only if the error in the relative isotopic peak intensity was less than 10%. Knowledge of exact C and N atom counts, as determined by SIL, reduced degeneracy further and allowed a unique chemical formula to be determined for 55% of the PubChem formulae.
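The first result, that some formulae occur in only one pathway, amounts to inverting a pathway-to-formula mapping. The sketch below shows the idea on a toy dictionary. The pathway names and their contents are made up for illustration and are not taken from MetaCyc, and `pathway_specific_formulae` is a hypothetical helper.

```python
from collections import defaultdict

def pathway_specific_formulae(pathway_to_formulae):
    """Return {formula: pathway} for formulae found in exactly one pathway."""
    seen_in = defaultdict(set)
    for pathway, formulae in pathway_to_formulae.items():
        for formula in formulae:
            seen_in[formula].add(pathway)
    return {f: next(iter(p)) for f, p in seen_in.items() if len(p) == 1}

# Toy input (hypothetical pathways, not database content).
toy = {
    "pathway_A": {"C6H12O6", "C3H4O3"},
    "pathway_B": {"C3H4O3", "C19H19N7O6"},
    "pathway_C": {"C6H12O6", "C5H9NO4"},
}
specific = pathway_specific_formulae(toy)
print(specific)
```

In this toy input, detecting C19H19N7O6 would point to `pathway_B` alone, whereas C3H4O3 is shared and therefore not diagnostic.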

Conclusions: To facilitate the assignment of chemical formulae to unknown mass-spectral features, profiling can be performed on cultures uniformly labeled with stable isotopes of nitrogen (15N) or carbon (13C). This makes it possible to accurately count the number of carbon and nitrogen atoms in each molecule, providing a robust means of reducing the degeneracy of chemical space. Unique chemical formulae can thus be obtained for features measured in untargeted metabolomics, including features with a mass greater than 500 Da or with relative errors in measured isotopic peak intensity greater than 10%, and without relying on a chemical formula generator dependent on heuristic filtering. These chemical formulae can serve as indicators for the presence of particular metabolic pathways.
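Counting atoms from uniformly labeled cultures reduces to dividing an observed mass shift by the per-atom isotope mass difference. The sketch below assumes complete labeling, that the same feature has already been matched across the unlabeled, 13C-labeled and 15N-labeled runs, and a known charge state. The function name and the example masses (computed from the formula of folate, not measured) are illustrative.

```python
# Mass gained per atom when the light isotope is replaced by the heavy one (Da).
DELTA_13C = 13.00335484 - 12.0          # 13C minus 12C
DELTA_15N = 15.00010890 - 14.00307401   # 15N minus 14N

def atom_counts_from_labeling(m_unlabeled, m_13c, m_15n, charge=1):
    """Infer (C count, N count) from the m/z shifts of fully labeled features."""
    n_carbon = round((m_13c - m_unlabeled) * charge / DELTA_13C)
    n_nitrogen = round((m_15n - m_unlabeled) * charge / DELTA_15N)
    return n_carbon, n_nitrogen

# Illustrative masses for folate (C19H19N7O6): unlabeled, all-13C, all-15N.
counts = atom_counts_from_labeling(441.13968, 460.20342, 448.11892)
print(counts)
```

With the C and N counts fixed, only the remaining elements need to be enumerated, which is why far fewer candidate formulae survive at a given mass accuracy.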

Figures

Figure 1
Numbers of metabolites (red bars) and chemical formulae (black bars) in the MetaCyc database that are present in only a specific number of reactions (A) or pathways (B), and the corresponding numbers in the KEGG database for reactions (C). Metabolites present in reactions not linked to any pathway in MetaCyc were excluded from panel B. A large number of metabolites and chemical formulae are unique, that is, associated with a single reaction or pathway.
Figure 2
Schematic illustrating conceptual and experimental approaches to representing and searching through chemical space. As shown in (A), the monoisotopic mass of folate (M0) is projected into the range of possible chemical species in chemical space. Many distinct points in chemical space are nearly indistinguishable in this projection ("degeneracy"). In (B), the points are projected into three-dimensional space, where the numbers of nitrogen and carbon atoms in each chemical formula are known. In (C), the intensities of the M1 and M2 isotopic peaks relative to the M0 peak form additional axes along which formulae can be projected.
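The relative M1 intensity used as an axis in panel (C) can be estimated from natural isotopic abundances. The sketch below computes the M1/M0 ratio as the sum, over elements, of the atom count times the abundance ratio of the +1 Da isotope to the light isotope. The abundance values are approximate representative figures, only C, H, N and O are considered, and the function name is hypothetical.

```python
# Approximate natural isotopic abundances: (light isotope, +1 Da isotope).
ABUNDANCE = {
    "C": (0.9893, 0.0107),      # 12C, 13C
    "H": (0.999885, 0.000115),  # 1H, 2H
    "N": (0.99636, 0.00364),    # 14N, 15N
    "O": (0.99757, 0.00038),    # 16O, 17O
}

def m1_relative_intensity(formula):
    """Predicted M1/M0 ratio for a formula given as {element: count}."""
    return sum(
        count * ABUNDANCE[element][1] / ABUNDANCE[element][0]
        for element, count in formula.items()
    )

folate = {"C": 19, "H": 19, "N": 7, "O": 6}
# A different formula whose monoisotopic mass lies within 5 ppm of folate's.
near_isobar = {"C": 20, "H": 15, "N": 11, "O": 2}

m1_folate = m1_relative_intensity(folate)
m1_isobar = m1_relative_intensity(near_isobar)
print(f"folate M1/M0 = {m1_folate:.3f}, near-isobar M1/M0 = {m1_isobar:.3f}")
```

For this pair, the two predicted ratios differ by roughly 10% in relative terms, which suggests why relative intensity errors of that size limit what isotopic peak intensities alone can resolve.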
Figure 3
Comparison of formula generation algorithms. The HR2 algorithm (A) or the Brute Force algorithm (B) was used to estimate the mass degeneracy of representative points in chemical space, i.e., the number of unique chemical formulae within 5 ppm of a target mass. Brute Force consistently found a higher mass degeneracy in chemical space (i.e., more possible formulae) than HR2. Additionally, as shown in (C), for approximately 5% of the representative points chosen, HR2 was unable to find any point in chemical space (i.e., it could not recapitulate the formula corresponding exactly to the seed mass).
Figure 4
Stable isotope labeling restricts the number of possible chemical formulae for measured mass values in metabolomics datasets. All panels show a distribution of chemically representative unique masses in chemical space. In panel (A), HR2 was used to calculate the proportion of these unique masses having no mass degeneracy at 5 ppm (black), a single mass degeneracy (blue), or two or more mass degeneracies (red). Panels (B) and (C) show how degeneracy is affected when points in chemical space are specified not only by the exact mass but also by the C and N atom counts determined by stable isotope labeling. Panel (B) uses HR2 to estimate the degeneracy, while panel (C) uses Brute Force.
Figure 5
Comparison of stable isotope labeling to relative isotopic peak intensity as a means of aiding unique formula determination. Panels (A) and (C) show the fraction of tests in which specifying the chemical formula using the C and N counts gave an improvement over using only relative isotopic peak intensity. Panels (B) and (D) show the mass distribution of tests where each method performed better or worse. Brute Force was used in panels (A) and (B). HR2 was used in panels (C) and (D). These plots show that, for larger compounds, the C and N atom counts provide information not obtainable from relative isotopic peak intensity.
