PLoS Genet. 2012;8(10):e1003005.
doi: 10.1371/journal.pgen.1003005. Epub 2012 Oct 18.

Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information

Jan Krumsiek et al. PLoS Genet. 2012.

Abstract

Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.


Conflict of interest statement

AME, MWM, RPM, and MVM are employees of Metabolon. A patent application for the unknown identification method has been filed: “Identity Elucidation of Unknown Metabolites,” U.S. Patent Application No. 61503673, unpublished – filing date July 1, 2011 (MVM, applicant).

Figures

Figure 1. Data integration workflow for the systematic classification of unknown metabolites.
We combine high-throughput metabolomics and genotyping data in Gaussian graphical models (GGMs) and in genome-wide association studies (GWAS) in order to produce testable predictions of the unknown metabolites' identities. These hypotheses are then subjected to experimental verification by mass spectrometry. Six such cases have been fully worked through and are presented in Table 3.
Figure 2. Manhattan plot of genetic association.
The strength of association for known (bottom) and unknown (top) metabolites is indicated as the negative logarithm of the p-value for the linear model (see Methods). Only metabolite-SNP associations with p-values below 10⁻⁶ are plotted (grey circles). Triangles represent metabolite-SNP associations with p-values below 10⁻⁴⁰. Horizontal lines indicate the threshold for genome-wide significance (a p-value threshold of 1.6×10⁻¹⁰, corresponding to α = 0.05 after Bonferroni correction); red vertical dashes indicate loci at which this threshold is attained.
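
The relation between the stated threshold and α can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes α and the threshold from the caption and works backwards to the number of tests that a Bonferroni correction would imply, since the number of variants that entered the test after quality control is not given here. The helper name `is_genome_wide_significant` is hypothetical.

```python
import math

alpha = 0.05          # family-wise error rate stated in the caption
threshold = 1.6e-10   # genome-wide significance threshold stated in the caption

# Bonferroni divides alpha by the number of tests, so the number of
# tests implied by the caption is alpha / threshold.
implied_tests = alpha / threshold

# Position of the horizontal significance line on a -log10(p) axis.
line_height = -math.log10(threshold)

def is_genome_wide_significant(p_value, cutoff=threshold):
    """Hypothetical helper: True if an association passes the cutoff."""
    return p_value < cutoff

print(f"implied number of tests: {implied_tests:.3e}")
print(f"-log10(threshold): {line_height:.2f}")
```

Under these assumptions the implied number of metabolite-SNP tests is roughly 3.1×10⁸, and the significance line sits a little below 10 on the negative-log axis.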
Figure 3. Gaussian graphical modeling.
GGMs embed unknown metabolites into their biochemical context. A: Complete network presentation of partial correlations that are significantly different from zero at α = 0.05 after Bonferroni correction. The unknown metabolites are spread over the entire network and are involved in various metabolic pathways. B–D: Selected high-scoring sub-networks. We observe that GGM edges directly correspond to chemical reactions which alter specific chemical groups (e.g. carbonyl groups and methyl groups). Solid lines denote positive partial correlation. Dashed lines indicate negative partial correlations. Line widths represent partial correlation strengths.
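
As a rough illustration of what a GGM edge is, the following sketch computes full-order partial correlations from the inverse covariance matrix and keeps those that pass a Bonferroni-corrected test. It is not the authors' pipeline: the Fisher z-transform used for the p-values, the simulated three-variable chain, and the function names are all assumptions made for this example.

```python
import math
import numpy as np

def partial_correlations(data):
    """Full-order partial correlations for a samples x variables array."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def bonferroni_edges(pcor, n_samples, alpha=0.05):
    """Edges (i, j, pcor) significant after Bonferroni correction.

    Assumption: p-values come from a Fisher z-transform, conditioning
    on all remaining variables.
    """
    n_vars = pcor.shape[0]
    n_conditioned = n_vars - 2
    n_tests = n_vars * (n_vars - 1) // 2
    rows, cols = np.triu_indices(n_vars, k=1)
    z = np.arctanh(pcor[rows, cols]) * math.sqrt(n_samples - n_conditioned - 3)
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    keep = p_values < alpha / n_tests
    return [(int(i), int(j), float(pcor[i, j]))
            for i, j, k in zip(rows, cols, keep) if k]

# Simulated chain a -> b -> c: a and c correlate strongly, but only indirectly.
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, c])

pcor = partial_correlations(data)
edges = bonferroni_edges(pcor, n)
```

In the simulated chain, the ordinary correlation between `a` and `c` is high, while their partial correlation given `b` is close to zero. This is the property that lets a GGM separate direct relationships from indirect ones.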
Figure 4. Semi-automatic prediction of unknown metabolite identities.
A: Examples of how pathway classifications are determined from the functional annotations of GGM and GWAS hits. We present two metabolites, X-11421 and X-11244, whose GGM and GWAS associations clearly point to carnitine and steroid metabolism, respectively. B: Overview of unknowns functionally annotated by both the GGM and the GWAS approach. ‘GGM’ refers to an unknown metabolite that is three or fewer steps away from a known metabolite in the GGM, whereas ‘direct GGM’ denotes direct neighbors in the network. C: Pathway predictions for the 16 unknowns with both direct GGM and GWAS annotations. Unknowns marked with a star were subjected to in-depth analysis followed by experimental validation.
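
The distinction between ‘GGM’ and ‘direct GGM’ in panel B amounts to a shortest-path query on the network. A minimal sketch, assuming the GGM is available as an undirected edge list, could use a breadth-first search. The node labels (`K1`, `U1`, ...) and function names are hypothetical and do not correspond to metabolites in the study.

```python
from collections import deque

def steps_to_nearest_known(edges, known, start, max_steps=3):
    """Breadth-first search: steps from `start` to the nearest known node.

    Returns None if no known node lies within `max_steps`.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != start and node in known:
            return dist
        if dist < max_steps:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
    return None

def ggm_annotation(edges, known, unknown):
    """Most specific label for an unknown: 'direct GGM', 'GGM', or None."""
    dist = steps_to_nearest_known(edges, known, unknown)
    if dist is None:
        return None
    return "direct GGM" if dist == 1 else "GGM"

# Toy chain: K1 - U1 - U2 - U3 - U4 - U5, with K1 the only known node.
toy_edges = [("K1", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "U4"), ("U4", "U5")]
toy_known = {"K1"}
labels = {u: ggm_annotation(toy_edges, toy_known, u)
          for u in ["U1", "U2", "U3", "U4", "U5"]}
```

Every direct neighbor also satisfies the three-step criterion, so the sketch reports only the more specific label for such nodes.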
Figure 5. Detailed investigation of three scenarios (DIPEPTIDE, STEROID, and HETE).
In order to generate concrete hypotheses on the unknowns' identities, we assembled all available information for each scenario. This includes biochemical edges from the GGM, genetic associations from the GWAS, pathway annotations as well as mass information. For details of the predicted identities, see Table 3 and main text. Similar figures for three further scenarios (CARNITINE, BILIRUBIN, and ASCORBATE) are available in Text S3.
Figure 6. Experimental confirmation of X-14208 as phenylalanylserine.
Two possible dipeptide variants were predicted and consequently tested. The fragmentation spectrum of the 253.1 m/z ion (positive mode) of the pure Phe-Ser matches that of the unknown compound, whereas the spectrum for pure Ser-Phe differs visibly. Moreover, the retention index (RI) of Phe-Ser is similar to the RI of X-14208, whereas that of Ser-Phe is significantly different.
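
A short calculation shows why both dipeptide variants had to be tested. The sketch below uses standard monoisotopic masses and assumes a singly protonated ion in positive mode. Neither the constants nor the function name come from the paper.

```python
# Monoisotopic masses in Da (standard values, not taken from the paper).
PHE = 165.07898     # phenylalanine, C9H11NO2
SER = 105.04259     # serine, C3H7NO3
WATER = 18.01056    # lost when the peptide bond forms
PROTON = 1.00728    # added for the [M+H]+ ion

def dipeptide_mz_positive(first_residue, second_residue):
    """m/z of the singly protonated dipeptide (hypothetical helper)."""
    return first_residue + second_residue - WATER + PROTON

phe_ser = dipeptide_mz_positive(PHE, SER)
ser_phe = dipeptide_mz_positive(SER, PHE)

print(f"Phe-Ser [M+H]+: {phe_ser:.4f}")
print(f"Ser-Phe [M+H]+: {ser_phe:.4f}")
```

Both sequences give the same value of about 253.118, consistent with the 253.1 m/z ion in the caption. The precursor mass alone therefore cannot separate them, and the fragmentation spectrum and retention index are needed to tell them apart.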
