PLoS Genet. 2012;8(10):e1003005.

doi: 10.1371/journal.pgen.1003005. Epub 2012 Oct 18.

Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information

Jan Krumsiek¹, Karsten Suhre, Anne M Evans, Matthew W Mitchell, Robert P Mohney, Michael V Milburn, Brigitte Wägele, Werner Römisch-Margl, Thomas Illig, Jerzy Adamski, Christian Gieger, Fabian J Theis, Gabi Kastenmüller

PMID: 23093944
PMCID: PMC3475673
DOI: 10.1371/journal.pgen.1003005

Jan Krumsiek et al. PLoS Genet. 2012.

Affiliation

¹ Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.

Abstract

Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.

Conflict of interest statement

AME, MWM, RPM, and MVM are employees of Metabolon. A patent application for the unknown identification method has been filed: “Identity Elucidation of Unknown Metabolites,” U.S. Patent Application No. 61503673, unpublished – filing date July 1, 2011 (MVM, applicant).

Figures

**Figure 1. Data integration workflow for the systematic classification of unknown metabolites.**
We combine high-throughput metabolomics and genotyping data in Gaussian graphical models (GGMs) and in genome-wide association studies (GWAS) in order to produce testable predictions of the unknown metabolites' identities. These hypotheses are then subject to experimental verification by mass spectrometry. Six such cases have been fully worked through and are presented in Table 3.

**Figure 2. Manhattan plot of genetic association.**
The strength of association for known (bottom) and unknown (top) metabolites is indicated as the negative logarithm of the p-value for the linear model (see Methods). Only metabolite-SNP associations with p-values below 10⁻⁶ are plotted (grey circles). Triangles represent metabolite-SNP associations with p-values below 10⁻⁴⁰. Horizontal lines indicate the threshold for genome-wide significance (p = 1.6×10⁻¹⁰, corresponding to α = 0.05 after Bonferroni correction); red vertical dashes indicate loci at which this threshold is attained.
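
The plotting rules in this caption can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted above: the function names and the text labels are hypothetical, and the number of tests is back-calculated from the reported threshold rather than taken from the paper's Methods.

```python
import math

ALPHA = 0.05              # family-wise error rate quoted in the caption
GENOME_WIDE_P = 1.6e-10   # Bonferroni-corrected threshold quoted in the caption
PLOT_CUTOFF_P = 1e-6      # only associations below this p-value are drawn
TRIANGLE_P = 1e-40        # associations below this p-value are drawn as triangles


def manhattan_y(p_value):
    """Height of a point on the Manhattan plot: -log10(p)."""
    return -math.log10(p_value)


def classify(p_value):
    """Describe how an association would appear under the caption's rules.

    The returned labels are illustrative wording, not terms from the paper.
    """
    if p_value >= PLOT_CUTOFF_P:
        return "not plotted"
    marker = "triangle" if p_value < TRIANGLE_P else "circle"
    if p_value < GENOME_WIDE_P:
        return marker + ", genome-wide significant"
    return marker + ", not genome-wide significant"


# Bonferroni divides alpha by the number of tests, so the threshold implies
# a test count of alpha / threshold (an inference, not a figure from the paper).
implied_tests = ALPHA / GENOME_WIDE_P

print(round(manhattan_y(GENOME_WIDE_P), 2))  # height of the horizontal line
print(classify(3e-8))
```

On this scale the horizontal significance line sits just below 10, while the triangles mark points that would lie above 40.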

**Figure 3. Gaussian graphical modeling.**
GGMs embed unknown metabolites into their biochemical context. A: Complete network presentation of partial correlations that are significantly different from zero at α = 0.05 after Bonferroni correction. The unknown metabolites are spread over the entire network and are involved in various metabolic pathways. B–D: Selected high-scoring sub-networks. We observe that GGM edges directly correspond to chemical reactions which alter specific chemical groups (e.g. carbonyl groups and methyl groups). Solid lines denote positive partial correlations. Dashed lines indicate negative partial correlations. Line widths represent partial correlation strengths.
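
A GGM edge is a partial correlation, that is, the correlation between two metabolites after conditioning on all the others. The sketch below is a minimal illustration of that idea and not the authors' implementation: it derives partial correlations from the inverse of a small correlation matrix using the standard relation ζ_ij = −ω_ij / √(ω_ii ω_jj). The three-metabolite matrix is invented toy data for a chain A, B, C, and the helper names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation matrix from a correlation (or covariance) matrix."""
    prec = invert(corr)  # precision matrix
    n = len(prec)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


# Toy chain A - B - C: A and C correlate (0.64) only through B.
corr = [[1.00, 0.80, 0.64],
        [0.80, 1.00, 0.80],
        [0.64, 0.80, 1.00]]
pcor = partial_correlations(corr)

for row in pcor:
    print([round(v, 3) for v in row])
```

In this toy case the ordinary correlation between A and C is sizeable, but their partial correlation vanishes once B is accounted for, so a GGM would draw edges A–B and B–C only. This is the property that lets GGM edges track direct biochemical relationships rather than indirect ones.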

**Figure 4. Semi-automatic prediction of unknown metabolite identities.**
A: Examples of how to determine pathway classifications based on the functional annotations of GGM and GWAS hits. We present two metabolites, X-11421 and X-11244, whose GGM and GWAS associations clearly point to carnitine and steroid metabolism, respectively. B: Overview of unknowns functionally annotated by both GGMs and the GWAS approach. ‘GGM’ refers to an unknown metabolite which is three or fewer steps away from a known metabolite in the GGM, whereas ‘direct GGM’ represents direct neighbors in the network. C: Pathway predictions for the 16 unknowns with both direct GGM and GWAS annotations. Unknowns marked with a star were subjected to in-depth analysis followed by experimental validation.
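
The distance rule in panel B can be illustrated with a breadth-first search over the GGM network. The sketch below is an assumption-laden toy, not the published pipeline: the node names (`U1`, `K1`, and so on), the edge list, and the function names are all hypothetical, and an unknown that qualifies for both labels is given the more specific one, "direct GGM".

```python
from collections import deque


def steps_to_nearest_known(graph, start, known, max_steps=3):
    """Shortest path length from `start` to any known metabolite.

    Returns None when no known metabolite lies within `max_steps` edges.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != start and node in known:
            return dist
        if dist == max_steps:
            continue  # do not search beyond the allowed radius
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def ggm_annotation(graph, unknown, known):
    """Label an unknown as 'direct GGM', 'GGM', or 'none'."""
    dist = steps_to_nearest_known(graph, unknown, known)
    if dist is None:
        return "none"
    return "direct GGM" if dist == 1 else "GGM"


# Hypothetical network: K* are known metabolites, U* are unknowns.
edges = [("U1", "K1"), ("U2", "U1"),
         ("U3", "U4"), ("U4", "U5"), ("U5", "U6"), ("U6", "K2")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
known = {"K1", "K2"}

for unknown in ["U1", "U2", "U3", "U4"]:
    print(unknown, ggm_annotation(graph, unknown, known))
```

Here `U1` is a direct neighbor of a known metabolite, `U2` and `U4` are two and three steps away, and `U3` falls outside the three-step radius and therefore receives no GGM annotation.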

**Figure 5. Detailed investigation of three scenarios (DIPEPTIDE, STEROID, and HETE).**
In order to generate concrete hypotheses on the unknowns' identities, we assembled all available information for each scenario. This includes biochemical edges from the GGM, genetic associations from the GWAS, pathway annotations as well as mass information. For details of the predicted identities, see Table 3 and main text. Similar figures for three further scenarios (CARNITINE, BILIRUBIN, and ASCORBATE) are available in Text S3.

**Figure 6. Experimental confirmation of X-14208 as phenylalanylserine.**
Two possible dipeptide variants were predicted and consequently tested. The fragmentation spectrum of the 253.1 m/z ion (positive mode) of the pure Phe-Ser matches that of the unknown compound, whereas the spectrum for pure Ser-Phe differs visibly. Moreover, the retention index (RI) of Phe-Ser is similar to the RI of X-14208, whereas that of Ser-Phe is significantly different.
