How large is the metabolome? A critical analysis of data exchange practices in chemistry

Tobias Kind¹, Martin Scholz, Oliver Fiehn

PMID: 19415114
PMCID: PMC2673031
DOI: 10.1371/journal.pone.0005440

Tobias Kind et al. PLoS One. 2009.

PLoS One. 2009;4(5):e5440.

doi: 10.1371/journal.pone.0005440. Epub 2009 May 5.

Affiliation

¹ University of California Davis, Genome Center - Metabolomics, Davis, CA, USA.

Abstract

Background: Calculating the metabolome size of a species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotation by mass spectrometry would benefit greatly if peer-reviewed public databases could be queried to compile target lists of structures that have already been reported for a given species. We detail current obstacles to compiling such a knowledge base of metabolites.

Results: As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, japonica and indica. Several major small-molecule databases were compared for their listings of known rice metabolites: PubChem, Chemical Abstracts, Beilstein, patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and databases obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, genuine rice metabolites were most often retrieved together with non-metabolite database entries such as pesticides. Overlaps between database compound lists were very difficult to determine because structures were either not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases.

Conclusions: We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are still mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers with PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of a species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
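As an illustration of why shared identifiers matter, the following minimal sketch shows how overlaps between compound lists could be computed once every database exposes an InChIKey per entry. It is not the authors' method. The database names (`db_a`, `db_b`, `db_c`), the local identifiers, and the InChIKey strings are made-up placeholders chosen only to follow the InChIKey layout; real keys would come from the databases themselves.

```python
from itertools import combinations


def normalize(inchikey):
    """Strip whitespace and upper-case a key so trivially different spellings compare equal."""
    return inchikey.strip().upper()


def index_by_inchikey(records):
    """Map InChIKey -> local identifier for one database's (local_id, inchikey) records."""
    return {normalize(key): local_id for local_id, key in records}


def pairwise_overlap(databases):
    """Return {(name_a, name_b): set of shared InChIKeys} for every pair of databases."""
    indexes = {name: index_by_inchikey(recs) for name, recs in databases.items()}
    return {
        (a, b): set(indexes[a]) & set(indexes[b])
        for a, b in combinations(sorted(indexes), 2)
    }


# Hypothetical toy data: placeholder keys, not real compounds.
databases = {
    "db_a": [("A001", "AAAAAAAAAAAAAA-BBBBBBBBBB-N"),
             ("A002", "CCCCCCCCCCCCCC-DDDDDDDDDD-N")],
    "db_b": [("B17", " aaaaaaaaaaaaaa-bbbbbbbbbb-n "),  # same key as A001, sloppy formatting
             ("B18", "EEEEEEEEEEEEEE-FFFFFFFFFF-N")],
    "db_c": [("C9", "CCCCCCCCCCCCCC-DDDDDDDDDD-N"),
             ("C10", "EEEEEEEEEEEEEE-FFFFFFFFFF-N")],
}

overlaps = pairwise_overlap(databases)
for pair, shared in overlaps.items():
    print(pair, len(shared))
```

Without a common key such as the InChIKey, the same comparison would require matching compound names or redrawing structures, which is the obstacle described above.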

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Metabolites found in *Oryza sativa* plant organs.**
The flavor of basmati rice comes from 2-acetyl-1-pyrroline. Beta-carotene-enriched golden rice was created to combat vitamin A deficiency in the third world. The compound bisbynin is not produced by rice itself but by a parasitic soil fungus, *Stachybotrys bisbyi*, found in rice seeds.

**Figure 2. The process of building pathway and metabolite databases includes a) data extraction from the literature, b) use of in silico and ortholog mapping approaches, and c) direct input from experimental databases like SetupX.**
Molecular pathway databases can be built for all known taxonomic species that can be found in the NCBI taxonomy database.

**Figure 3. Data loss occurs during conversion of digital data in the lab to analog data in a publication.**
Later, analog information from a publication, including structures and molecular spectra, is converted back to digital information (hamburger-to-cow algorithm). Such name-to-structure and optical chemical structure recognition (OCR) algorithms are error-prone, and information loss is even higher for complex molecular spectra. Direct electronic submission of chemical structures and spectral data is recommended.

Grants and funding

R01 ES13932/ES/NIEHS NIH HHS/United States