J Cheminform. 2014 Jan 27;6(1):2.
doi: 10.1186/1758-2946-6-2.

Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2

Hulda S Haraldsdóttir et al. J Cheminform. 2014.

Abstract

Background: An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database or structure-based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use. Genome-scale metabolic reconstructions generally include hundreds of metabolites, so updating annotations manually is highly laborious. This prompted us to look for open-source software applications that could facilitate automatic updating of annotations by mapping between available metabolite identifiers. We identified three applications developed for the metabolomics and chemical informatics communities as potential solutions: MetMask, the Chemical Translation System, and UniChem. The first implements a "metabolite masking" strategy for mapping between identifiers, whereas the latter two implement different versions of an InChI-based strategy. Here we evaluated the suitability of these applications for the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We applied the best-suited application to updating identifiers in Recon 2, the latest reconstruction of human metabolism.

Results: All three applications enabled partially automatic updating of metabolite identifiers, but significant manual effort was still required to fully update identifiers. We were able to reduce this manual effort by searching for new identifiers using multiple types of information about metabolites. When multiple types of information were combined, the Chemical Translation System enabled us to update over 3,500 metabolite identifiers in Recon 2. All but approximately 200 identifiers were updated automatically.

Conclusions: We found that an InChI-based application such as the Chemical Translation System was better suited to the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We identified several features, however, that could be added to such an application in order to tailor it to this task.
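
As an illustration of the general idea described in the abstract, the following minimal Python sketch maps identifiers through a shared structure key and pools the hits obtained from several types of information about one metabolite. It is not the implementation of the Chemical Translation System, MetMask, or UniChem. The lookup tables, the placeholder identifiers (`C_TOY_1`, `HMDB_TOY_1`, `TOYKEY-AAA`, and so on), the function names, and the rule that zero or multiple hits go to manual review are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Toy lookup tables with placeholder values (hypothetical, not real database content).
# (identifier type, value) -> structure key standing in for an InChIKey
TO_KEY: Dict[Tuple[str, str], str] = {
    ("kegg", "C_TOY_1"): "TOYKEY-AAA",
    ("name", "toy sugar"): "TOYKEY-AAA",
    ("name", "toy acid"): "TOYKEY-BBB",
}
# structure key -> identifiers of other types that share the same structure
FROM_KEY: Dict[str, Dict[str, Set[str]]] = {
    "TOYKEY-AAA": {"hmdb": {"HMDB_TOY_1"}},
    "TOYKEY-BBB": {"hmdb": {"HMDB_TOY_2", "HMDB_TOY_3"}},
}


def map_identifiers(known: Dict[str, str], target: str) -> Set[str]:
    """Pool target identifiers found via every known identifier of one metabolite."""
    hits: Set[str] = set()
    for id_type, value in known.items():
        key = TO_KEY.get((id_type, value))
        if key is None:
            continue  # this piece of information is not in the table
        hits |= FROM_KEY.get(key, {}).get(target, set())
    return hits


def needs_manual_review(hits: Set[str]) -> bool:
    """Assumed rule: no hit or more than one hit is left for a curator."""
    return len(hits) != 1


if __name__ == "__main__":
    # The KEGG identifier is unknown to the table, but the name still resolves.
    metabolite = {"kegg": "C_MISSING", "name": "toy sugar"}
    found = map_identifiers(metabolite, "hmdb")
    print(found, needs_manual_review(found))
```

In this sketch, a metabolite whose first identifier fails to resolve can still be mapped through another piece of information, which mirrors the abstract's observation that combining multiple types of information reduced manual effort.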

Figures

Figure 1
Lactose stereoisomers. Two epimers of lactose occur in nature, α-lactose and β-lactose. The epimers differ by the configuration of structural groups around a single stereogenic carbon atom (top right). (a) In KEGG Compound the synonyms lactose and milk sugar are assigned to a generic stereoisomer, where the configuration around this stereogenic carbon is not specified (C00243). Reactions, enzymes and pathways involving lactose are linked to this entry in KEGG. (b) The same synonyms and most lactose-related data are linked to the α-epimer in HMDB (HMDB00186). There is neither an entry for the generic stereoisomer in HMDB, nor an entry for the α-epimer in KEGG Compound.
Figure 2
Identifiers output in identifier mapping tests. Annotations of unique identifiers returned by each application, (a) when all mapping tests are included, and (b) when only tests involving identifier types covered by UniChem are included. The output identifiers returned in all included tests were pooled and duplicates removed. If the same identifier was returned in more than one test it was only counted once. The annotations are explained in Section Scoring.
Figure 3
Recon 2 identifiers. Identifier statistics for Recon 2 before and after metabolite annotations were updated with CTS. (a) Number of unique metabolites with each of the seven types of identifiers. n: names, i: InChIKeys, c: ChEBI ID, h: HMDB ID, k: KEGG CID, p: PubChem CID, l: LipidMAPS ID. (b) Number of unique metabolites with one, and up to seven, identifiers each.
Figure 4
Annotation of output identifiers. An example demonstrating annotation of output PubChem Compound identifiers (b-e), when the KEGG Compound identifier for D-glucose (a) is input to a mapping application. The preferred output identifier is for D-glucose (b), but an identifier for alpha-D-glucose (c) is also valid since it is a D-glucose. An identifier for a generic hexose (d), however, is not valid. Finally, an identifier for phospholactic acid (e), which is a completely different compound, is incorrect.
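
The pooling described for Figure 2 and the annotation categories illustrated in Figure 4 can be sketched in a few lines of Python. This is only an illustration: the paper's actual scoring procedure is defined in its Section Scoring, which is not reproduced here. The category labels follow the Figure 4 caption, while the curated reference sets, the placeholder identifiers (`CID_D_GLUCOSE` and so on, which are not real PubChem CIDs), and the function names are assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def pool_outputs(tests: Iterable[Iterable[str]]) -> Set[str]:
    """Pool identifiers from all tests; an identifier returned twice counts once."""
    return {identifier for outputs in tests for identifier in outputs}


def annotate(identifier: str, preferred: Set[str], valid: Set[str],
             not_valid: Set[str]) -> str:
    """Label one output identifier against hand-curated reference sets (assumed)."""
    if identifier in preferred:
        return "preferred"
    if identifier in valid:
        return "valid"
    if identifier in not_valid:
        return "not valid"
    return "incorrect"  # anything else is treated as a different compound


if __name__ == "__main__":
    # Hypothetical outputs of two mapping tests for the D-glucose example.
    tests = [
        ["CID_D_GLUCOSE", "CID_HEXOSE"],
        ["CID_D_GLUCOSE", "CID_ALPHA_D_GLUCOSE", "CID_PHOSPHOLACTIC_ACID"],
    ]
    pooled = pool_outputs(tests)
    counts = Counter(
        annotate(i, {"CID_D_GLUCOSE"}, {"CID_ALPHA_D_GLUCOSE"}, {"CID_HEXOSE"})
        for i in pooled
    )
    print(dict(counts))
```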
