PLoS Comput Biol. 2018 May 23;14(5):e1006146.
doi: 10.1371/journal.pcbi.1006146. eCollection 2018 May.

Traceability, reproducibility and wiki-exploration for "à-la-carte" reconstructions of genome-scale metabolic models


Méziane Aite et al. PLoS Comput Biol. 2018.

Abstract

Genome-scale metabolic models have become the tool of choice for the global analysis of microorganism metabolism, and their reconstruction has attained high standards of quality and reliability. Improvements in this area have been accompanied by the development of some major platforms and databases, and an explosion of individual bioinformatics methods. Consequently, many recent models result from "à la carte" pipelines, combining the use of platforms, individual tools and biological expertise to enhance the quality of the reconstruction. Although very useful, combining heterogeneous tools that hardly interact with each other causes a loss of traceability and reproducibility in the reconstruction process. This represents a real obstacle, especially for less-studied species, whose metabolic reconstruction can greatly benefit from comparison with good-quality models of related organisms. This work proposes an adaptable workspace, AuReMe, for sustainable reconstructions or improvements of genome-scale metabolic models involving personalized pipelines. At each step, relevant information about the modifications that a method brings to the model is stored. This ensures that the process is reproducible and documented regardless of the combination of tools used. Additionally, the workspace provides a way to browse metabolic models and their metadata through the automatic generation of ad hoc local wikis dedicated to monitoring and facilitating the reconstruction process. AuReMe supports exploration and semantic queries based on RDF databases. We illustrate how this workspace made it possible to handle, in an integrated way, the metabolic reconstructions of non-model organisms such as an extremophile bacterium or eukaryotic algae. Among relevant applications, the latter reconstruction led to putative evolutionary insights into a metabolic pathway.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. AuReMe workspace.
Overview of the AuReMe workspace. Admissible inputs include standard formats from the genomics and metabolic modeling fields, which can be outputs of major reconstruction platforms. AuReMe acts as a workflow controller that administers the reconstruction or modification of the GSM performed by heterogeneous and independent tools. These tools are part of the services of AuReMe (reconstruction tools, analyses, manual curation) and can be chained together, either in a pre-set pipeline or in a customized one. In either case, the PADMet data manager stores the relevant information about the model and its metadata, most importantly the process metadata, which keep track of the modifications performed (at which step a reaction was added, by which tool, etc.). At any time, the reconstruction can be monitored locally via an automatically generated wiki that informs the user about the state of the model. Outputs of AuReMe can be used on their own or fed back into many existing platforms.
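The process metadata described in this caption (which step added a reaction, and with which tool) can be pictured with a small data structure. The following is only an illustrative sketch in Python: the class and method names (`ProvenanceRecord`, `TracedModel`, `add_reactions`) and the reaction and tool identifiers are hypothetical, and none of it reflects the actual PADMet API or storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvenanceRecord:
    """One 'reaction was added' event (hypothetical structure, not PADMet's)."""
    reaction_id: str
    tool: str       # name of the tool that proposed the reaction
    category: str   # e.g. "annotation", "orthology", "gap-filling", "manual"
    step: int       # position of the tool run in the pipeline


@dataclass
class TracedModel:
    """A reaction set that remembers where every reaction came from."""
    records: list = field(default_factory=list)
    _step: int = 0

    def add_reactions(self, reaction_ids, tool, category):
        """Register one pipeline step and log each reaction it contributes."""
        self._step += 1
        for rid in reaction_ids:
            self.records.append(ProvenanceRecord(rid, tool, category, self._step))
        return self._step

    def reactions(self):
        """Distinct reactions currently in the model."""
        return {r.reaction_id for r in self.records}

    def sources_of(self, reaction_id):
        """All (step, tool, category) events that support a given reaction."""
        return [(r.step, r.tool, r.category)
                for r in self.records if r.reaction_id == reaction_id]

    def by_category(self):
        """Group reaction ids by the reconstruction category that added them."""
        grouped = defaultdict(set)
        for r in self.records:
            grouped[r.category].add(r.reaction_id)
        return dict(grouped)


# Placeholder identifiers, for illustration only.
model = TracedModel()
model.add_reactions(["RXN-A", "RXN-B"], tool="annotation_tool", category="annotation")
model.add_reactions(["RXN-B", "RXN-C"], tool="orthology_tool", category="orthology")
print(model.sources_of("RXN-B"))
```

Keeping one record per event, rather than one per reaction, means a reaction supported by several tools retains every source, which is the kind of information the wiki views in Fig 2 sort on.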
Fig 2. Screen captures of several pages of the local wiki and the interactions between them.
A local wiki-based export of the GSM facilitates exploration through a user interface and traceability of the reconstruction procedure. Several screenshots of a wiki are displayed; arrows represent the links between pages. Notably, reactions can be sorted and explored according to reconstruction categories, tools and sources. The navigation panel enables exploring and comparing the contributions of each tool used in the “à la carte” GSM reconstruction pipeline. Pathways can be sorted based on their completion rate.
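Sorting pathways by completion rate can be sketched as follows. The caption does not define the rate, so this example assumes it is the fraction of a pathway's reactions that are present in the model; the function names and the toy identifiers are hypothetical.

```python
def completion_rate(pathway_reactions, model_reactions):
    """Fraction of a pathway's reactions found in the model.

    Assumed definition; an empty pathway is given a rate of 0.0.
    """
    pathway = set(pathway_reactions)
    if not pathway:
        return 0.0
    return len(pathway & set(model_reactions)) / len(pathway)


def rank_pathways(pathways, model_reactions):
    """Return (pathway name, rate) pairs, most complete first, ties by name."""
    rates = {name: completion_rate(rxns, model_reactions)
             for name, rxns in pathways.items()}
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Toy data, for illustration only.
pathways = {
    "pathway-1": ["r1", "r2"],
    "pathway-2": ["r1", "r3", "r4", "r5"],
}
model_reactions = {"r1", "r2", "r3"}
for name, rate in rank_pathways(pathways, model_reactions):
    print(f"{name}: {rate:.0%}")
```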
Fig 3. Examples of customizable GSM reconstruction pipelines.
PADMet allows a user to easily implement flexible and personalized pipelines adapted to the diversity of the species considered and of the available data resources. PADMet traces multiple complex reconstruction paradigms. Four customizations of the reconstruction are presented here: the orthology- and gap-filling-based reconstruction of a) Enterococcus faecalis and b) Sulfobacillus thermosulfidooxidans str. Cutipay models, and the reconstructions of c) Ectocarpus siliculosus and d) Tisochrysis lutea models, using orthology, annotation and gap-filling. All models include manual curation and several analysis steps.
Fig 4. Interest of heterogeneous methods in pathway completion and filling thanks to tracking of process metadata.
Completion of the 6-hydroxymethyl-dihydropterin diphosphate biosynthesis I and the tetrahydrofolate biosynthesis pathways in E. siliculosus via the combination of annotation (yellow), orthology (green) and gap-filling (blue). The dihydrofolate compound with the dotted line is an instance of the dihydrofolate-glu-n class, following the MetaCyc class ontology structure. The class compound is the original reactant of the dihydrofolatereduct-rxn reaction retrieved with annotation, whereas the previous reaction of the pathway (dihydrofolatesynth-rxn) produces the instance dihydrofolate. Hence the need for the gap-filling step, which, using an extended version of MetaCyc, selects an instantiated version of dihydrofolatesynth-rxn that consumes the instance dihydrofolate.
Fig 5. Tisochrysis lutea metabolic model exploration: Origin of reactions according to the reconstruction pipeline.
(A) Comparison of the numbers of EC numbers introduced in the network by the annotation pipeline and by the orthology-based analysis. In total, 898 enzymes were identified via annotation-based information and 790 enzymes through orthology-based data, among which 524 were already identified via annotation information. (B) Number of T-Iso ortholog enzymes according to their origin in template models. For each of the 790 T-Iso ortholog enzymes, the figure depicts in which of the four template models an ortholog of the enzyme had been identified. The four templates used to identify ortholog enzymes in T. lutea were A. thaliana, C. reinhardtii, E. siliculosus and Synechocystis sp. PCC 6803. (C) T-Iso carnosine biosynthesis. Reconstruction of the T-Iso carnosine synthesis pathway was performed using three sources of data: (i) T-Iso genome annotations (cyan star); (ii) template metabolic models (stars) of four organisms, A. thaliana (blue), C. reinhardtii (green), E. siliculosus (red) and Synechocystis sp. (yellow), with orthology-based information; (iii) complete proteomes of the four organisms (squares) with sequence alignment information (reciprocal best BLAST hits). All reactions of the T-Iso carnosine biosynthesis are common to the four organisms except for three of them: ASPDECARBOX-RXN, HISTIDPHOS-RXN, and CARNOSINE-SYNTHASE-RXN. The first seems to belong to an alternative pathway to produce β-alanine, also found in C. reinhardtii, Synechocystis sp. and Candidatus Phaeomarinobacter ectocarpi, a symbiotic bacterium of E. siliculosus. HISTIDPHOS-RXN was not found in E. siliculosus but was identified in its symbiotic bacterium Candidatus Phaeomarinobacter ectocarpi. CARNOSINE-SYNTHASE-RXN was only identified in algae (C. reinhardtii, E. siliculosus and T. lutea).
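The counts in panel (A) can be split into exclusive groups by inclusion-exclusion. The short Python sketch below assumes that the 524 enzymes are exactly the overlap between the annotation-based and orthology-based sets; the derived figures, including the combined total, are computed here and are not stated in the caption. The function name `venn_counts` is hypothetical.

```python
def venn_counts(n_a, n_b, n_both):
    """Split two overlapping set sizes into exclusive parts and their union."""
    if n_both > min(n_a, n_b) or n_both < 0:
        raise ValueError("overlap must lie between 0 and the smaller set size")
    return {
        "a_only": n_a - n_both,
        "b_only": n_b - n_both,
        "both": n_both,
        "union": n_a + n_b - n_both,
    }


# 898 enzymes from annotation, 790 from orthology, 524 found by both.
counts = venn_counts(898, 790, 524)
print(counts)
# {'a_only': 374, 'b_only': 266, 'both': 524, 'union': 1164}
```

Under that assumption, 374 enzymes come from annotation alone and 266 from orthology alone, for 1164 distinct enzymes overall.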
