PLoS Comput Biol. 2014 Oct 16;10(10):e1003882.
doi: 10.1371/journal.pcbi.1003882. eCollection 2014 Oct.

Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models

Matthew N Benedict et al. PLoS Comput Biol.

Abstract

Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
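
The contrast between parsimony-based and likelihood-based gap filling can be illustrated with a toy sketch. It is not the KBase implementation: real gap filling optimizes over individual reactions in a network, whereas this sketch simply compares two pre-enumerated candidate routes. The route names, the likelihood values, and the cost of 1 minus the likelihood per added reaction are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical candidate routes that could each restore a missing function.
# Each list holds invented likelihoods for the reactions that would be added.
CANDIDATES: Dict[str, List[float]] = {
    "short_route": [0.0, 0.0, 0.1],           # 3 reactions, little genomic support
    "long_route": [0.9, 0.8, 0.9, 0.7, 0.8],  # 5 reactions, strong genomic support
}


def parsimony_cost(likelihoods: List[float]) -> float:
    """Parsimony: every added reaction costs the same, so fewer is better."""
    return float(len(likelihoods))


def likelihood_cost(likelihoods: List[float]) -> float:
    """Assumed likelihood-weighted cost: well-supported reactions are cheap."""
    return sum(1.0 - p for p in likelihoods)


def pick(cost_fn: Callable[[List[float]], float]) -> str:
    """Return the name of the candidate route with the lowest total cost."""
    return min(CANDIDATES, key=lambda name: cost_fn(CANDIDATES[name]))


parsimony_choice = pick(parsimony_cost)
likelihood_choice = pick(likelihood_cost)
```

Under these made-up numbers, the parsimony cost prefers the shorter route while the likelihood-weighted cost prefers the longer route with stronger genomic support, which mirrors the qualitative behavior described in the abstract.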

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Gap filling workflows.
We have developed four gap filling workflows and used them to generate the results in this paper: targeted parsimony-based gap filling, targeted likelihood-based gap filling, iterative parsimony-based gap filling, and iterative likelihood-based gap filling. The individual steps are described in detail in the methods, and the technical details of running them using the web interface are described in the supplementary material. Green boxes represent inputs to the workflows. “Limit” is the user-defined time limit and t_max is a system-defined maximum possible time limit for gap filling (currently one day) to prevent overloading the compute servers.
Figure 2. ROC curve for annotations.
We computed the likelihood of all possible gene-reaction pairings from the ModelSEED database and compared the likelihoods of those pairings present in the iJR904 E. coli and iBSU1103 B. subtilis models (‘true positives’) to those which were not (‘false positives’). Each point in the curve represents the percentage of true and false positive linkages remaining at different likelihood cutoffs (labeled on each point). We found that there was a significant enrichment of true positives at high likelihood levels and false positives at low likelihood levels.
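
The construction of such a curve can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the example likelihoods, and the curated labels are hypothetical. A pairing is counted as remaining at a cutoff when its likelihood is at least that cutoff.

```python
from typing import List, Sequence, Tuple


def roc_points(
    likelihoods: Sequence[float],
    in_curated_model: Sequence[bool],
    cutoffs: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, false positive fraction, true positive fraction) per cutoff.

    A gene-reaction pairing "remains" at a cutoff if its likelihood >= cutoff.
    """
    n_pos = sum(1 for y in in_curated_model if y)
    n_neg = len(in_curated_model) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one pairing of each class")
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(likelihoods, in_curated_model) if y and s >= c)
        fp = sum(1 for s, y in zip(likelihoods, in_curated_model) if not y and s >= c)
        points.append((c, fp / n_neg, tp / n_pos))
    return points


# Invented example: six pairings, three of which appear in a curated model.
example_likelihoods = [0.9, 0.8, 0.4, 0.3, 0.1, 0.0]
example_labels = [True, True, False, True, False, False]
example_points = roc_points(example_likelihoods, example_labels, [0.0, 0.5, 1.0])
```

Plotting the second value of each point against the third traces the curve, with each point labeled by its cutoff.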
Figure 3. Proof of principle: Gap filling highly-likely reactions in B. subtilis.
B. subtilis synthesizes lipids via the non-mevalonate pathway (blue). We removed this pathway from the B. subtilis genome-scale model and then tried to fill the gap using both the likelihood and parsimony-based approaches. The parsimony-based gap filling approach instead filled the gap with the mevalonate pathway (red), which is shorter but not supported by genetic evidence. The likelihood-based approach filled the gap with the correct pathway. Black indicates reactions that were not knocked out (there was no explicit link to literature evidence in the B. subtilis model). The numeric labels are the computed likelihoods of gap filling reactions.
Figure 4. Genes added to the model using likelihood-based and parsimony-based gap filling.
Likelihood-based gap filling produced more new gene annotations than post-processing gap filled reactions generated using the parsimony-based approach. The plot shows the number of uniquely-added genes by likelihood-based and parsimony-based gap filling approaches (genes in common with both approaches are omitted for clarity but tended to be more than those unique to either approach). A) Number of genes added after targeted gap filling to activate biomass production. B) Number of genes added after iterative gap filling.
Figure 5. Likelihoods of gene-reaction associations added using likelihood-based and parsimony-based gap filling.
The average likelihood of links between genes and reactions that were added using likelihood-based gap filling tended to be greater than the average likelihood of links resulting from post-processing the parsimony-based gap filling result. Note that it was not greater for all models (e.g., Pseudomonas aeruginosa) because the likelihood-based gap filling approach maximizes likelihood of reactions, not annotations, and as a result picks fewer reactions with 0 likelihood (no predicted gene associations). A) Targeted gap filling result. B) Iterative gap filling result.
Figure 6. Knockout lethality accuracy for genes added in gap filling.
Gene knockout simulations were performed for models gap filled with each of the four workflows to assess the consistency between lethality prediction and knockout lethality data for genes added in gap filling. Likelihood-based gap filling was able to produce the most candidate gene associations, with high specificity and low sensitivity in lethality predictions. The difference in accuracy between likelihood-based and parsimony-based gap filling was not statistically significant. A) Number of positive growth predictions, B) Number of negative growth predictions.
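
The sensitivity and specificity mentioned in the Figure 6 caption can be computed from paired predictions and observations, as in the sketch below. It assumes that the positive class is "knockout is lethal", which the caption does not state explicitly. The function name and the example data are hypothetical.

```python
from typing import Dict, Sequence


def lethality_metrics(
    predicted_lethal: Sequence[bool], observed_lethal: Sequence[bool]
) -> Dict[str, float]:
    """Compare predicted and observed knockout lethality, gene by gene.

    Assumption: "positive" means the knockout is lethal.
    """
    if len(predicted_lethal) != len(observed_lethal):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(predicted_lethal, observed_lethal))
    tp = sum(1 for p, o in pairs if p and o)          # lethal, predicted lethal
    fn = sum(1 for p, o in pairs if not p and o)      # lethal, predicted viable
    tn = sum(1 for p, o in pairs if not p and not o)  # viable, predicted viable
    fp = sum(1 for p, o in pairs if p and not o)      # viable, predicted lethal
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }


# Invented example: six genes, three observed to be essential.
example_metrics = lethality_metrics(
    [True, False, False, False, False, False],
    [True, True, True, False, False, False],
)
```

In this invented example most essential genes are missed (low sensitivity) while no viable knockout is wrongly called lethal (high specificity), the same qualitative pattern the caption reports.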
