Comparative Study

BMC Bioinformatics. 2006 Mar 29:7:177.

doi: 10.1186/1471-2105-7-177.

Identifying metabolic enzymes with multiple types of association evidence

Peter Kharchenko¹, Lifeng Chen, Yoav Freund, Dennis Vitkup, George M Church

Affiliation

¹ Department of Genetics, New Research Building (NRB) Room 238, 77 Avenue Louis Pasteur, Harvard Medical School, Boston, MA 02115, USA. peter.kharchenko@post.harvard.edu

PMID: 16571130
PMCID: PMC1450304
DOI: 10.1186/1471-2105-7-177

Peter Kharchenko et al. BMC Bioinformatics. 2006.


Abstract

Background: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions that cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes.

Results: We present a novel method for identifying genes that encode a specific metabolic function, based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained from the combination of all data. In this way, our method predicts 60% of the enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of cases.

Conclusion: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.

Figures

**Figure 1**
a. Illustration of the missing gene problem. The metabolic network neighborhood of a missing metabolic enzyme is shown. The neighborhood comprises layers of increasing radius (3 layers shown, indicated by shading). The majority of the enzyme-encoding genes in the neighborhood are known. b. Illustration of the self-rank validation test. The ability to predict a known enzyme-encoding gene is tested by measuring its *self-rank* - the rank of the true enzyme-encoding gene in the candidate set. The candidates are ordered according to the overall strength of their functional association with the metabolic network neighborhood of the enzyme. The overall association strength is a combination of layer association scores, each of which measures the strength of functional association of the candidate gene with the known enzyme-encoding genes in a single layer of the metabolic neighborhood (3 layers, as illustrated in a.). The candidate set contains all genes that are not already part of the metabolic network.
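The self-rank measure described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the gene names, the scores, the pessimistic handling of ties, and the helper names `self_rank` and `fraction_within` are all assumptions made for the example.

```python
from typing import Dict, List


def self_rank(scores: Dict[str, float], true_gene: str) -> int:
    """Return the 1-based rank of true_gene among all candidates,
    ordered by descending association score.

    Ties are resolved pessimistically (an assumption of this sketch):
    a candidate with an equal score is counted as ranked ahead.
    """
    target = scores[true_gene]
    ahead = sum(
        1 for gene, score in scores.items()
        if gene != true_gene and score >= target
    )
    return ahead + 1


def fraction_within(ranks: List[int], threshold: int) -> float:
    """Fraction of test enzymes whose self-rank is at or below threshold."""
    return sum(rank <= threshold for rank in ranks) / len(ranks)


# Hypothetical combined association scores for one enzymatic function.
toy_scores = {"geneA": 0.91, "geneB": 0.40, "geneC": 0.75, "geneD": 0.12}
rank_c = self_rank(toy_scores, "geneC")  # only geneA scores higher
```

Collecting one self-rank per known enzyme and calling `fraction_within` at several thresholds gives a performance curve of the kind the later figures describe.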

**Figure 2**
Performance of different phylogenetic profile datasets and corrections. The predictive performance of the algorithm is illustrated by showing the fraction of known enzyme-encoding genes (x axis) predicted within different self-rank thresholds (y axis). For instance, the dashed performance curve in subfigure a. (BLAST:xHG) shows that 30% of the test enzymes appear within the top 10 (out of 3352) candidates for their enzymatic function. a. Algorithm performance in predicting known *E. coli* metabolic enzymes based on the phylogenetic profile associations with the 1st layer of the metabolic network neighborhood. Performance of a regular hypergeometric distribution is shown (HG), together with extended hypergeometric (xHG) and folding (xHG+folding) corrections. The scores are calculated on the BLAST-based dataset. b. The self-rank performance of the 1st layer phylogenetic profile score, calculated using the extended hypergeometric distribution with folding, is shown for BLAST-based, KEGG-based and COG orthology datasets. The performance of the COG orthology dataset is corrected for the metabolic gene coverage bias.
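For readers unfamiliar with hypergeometric scoring of phylogenetic profiles, the sketch below shows the textbook form of the regular (HG) test: the probability that two genes co-occur in at least the observed number of genomes by chance. It is an assumption that this matches the paper's HG score in detail, and the extended (xHG) and folding corrections are not reproduced here. The function names and the toy profiles are hypothetical.

```python
from math import comb
from typing import Sequence


def hypergeom_tail(N: int, m: int, n: int, k: int) -> float:
    """P(X >= k) for a hypergeometric variable X: the overlap between
    a set of m genomes and a random set of n genomes drawn from N."""
    total = comb(N, n)
    upper = min(m, n)
    # comb() returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    favourable = sum(
        comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


def profile_pvalue(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance probability of the observed co-occurrence of two genes,
    given binary presence/absence profiles over the same genomes."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    N = len(a)
    m = sum(a)                               # genomes containing gene a
    n = sum(b)                               # genomes containing gene b
    k = sum(x and y for x, y in zip(a, b))   # genomes containing both
    return hypergeom_tail(N, m, n, k)


# Hypothetical profiles over six genomes: the genes always co-occur.
p = profile_pvalue([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
```

A smaller p-value indicates a stronger association, so one simple way to rank candidates is in ascending order of p-value.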

**Figure 3**
Comparison of ADT and DLR methods for combining multiple association evidence types. Fraction of enzymes predicted within different self-rank thresholds is shown for *E. coli* and *S. cerevisiae* metabolic enzymes. Predictions are based on the combined association evidence (see Methods, Table 1), using two different methods: DLR (dashed curves), and ADT (solid curves).

**Figure 4**
Enzyme predictions based on individual and combined types of association evidence (see Methods, Table 1). The fraction of known enzymes predicted within different self-rank thresholds is shown for a. *E. coli* metabolism and b. *S. cerevisiae* metabolism. Each curve indicates the probability (y axis) with which a true enzyme-encoding gene will be predicted within the top n (x axis) candidates for its enzymatic function. The total number of candidates is 3352 for *E. coli* and 5253 for *S. cerevisiae*. Different curves show the predictive performance of the various types of association evidence. Predictions are generated based on functional association with the first three layers of the metabolic network neighborhood, using an ADT classifier with 10-fold validation.

