Comparative Study
BMC Bioinformatics. 2006 Mar 29:7:177.
doi: 10.1186/1471-2105-7-177.

Identifying metabolic enzymes with multiple types of association evidence

Peter Kharchenko et al. BMC Bioinformatics.

Abstract

Background: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions that cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes.

Results: We present a novel method for identifying genes that encode a specific metabolic function, based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained from the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of cases.

Conclusion: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.


Figures

Figure 1
a. Illustration of the missing gene problem. The metabolic network neighborhood of a missing metabolic enzyme is shown. The neighborhood comprises layers of increasing radius (3 layers shown, indicated by shading). The majority of the enzyme-encoding genes in the neighborhood are known. b. Illustration of the self-rank validation test. The ability to predict a known enzyme-encoding gene is tested by measuring its self-rank, that is, the rank of the true enzyme-encoding gene in the candidate set. The candidates are ordered according to the overall strength of their functional association with the metabolic network neighborhood of the enzyme. The overall association strength is a combination of layer association scores, each of which measures the strength of functional association of the candidate gene with the known enzyme-encoding genes in a single layer of the metabolic neighborhood (3 layers, as illustrated in a.). The candidate set contains all genes that are not already part of the metabolic network.
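The self-rank test described in panel b can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and scores are made up, the layer scores are combined with a plain weighted sum (a stand-in, not the ADT or DLR combination used in the paper), and ties are resolved in favor of the true gene.

```python
def combine_layers(layer_scores, weights):
    """Combine per-layer association scores into one overall score.

    A weighted sum is an assumption made for illustration only; it is
    not the combination method used in the paper.
    """
    return sum(w * s for w, s in zip(weights, layer_scores))


def self_rank(overall_scores, true_gene):
    """Return the 1-based rank of true_gene among all candidates.

    Candidates are ordered by descending overall score. Ties are
    resolved in favor of the true gene (an assumption).
    """
    true_score = overall_scores[true_gene]
    better = sum(
        1
        for gene, score in overall_scores.items()
        if gene != true_gene and score > true_score
    )
    return better + 1


# Hypothetical candidates, each with scores for layers 1, 2 and 3.
layer_scores = {
    "geneA": [0.9, 0.4, 0.1],
    "geneB": [0.2, 0.8, 0.3],
    "geneC": [0.5, 0.5, 0.5],
}
weights = [1.0, 0.5, 0.25]  # hypothetical: closer layers count more

overall = {g: combine_layers(s, weights) for g, s in layer_scores.items()}
rank_of_true_gene = self_rank(overall, "geneC")
```

With these made-up numbers, geneA has the highest overall score, so a true gene geneC receives a self-rank of 2.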
Figure 2
Performance of different phylogenetic profile datasets and corrections. The predictive performance of the algorithm is illustrated by showing the fraction of known enzyme-encoding genes (x axis) predicted within different self-rank thresholds (y axis). For instance, the dashed performance curve in subfigure a. (BLAST:xHG) shows that 30% of the test enzymes appear within the top 10 (out of 3352) candidates for their enzymatic function. a. Algorithm performance in predicting known E. coli metabolic enzymes based on the phylogenetic profile associations with the 1st layer of the metabolic network neighborhood. The performance of the regular hypergeometric distribution (HG) is shown together with the extended hypergeometric (xHG) and folding (xHG+folding) corrections. The scores are calculated on the BLAST-based dataset. b. The self-rank performance of the 1st layer phylogenetic profile score, calculated using the extended hypergeometric distribution with folding, is shown for the BLAST-based, KEGG-based and COG orthology datasets. The performance of the COG orthology dataset is corrected for the metabolic gene coverage bias.
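As a rough illustration of the regular hypergeometric (HG) score mentioned above, the sketch below computes the probability that two binary phylogenetic profiles share at least as many genomes as observed by chance. It assumes each profile is a list of 0/1 presence flags over the same genomes; the function names are hypothetical, and the extended hypergeometric and folding corrections are not reproduced here.

```python
from math import comb


def hypergeom_tail(n_genomes, m, n, k):
    """P(overlap >= k) when one gene occurs in m genomes and another in n,
    with both sets drawn at random from n_genomes genomes."""
    upper = min(m, n)
    total = comb(n_genomes, n)
    favorable = sum(
        comb(m, i) * comb(n_genomes - m, n - i) for i in range(k, upper + 1)
    )
    return favorable / total


def profile_pvalue(profile_a, profile_b):
    """Co-occurrence p-value for two equal-length 0/1 phylogenetic profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    m = sum(profile_a)  # genomes containing gene A
    n = sum(profile_b)  # genomes containing gene B
    k = sum(1 for a, b in zip(profile_a, profile_b) if a and b)  # shared
    return hypergeom_tail(len(profile_a), m, n, k)


# Two made-up profiles over four genomes that co-occur perfectly.
p_same = profile_pvalue([1, 1, 0, 0], [1, 1, 0, 0])
# Two made-up profiles that never co-occur.
p_disjoint = profile_pvalue([1, 1, 0, 0], [0, 0, 1, 1])
```

A smaller p-value indicates a co-occurrence pattern that is less likely to arise by chance, and so a stronger phylogenetic profile association.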
Figure 3
Comparison of ADT and DLR methods for combining multiple association evidence types. Fraction of enzymes predicted within different self-rank thresholds is shown for E. coli and S. cerevisiae metabolic enzymes. Predictions are based on the combined association evidence (see Methods, Table 1), using two different methods: DLR (dashed curves), and ADT (solid curves).
Figure 4
Enzyme predictions based on individual and combined types of association evidence (see Methods, Table 1). The fraction of known enzymes predicted within different self-rank thresholds is shown for a. E. coli metabolism and b. S. cerevisiae metabolism. Each curve indicates the probability (y axis) with which a true enzyme-encoding gene will be predicted within the top n (x axis) candidates for its enzymatic function. The total number of candidates is 3352 for E. coli and 5253 for S. cerevisiae. Different curves demonstrate the predictive performance of various types of association evidence. Predictions are generated based on functional association with the first three layers of the metabolic network neighborhood, using the ADT classifier with 10-fold validation.
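The performance curves described here can be derived from a list of self-ranks, one per test enzyme. The following minimal sketch uses invented self-ranks, not results from the paper, and the function name is hypothetical.

```python
def fraction_within(self_ranks, thresholds):
    """For each threshold n, return the fraction of test enzymes whose
    true gene is ranked within the top n candidates."""
    total = len(self_ranks)
    return [
        sum(1 for rank in self_ranks if rank <= n) / total for n in thresholds
    ]


# Invented self-ranks for ten test enzymes (illustration only).
example_ranks = [1, 3, 12, 1, 50, 7, 2, 200, 9, 1]
curve = fraction_within(example_ranks, [1, 10, 100])
```

For these invented ranks, three of the ten enzymes are ranked first, seven fall within the top 10, and nine within the top 100, giving one point on the curve per threshold.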
