Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15428-33.
doi: 10.1073/pnas.2136809100. Epub 2003 Dec 12.

Genome evolution reveals biochemical networks and functional modules

Christian von Mering et al. Proc Natl Acad Sci U S A.

Abstract

The analysis of completely sequenced genomes uncovers an astonishing variability between species in terms of gene content and order. During genome history, the genes are frequently rearranged, duplicated, lost, or transferred horizontally between genomes. These events appear to be stochastic, yet they are under selective constraints resulting from the functional interactions between genes. These genomic constraints form the basis for a variety of techniques that employ systematic genome comparisons to predict functional associations among genes. The most powerful techniques to date are based on conserved gene neighborhood, gene fusion events, and common phylogenetic distributions of gene families. Here we show that these techniques, if integrated quantitatively and applied to a sufficiently large number of genomes, have reached a resolution which allows the characterization of function at a higher level than that of the individual gene: global modularity becomes detectable in a functional protein network. In Escherichia coli, the predicted modules can be benchmarked by comparison to known metabolic pathways. We found as many as 74% of the known metabolic enzymes clustering together in modules, with an average pathway specificity of at least 84%. The modules extend beyond metabolism, and have led to hundreds of reliable functional predictions both at the protein and pathway level. The results indicate that modularity in protein networks is intrinsically encoded in present-day genomes.

Figures

Fig. 4.
A network of predicted functional modules in E. coli. Only modules of size four or larger are shown. Nodes represent single proteins or groups of highly similar proteins as defined in the COG database. Genomic context links within predicted modules are shown in dark gray, and those across modules are shown in light gray. For clarity, the latter links are limited to those with an association score of 0.650 or higher (on a scale from zero to one; ref. 26). Functional categories are as defined in the COG database. (Inset) A typical example of a largely uncharacterized pathway.
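The display rule in this caption (all within-module links drawn, cross-module links drawn only at a score of 0.650 or higher) can be illustrated with a minimal sketch. The function name, the data layout, and the toy proteins are hypothetical and are not taken from the paper; only the 0.650 cutoff comes from the caption.

```python
from typing import Dict, List, Tuple

# (protein_a, protein_b, association score on a scale from zero to one)
Link = Tuple[str, str, float]


def links_to_draw(
    links: List[Link],
    module_of: Dict[str, int],
    cross_cutoff: float = 0.650,
) -> Tuple[List[Link], List[Link]]:
    """Split links into within-module and cross-module sets.

    Within-module links are always kept; cross-module links are kept
    only if their score reaches cross_cutoff.
    """
    within: List[Link] = []
    across: List[Link] = []
    for a, b, score in links:
        mod_a, mod_b = module_of.get(a), module_of.get(b)
        if mod_a is None or mod_b is None:
            continue  # protein is not part of any displayed module
        if mod_a == mod_b:
            within.append((a, b, score))
        elif score >= cross_cutoff:
            across.append((a, b, score))
    return within, across


# Toy data, invented for illustration only.
toy_links = [("A1", "A2", 0.90), ("A1", "B1", 0.70), ("A2", "B2", 0.40)]
toy_modules = {"A1": 1, "A2": 1, "B1": 2, "B2": 2}
within, across = links_to_draw(toy_links, toy_modules)
```

In this toy case the A1-A2 link is kept as a within-module link, A1-B1 passes the cross-module cutoff, and A2-B2 is dropped.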
Fig. 1.
Correlation between metabolic pathways and genomic context predictions. Metabolic databases such as EcoCyc describe metabolites and enzymes, and subjectively group them into metabolic “pathways.” In contrast, comparative genomics can reveal selective pressures shared by groups of enzymes, thereby defining functional modularity objectively. Surprisingly, good agreement between the two is observed. Note that the purine biosynthesis pathway is covered by two predicted modules, which are separated by a branching point in the pathway. The node marked by an asterisk consists of two enzymes (GuaC and ImdH), which are too closely related to be resolved into separate orthologous groups (32). Both enzymes are involved in purine metabolism, but only ImdH is part of the biosynthesis pathway, so GuaC is counted here as a false positive. [The schematic overview of metabolism is reproduced with permission from ref. (Copyright 1994, Garland Publishing, New York).]
Fig. 2.
Clustering genomic context associations: parameter exploration and benchmarking. Shown are graphs summarizing the benchmarking performance of the clustering, considering only clusters containing at least two enzymes. The achievable pathway specificity quickly reaches a plateau at a high level of prediction accuracy, independent of the clustering algorithm used. In contrast, the observed number of the predicted functional modules and the fraction of total metabolism they cover are both somewhat more sensitive to clustering algorithms and cutoffs. The data set marked with an asterisk was chosen for detailed manual analysis and is the basis of all subsequent figures.
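The kind of parameter exploration described in this caption can be sketched as follows. This is not the authors' procedure: the clustering here is plain single-linkage (connected components above a score cutoff), and "pathway specificity" is assumed to mean the fraction of a module's annotated enzymes that share its most common pathway. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_by_cutoff(links, cutoff):
    """Single-linkage clustering: connected components over links scoring >= cutoff."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in links:
        if score >= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def benchmark(modules, pathway_of):
    """Return (coverage, mean specificity), using only modules with >= 2 annotated enzymes.

    coverage: fraction of all annotated enzymes that fall into such a module.
    specificity (assumed definition): share of a module's annotated enzymes
    that belong to the module's most common pathway.
    """
    covered, specificities = set(), []
    for module in modules:
        enzymes = [p for p in module if p in pathway_of]
        if len(enzymes) < 2:
            continue
        counts = Counter(pathway_of[e] for e in enzymes)
        specificities.append(counts.most_common(1)[0][1] / len(enzymes))
        covered.update(enzymes)
    coverage = len(covered) / len(pathway_of) if pathway_of else 0.0
    mean_spec = sum(specificities) / len(specificities) if specificities else 0.0
    return coverage, mean_spec


# Toy data, invented for illustration only.
toy_links = [
    ("e1", "e2", 0.90), ("e2", "e3", 0.80), ("e4", "e5", 0.85),
    ("e5", "x1", 0.30), ("e3", "e4", 0.20),
]
toy_pathways = {"e1": "P1", "e2": "P1", "e3": "P2", "e4": "P2", "e5": "P2", "e6": "P3"}

for cutoff in (0.1, 0.5):
    modules = cluster_by_cutoff(toy_links, cutoff)
    coverage, specificity = benchmark(modules, toy_pathways)
    print(f"cutoff={cutoff}: {len(modules)} modules, "
          f"coverage={coverage:.2f}, specificity={specificity:.2f}")
```

Sweeping the cutoff in this way shows how the number of modules and their specificity respond to the threshold on a given data set.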
Fig. 3.
Global properties of the predicted metabolic modules. (A) Functional composition. In addition to the annotated enzymes, the predicted modules often contain putative enzymes that are not yet assigned to pathways, as well as proteins from other functional categories. (B) As expected, metabolic modules are strongly enriched in enzymatic functions, but they also contain other functions, most notably transport and transcription regulation. The categories shown are a subset of the gene-ontology (30) subtree “biological process” (see Data Sources and Procedures for details). (C) Pathway topology and genomic context. The graph shows the scores of all genomic context associations between enzymes that are direct neighbors in metabolism of E. coli. The pathway topology is defined by the number of enzymes metabolizing the same substrate (not considering frequent substrates such as water or ATP; the frequency cutoff is 8). Any substrate metabolized by more than two enzymes constitutes a branching point.
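The topology rule in panel C can be sketched as below. The caption does not say whether the frequency cutoff of 8 is inclusive, so this sketch assumes that a substrate used by 8 or more enzymes counts as "frequent" and is ignored. The function name, labels, and toy data are hypothetical.

```python
from collections import defaultdict


def classify_substrates(enzyme_substrates, frequency_cutoff=8):
    """Label each substrate by how many enzymes metabolize it.

    Assumption: substrates used by >= frequency_cutoff enzymes are treated
    as frequent (e.g. water, ATP) and skipped. Substrates with more than
    two enzymes are branching points; those with exactly two are linear.
    """
    users = defaultdict(set)
    for enzyme, substrates in enzyme_substrates.items():
        for substrate in substrates:
            users[substrate].add(enzyme)

    topology = {}
    for substrate, enzymes in users.items():
        n = len(enzymes)
        if n >= frequency_cutoff:
            continue  # frequent substrate, not considered
        if n > 2:
            topology[substrate] = "branching point"
        elif n == 2:
            topology[substrate] = "linear"
    return topology


# Toy data, invented for illustration only: nine enzymes all use ATP.
toy = {f"E{i}": {"ATP"} for i in range(1, 10)}
toy["E1"].add("m1")
toy["E2"].update({"m1", "m2"})
toy["E3"].add("m2")
toy["E4"].add("m2")

topology = classify_substrates(toy)
```

Here ATP is skipped as a frequent substrate, m1 (two enzymes) is labeled linear, and m2 (three enzymes) is labeled a branching point.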

References

    1. Wolfe, K. H. & Li, W. H. (2003) Nat. Genet. 33, Suppl., 255–265.
    2. Koonin, E. V., Makarova, K. S. & Aravind, L. (2001) Annu. Rev. Microbiol. 55, 709–742.
    3. Lawrence, J. G. (1997) Trends Microbiol. 5, 355–359.
    4. Aravind, L., Watanabe, H., Lipman, D. J. & Koonin, E. V. (2000) Proc. Natl. Acad. Sci. USA 97, 11319–11324.
    5. Snel, B., Bork, P. & Huynen, M. (2000) Trends Genet. 16, 9–11.
