Genome Biol. 2004;5(5):R32.
doi: 10.1186/gb-2004-5-5-r32. Epub 2004 Apr 27.

Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns

Galina V Glazko et al. Genome Biol. 2004.

Abstract

Background: Phyletic patterns denote the presence and absence of orthologous genes in completely sequenced genomes and are used to infer functional links between genes, on the assumption that genes involved in the same pathway or functional system are co-inherited by the same set of genomes. However, this basic premise has not been quantitatively tested, and the limits of applicability of the phyletic-pattern method remain unknown.
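As a minimal sketch of the idea above, a phyletic pattern can be written as a binary vector with one position per genome, and genes sharing an identical vector are the simplest candidates for a functional link. The genome names, gene names, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the paper's data set.

```python
# Hypothetical toy data: the set of genomes in which each gene has an ortholog.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D"]

presence = {
    "geneX": {"genome_A", "genome_B", "genome_D"},
    "geneY": {"genome_A", "genome_B", "genome_D"},
    "geneZ": {"genome_C"},
}


def phyletic_pattern(genomes_with_gene, genome_order):
    """Encode presence (1) or absence (0) of a gene in each genome, in a fixed order."""
    return tuple(1 if g in genomes_with_gene else 0 for g in genome_order)


def group_by_pattern(patterns):
    """Collect genes that share exactly the same phyletic pattern."""
    groups = {}
    for gene, pattern in patterns.items():
        groups.setdefault(pattern, []).append(gene)
    return groups


patterns = {gene: phyletic_pattern(gs, GENOMES) for gene, gs in presence.items()}
groups = group_by_pattern(patterns)

for pattern, genes in groups.items():
    print(pattern, genes)
```

Here geneX and geneY share one pattern and would be hypothesized to be co-inherited, while geneZ stands alone. Exact matching is the strictest form of the method; the abstract's point is that real patterns are rarely this clean.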

Results: We characterized a hierarchy of 3,688 phyletic patterns encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes, using different distances, clustering algorithms, and measures of cluster quality. The most sensitive set of parameters recovered 223 clusters, each consisting of genes that belong to the same metabolic pathway or functional system. Fifty-six clusters included unexpected genes with plausible functional links to the rest of the cluster. Only a small percentage of known pathways and multiprotein complexes are co-inherited as one cluster; most are split into many clusters, indicating that gene loss and displacement have occurred in the evolution of most pathways.
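The general approach of clustering phyletic patterns hierarchically can be sketched as follows. This is an illustration only: it assumes Hamming distance and average-linkage agglomerative clustering as simple stand-ins, whereas the study compared several distances and clustering algorithms. The toy patterns, the `max_dist` cutoff, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy patterns over six genomes (1 = ortholog present, 0 = absent).
toy = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 0),
    "geneC": (0, 0, 1, 0, 1, 0),
    "geneD": (0, 0, 1, 0, 1, 1),
}


def hamming(p, q):
    """Number of genomes in which two patterns disagree."""
    return sum(a != b for a, b in zip(p, q))


def average_linkage(c1, c2, patterns):
    """Mean pairwise Hamming distance between the members of two clusters."""
    total = sum(hamming(patterns[a], patterns[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster(patterns, max_dist):
    """Merge the closest pair of clusters until that pair is farther apart than max_dist."""
    clusters = [(name,) for name in sorted(patterns)]
    while len(clusters) > 1:
        (i, j), dist = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], patterns))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > max_dist:
            break
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


print(cluster(toy, max_dist=2))
```

With the cutoff set to 2, geneA and geneB end up in one cluster and geneC and geneD in another, even though no two patterns are identical. Tolerating a small number of mismatches is what lets clustering recover groups whose patterns have been perturbed by gene gain or loss.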

Conclusions: Phyletic patterns of functionally linked genes are perturbed by differential gains, losses and displacements of orthologous genes in different species, reflecting the high plasticity of microbial genomes. Groups of genes that are co-inherited can, however, be recovered by hierarchical clustering, and may represent elementary functional modules of cellular metabolism. The phyletic patterns approach alone can confidently predict the functional linkages for about 24% of the entire data set.

Figures

Figure 1
Phyletic patterns are corrupted by gene gains and losses. The consensus phylogenetic tree on top is the species' tree based on genomic content [26]. Small black and white squares indicate, respectively, presences and absences of genes in each species. (a) TCA cycle. Blue box indicates the 'canonical' cycle, as known from saprophytic Enterobacteriaceae with large genomes. (b) Glycolysis. The green box indicates omnipresent COGs in the evolutionarily ancient bottom part of glycolysis, and the red box indicates three COGs coding for phosphoglycerate mutase activity. None of the patterns in the red box is close to the patterns in the green box, even though all these COGs are functionally linked. (c) Most genomes have just one of the two types of thymidylate synthase, but the blue boxes indicate several exceptions to this rule. (d) The full names of the species listed along the top of (a) and the TCA enzymes corresponding to the COGs shown in (a-c).
Figure 2
Groups of phyletic patterns and COGs revealed by hierarchical clustering of patterns in species space. The presentation is similar to Figure 1, but the black and white squares are vertically compressed in order to show all 4,589 COGs in one figure. The full tree of COGs is shown at the left; at 170 COGs per 1 mm height, it is not particularly suitable for visual consumption, but some closely linked clusters (short branches) can be discerned.
Figure 3
Comparison of distance measures and clustering algorithms. (a) Diametric distance combined with NJ clustering results in the highest sensitivity and the smallest percentage of lost data. (b) The effect of selected distance measures between phyletic patterns on the recovery of functionally linked pairs of genes. The criteria for functional linkage, based on the KEGG maps, and the values for mutual information are as in [28].
Figure 4
Fragmentation of riboflavin biosynthesis. (a) PP-cluster 211 contains the volatile part of riboflavin biosynthesis that is mostly missing in archaea (COGs 0307, 0117, 0807). (b) PP-cluster 220 contains the evolutionarily most conserved part of the pathway (COGs 1985, 0108, 0054). Gray shading indicates enzymes in PP-cluster 220 that are unrelated to riboflavin biosynthesis.
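The caption of Figure 3 mentions mutual information among the measures applied to pairs of phyletic patterns. A minimal sketch of the standard definition for two binary patterns is given below, using empirical frequencies over genomes. The function name and the toy patterns are hypothetical, and the exact estimator used in [28] may differ.

```python
from collections import Counter
from math import log2


def mutual_information(p, q):
    """Mutual information (in bits) between two equal-length binary patterns."""
    n = len(p)
    joint = Counter(zip(p, q))  # counts of (state in p, state in q) per genome
    count_p = Counter(p)        # marginal counts for the first pattern
    count_q = Counter(q)        # marginal counts for the second pattern
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((count_p[a] / n) * (count_q[b] / n)))
    return mi


same = mutual_information((1, 1, 0, 0), (1, 1, 0, 0))
complementary = mutual_information((1, 1, 0, 0), (0, 0, 1, 1))
unrelated = mutual_information((1, 1, 0, 0), (1, 0, 1, 0))
print(same, complementary, unrelated)
```

Identical and perfectly complementary patterns both score the maximum for this toy case, while the unrelated pair scores zero. Unlike a plain mismatch count, a measure of this kind can therefore also flag genes that tend to be mutually exclusive across genomes, such as the two types of thymidylate synthase in Figure 1(c).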

References

    1. Fitch WM. Homology: a personal view on some of the problems. Trends Genet. 2000;16:227–231. doi: 10.1016/S0168-9525(00)02005-9.
    2. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637. doi: 10.1126/science.278.5338.631.
    3. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA. 1999;96:4285–4288. doi: 10.1073/pnas.96.8.4285.
    4. Smit A, Mushegian A. Biosynthesis of isoprenoids via mevalonate in Archaea: the lost pathway. Genome Res. 2000;10:1468–1484. doi: 10.1101/gr.145600.
    5. Kaneda K, Kuzuyama T, Takagi M, Hayakawa Y, Seto H. An unusual isopentenyl diphosphate isomerase found in the mevalonate pathway gene cluster from Streptomyces sp. strain CL190. Proc Natl Acad Sci USA. 2001;98:932–937. doi: 10.1073/pnas.020472198.