BMC Evol Biol. 2014 Oct 9;14:207.
doi: 10.1186/s12862-014-0207-y.

Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam



Juanjuan Chai et al. BMC Evol Biol. 2014.

Abstract

Background: Phylogenetic studies have provided detailed knowledge of the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life are now covered by sequenced genomes in GenBank, which enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes.

Results: A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. GO term enrichment analysis differentiated the functional profiles of selected archaeal taxa. In total, 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways across the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track gains and losses of metabolic pathways over evolutionary history.
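The abstract does not say which statistical test underlies the GO term enrichment analysis. A common choice is a one-sided hypergeometric test, sketched below under the assumption that counts are taken at the genome level (how many genomes in a taxon carry a GO term, compared with a background set of genomes). The function name `go_enrichment_pvalue` and the genome-level counting are illustrative assumptions, not the authors' documented method.

```python
from math import comb


def go_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    Hypothetical genome-level counts:
      k: genomes in the taxon of interest whose profile contains the GO term
      n: genomes in the taxon of interest
      K: genomes in the background set that contain the GO term
      N: genomes in the background set (including the taxon of interest)
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of drawing k or more term-carrying genomes
    # (math.comb returns 0 when more items are requested than exist)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: all 3 genomes of a taxon carry a term that
# occurs in 4 of 10 background genomes.
print(go_enrichment_pvalue(3, 3, 4, 10))
```

The example prints 4/120, roughly 0.0333. When thousands of GO terms are tested at once, a multiple-testing correction (for example Benjamini-Hochberg) would normally be applied to these p-values; the abstract does not state which correction, if any, was used.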

Conclusions: Our functional phylogenomics analysis shows divergent functional profiles among taxa and clades. This function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. In contrast, many other cellular functions are sparsely dispersed across numerous clades and have high parsimony scores. These two types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.


Figures

Figure 1
Overview of the prokaryotic genomes. The phylogenetic tree contains 14,727 genomes, with tips colored by phylum. Rings: (1) Phylum classification of the genomes; phyla with fewer than 5 genomes are grouped in the “Others” category. (2) Completion status of the genomes: black for finished genomes and white for draft genomes. (3) Number of contigs in each genome. (4) Number of proteins in each genome. (5) Percentage of proteins annotated by UniFam in each genome. (6) Number of pathways inferred for each genome.
Figure 2
Hierarchical clustering of genera and metabolic pathways. The heatmap shows the presence (red) and absence (green) of all 706 pathways (columns) in all 1206 genera (rows). The dendrograms to the left of and above the heatmap show the clustering of the genera and of the pathways, respectively. Higher taxonomic classifications of the genera are marked on two colored strips: the Bacteria/Archaea domain classification on the left strip and the phylum classification on the right strip.
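The caption does not specify the distance measure or linkage behind the dendrograms. As one plausible sketch (an assumption, not the authors' documented procedure), each genus can be treated as the set of pathways present in it, pairs of genera compared by Jaccard distance, and clusters merged by average linkage (UPGMA). The functions and the toy profiles below are hypothetical; for 1206 genera a library routine such as SciPy's hierarchical clustering would be the practical choice, since this naive loop recomputes all pairwise cluster distances at every merge.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance between two pathway sets (0.0 = identical)."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_linkage(profiles: dict):
    """Naive UPGMA clustering of labels (e.g. genera) by pathway presence.

    profiles maps a label to its set of present pathway IDs.
    Returns the dendrogram topology as nested 2-tuples of labels.
    """
    # Each cluster is (subtree, list of member labels)
    clusters = [(label, [label]) for label in profiles]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # Average linkage: mean pairwise distance between members
            d = sum(jaccard_distance(profiles[x], profiles[y])
                    for x in mi for y in mj) / (len(mi) * len(mj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical presence profiles for four genera
profiles = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2", "p3", "p4"},
    "C": {"p7", "p8", "p9"},
    "D": {"p7", "p8"},
}
print(average_linkage(profiles))
```

The toy example prints `(('A', 'B'), ('C', 'D'))`: A and B (distance 0.25) merge first, then C and D (distance about 0.33), and the two groups, which share no pathways, join last.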
Figure 3
Distribution of selected pathways across the prokaryotic genomes. Each ring represents the presence pattern (colored) of a pathway on the phylogenetic tree tips. The pathways and their ring colors are listed in the legend.
Figure 4
Distributions of parsimony scores and occurrence frequencies of all pathways. The pathways are classified into consistent pathways with retention index (RI) > 0.9 (blue) and inconsistent pathways with RI < 0.7 (red).
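The abstract does not name the parsimony algorithm. A standard choice for a presence/absence character on a rooted binary tree is Fitch parsimony, and the retention index of a binary character is RI = (g - s) / (g - m), where s is the parsimony score on the tree, m = 1 is the minimum possible number of changes, and g = min(n0, n1) is the maximum (the changes needed on a star tree). The sketch below assumes the tree is given as nested 2-tuples with unique leaf labels; the 0.9 and 0.7 thresholds come from the caption, while the "intermediate" and "uninformative" labels are hypothetical additions.

```python
def fitch_score(tree, states):
    """Fitch parsimony on a rooted binary tree of nested 2-tuples.

    states maps each leaf label to 1 (pathway present) or 0 (absent).
    Returns (candidate_state_set, minimum_number_of_changes).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one change on one of the child branches
    return left_set | right_set, left_cost + right_cost + 1


def retention_index(tree, states):
    """RI = (g - s) / (g - m) for a binary character; None if undefined."""
    s = fitch_score(tree, states)[1]
    n1 = sum(states.values())
    n0 = len(states) - n1
    m = 1 if n0 and n1 else 0  # at least one change if both states occur
    g = min(n0, n1)            # changes needed on a star tree
    if g == m:
        return None            # uninformative, e.g. present in a single genome
    return (g - s) / (g - m)


def classify(ri):
    # Thresholds from the caption; the other two labels are hypothetical
    if ri is None:
        return "uninformative"
    if ri > 0.9:
        return "consistent"
    if ri < 0.7:
        return "inconsistent"
    return "intermediate"


tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
clade_specific = dict(zip("abcdefgh", [1, 1, 1, 1, 0, 0, 0, 0]))
dispersed = dict(zip("abcdefgh", [1, 0, 1, 0, 1, 0, 1, 0]))
for name, states in [("clade-specific", clade_specific), ("dispersed", dispersed)]:
    ri = retention_index(tree, states)
    print(name, fitch_score(tree, states)[1], ri, classify(ri))
```

The clade-specific pattern needs one change (RI 1.0, consistent), whereas the dispersed pattern needs four (RI 0.0, inconsistent), mirroring the contrast the abstract draws between low- and high-parsimony functions. On a tree with 14,727 tips the recursion may exceed Python's default depth limit, so an iterative post-order traversal would be safer at that scale.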
Figure 5
Subtrees of selected pathways. The subtree of a pathway is derived from the clocked phylogenetic tree of all genomes by collapsing entire clades that lack the pathway into tips. The root is colored red if the pathway is present and blue if it is absent. The colors of non-root nodes mark changes in the pathway’s status relative to the immediate ancestral node: red for gains, blue for losses, and no color for no change. Branches descending from nodes that contain the pathway are colored green. The total number of blue and red nodes in a pathway’s subtree equals the pathway’s parsimony score. (A) Aerobic respiration (cytochrome C) with parsimony score 290 from 9383 genomes. (B) Methylotrophy with parsimony score 592 from 1353 genomes. (C) Phosphate acquisition with parsimony score 528 from 6005 genomes. (D) Nitrogen fixation with parsimony score 409 from 1121 genomes. (E) Arsenate detoxification with parsimony score 196 from 2182 genomes. (F) Mercury detoxification with parsimony score 944 from 2319 genomes. (G) Isopenicillin N biosynthesis with parsimony score 387 from 2236 genomes. (H) Lysine biosynthesis I with parsimony score 149 from 8372 genomes.
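The caption ties the count of gain and loss nodes to the pathway's parsimony score. One way to obtain such a reconstruction (assumed here; the caption does not state the method) is Fitch's two-pass algorithm: a bottom-up pass computes candidate state sets, and a top-down pass gives each node its parent's state whenever that state is among its candidates, recording a gain or loss otherwise. The function `reconstruct` and its `root_tiebreak` parameter are hypothetical; different tie-breaks at the root give different but equally parsimonious gain/loss histories.

```python
def reconstruct(tree, states, root_tiebreak=0):
    """Fitch two-pass ancestral reconstruction for a presence/absence character.

    tree: rooted binary tree as nested 2-tuples with unique leaf labels.
    states: leaf label -> 1 (present) or 0 (absent).
    Returns (root_state, events), where events lists (subtree, "gain"/"loss")
    for every non-root node whose state differs from its parent's.
    """
    prelim = {}  # id(node) -> candidate set from the bottom-up pass

    def up(node):
        if isinstance(node, tuple):
            left, right = up(node[0]), up(node[1])
            cand = (left & right) or (left | right)
        else:
            cand = {states[node]}
        prelim[id(node)] = cand
        return cand

    events = []

    def down(node, parent_state):
        cand = prelim[id(node)]
        # Keep the parent's state when possible; otherwise a change is forced
        state = parent_state if parent_state in cand else next(iter(cand))
        if state != parent_state:
            events.append((node, "gain" if state == 1 else "loss"))
        if isinstance(node, tuple):
            down(node[0], state)
            down(node[1], state)

    root_cand = up(tree)
    root_state = root_tiebreak if root_tiebreak in root_cand else next(iter(root_cand))
    if isinstance(tree, tuple):
        down(tree[0], root_state)
        down(tree[1], root_state)
    return root_state, events


tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
clade_specific = dict(zip("abcdefgh", [1, 1, 1, 1, 0, 0, 0, 0]))
dispersed = dict(zip("abcdefgh", [1, 0, 1, 0, 1, 0, 1, 0]))
print(reconstruct(tree, clade_specific))                   # one gain
print(reconstruct(tree, clade_specific, root_tiebreak=1))  # one loss
print(reconstruct(tree, dispersed))                        # four gains
```

With an absent root, the clade-specific pattern is explained by a single gain at the ancestor of a-d; with a present root, by a single loss at the ancestor of e-h. The dispersed pattern needs four independent gains, matching its Fitch parsimony score of 4.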


