Nature. 2009 Dec 24;462(7276):1056-60.
doi: 10.1038/nature08656.

A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea

Dongying Wu et al. Nature. 2009.

Abstract

Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms. There are now nearly 1,000 completed bacterial and archaeal genomes available, most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic 'phylogenomic' efforts to compile a phylogeny-driven 'Genomic Encyclopedia of Bacteria and Archaea' in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.

Figures

Figure 1
Maximum-likelihood phylogenetic tree of the bacterial domain based on a concatenated alignment of 31 broadly conserved protein-coding genes. Phyla are distinguished by colour of the branch and GEBA genomes are indicated in red in the outer circle of species names.
Figure 2. Rate of discovery of protein families as a function of phylogenetic breadth of genomes
For each of four groupings (species, different strains of Streptococcus agalactiae; family, Enterobacteriaceae; phylum, Actinobacteria; domain, GEBA bacteria), all proteins from that group were compared to each other to identify protein families. Then the total number of protein families was calculated as genomes were progressively sampled from the group (starting with one genome until all were sampled). This was done multiple times for each of the four groups using random starting seeds; the average and standard deviation were then plotted.
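The sampling procedure in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes protein families have already been assigned, and the function name `accumulation_curve`, its parameters, and the toy data are hypothetical.

```python
import random
from statistics import mean, stdev


def accumulation_curve(genome_families, n_trials=100, seed=0):
    """Average cumulative family count as genomes are added in random order.

    genome_families: dict mapping a genome name to the set of protein-family
    IDs found in it (family assignment is assumed to be done beforehand).
    Returns one (mean, standard deviation) pair per number of genomes sampled.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    counts_per_step = [[] for _ in genomes]
    for _ in range(n_trials):
        order = genomes[:]
        rng.shuffle(order)              # a new random sampling order per trial
        seen = set()
        for step, genome in enumerate(order):
            seen |= genome_families[genome]
            counts_per_step[step].append(len(seen))
    return [
        (mean(counts), stdev(counts) if len(counts) > 1 else 0.0)
        for counts in counts_per_step
    ]


# Hypothetical toy input: three genomes sharing some families.
toy = {"g1": {"A", "B"}, "g2": {"B", "C"}, "g3": {"A", "B", "C", "D"}}
curve = accumulation_curve(toy, n_trials=50, seed=1)
```

A curve that keeps rising as genomes are added indicates that new families are still being discovered; a curve that flattens early suggests the group is already well sampled.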
Figure 3. A bacterial homologue of actin
a, Genomic context of the bacterial actin-related protein (BARP) gene within the genome of the marine deltaproteobacterium H. ochraceum. Red, gene encoding BARP; white, genes encoding hypothetical proteins; black, genes with functional annotations. b, RT–PCR demonstration of expression of the gene encoding BARP in H. ochraceum. c, Ribbon plot of the putative structure of BARP. d, Alignment of BARP with actin from Dictyostelium discoideum, with similar residues shaded in black. Secondary structure elements (arrows, beta-strands; bars, alpha-helices) are colour-coded as in c. A phylogenetic tree including this protein is in Supplementary Figure 1.
Figure 4. Phylogenetic diversity of bacteria and archaea on the basis of SSU rRNA genes
Using a phylogenetic tree of unique SSU rRNA gene sequences, phylogenetic diversity was measured for four subsets of this tree: organisms with sequenced genomes pre-GEBA (blue), the GEBA organisms (red), all cultured organisms (dark grey), and all available SSU rRNA genes (light grey). For each subtree, taxa were sorted by their contribution to the subtree phylogenetic diversity, and the cumulative phylogenetic diversity was plotted from the largest contribution (left) to the smallest (right). The inset magnifies the first 1,500 organisms. Comparison of the plots shows the phylogenetic ‘dark matter’ left to be sampled.
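One way to read the Figure 4 ranking is as a greedy ordering in which each taxon is credited with the branch length it adds to the tree already covered. The sketch below illustrates that reading only; the caption does not state the exact method. The tree encoding, the function name `rank_by_pd_contribution`, and the toy tree are all hypothetical.

```python
def rank_by_pd_contribution(parent, taxa):
    """Greedily order taxa by the new branch length each adds to the subtree.

    parent: dict mapping each node to (parent_node, branch_length);
            the root maps to (None, 0.0).
    taxa:   iterable of leaf names to rank.
    Returns (order, cumulative), where cumulative[i] is the phylogenetic
    diversity (total branch length) spanned by order[: i + 1] plus the root.
    """
    covered = set()                     # nodes whose parent branch is counted
    remaining = set(taxa)
    order, cumulative, total = [], [], 0.0

    def gain(taxon):
        # Sum branch lengths on the path to the root that are not yet covered.
        added, node = 0.0, taxon
        while node is not None and node not in covered:
            node, length = parent[node]
            added += length
        return added

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        total += gain(best)
        node = best
        while node is not None and node not in covered:
            covered.add(node)
            node = parent[node][0]
        remaining.discard(best)
        order.append(best)
        cumulative.append(total)
    return order, cumulative


# Hypothetical toy tree: A and B share internal node X; C hangs off the root.
toy_tree = {
    "root": (None, 0.0),
    "X": ("root", 3.0),
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "C": ("root", 5.0),
}
order, cumulative = rank_by_pd_contribution(toy_tree, ["A", "B", "C"])
```

Plotting `cumulative` against rank gives a curve of the kind the caption describes: steep where taxa are phylogenetically novel, flat where they are close relatives of taxa already counted.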

Comment in

  • Filling the gaps in the genomic landscape.
    Williams D, Gogarten JP, Lapierre P. Genome Biol. 2010;11(2):103. doi: 10.1186/gb-2010-11-2-103. Epub 2010 Feb 16. PMID: 20210981. Free PMC article.

Similar articles

  • Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes.
    Dos Santos PC, Fang Z, Mason SW, Setubal JC, Dixon R. BMC Genomics. 2012 May 3;13:162. doi: 10.1186/1471-2164-13-162. PMID: 22554235. Free PMC article.
  • Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.
    Kyrpides NC, Hugenholtz P, Eisen JA, Woyke T, Göker M, Parker CT, Amann R, Beck BJ, Chain PS, Chun J, Colwell RR, Danchin A, Dawyndt P, Dedeurwaerdere T, DeLong EF, Detter JC, De Vos P, Donohue TJ, Dong XZ, Ehrlich DS, Fraser C, Gibbs R, Gilbert J, Gilna P, Glöckner FO, Jansson JK, Keasling JD, Knight R, Labeda D, Lapidus A, Lee JS, Li WJ, Ma J, Markowitz V, Moore ER, Morrison M, Meyer F, Nelson KE, Ohkuma M, Ouzounis CA, Pace N, Parkhill J, Qin N, Rossello-Mora R, Sikorski J, Smith D, Sogin M, Stevens R, Stingl U, Suzuki K, Taylor D, Tiedje JM, Tindall B, Wagner M, Weinstock G, Weissenbach J, White O, Wang J, Zhang L, Zhou YG, Field D, Whitman WB, Garrity GM, Klenk HP. PLoS Biol. 2014 Aug 5;12(8):e1001920. doi: 10.1371/journal.pbio.1001920. eCollection 2014 Aug. PMID: 25093819. Free PMC article.
  • En route to a genome-based classification of Archaea and Bacteria?
    Klenk HP, Göker M. Syst Appl Microbiol. 2010 Jun;33(4):175-82. doi: 10.1016/j.syapm.2010.03.003. Epub 2010 Apr 20. PMID: 20409658. Review.
  • Insights into the phylogeny and coding potential of microbial dark matter.
    Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu WT, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T. Nature. 2013 Jul 25;499(7459):431-7. doi: 10.1038/nature12352. Epub 2013 Jul 14. PMID: 23851394.
  • [A system for comparative analysis of microbial genomes].
    Uchiyama I. Nihon Rinsho. 2003 Mar;61 Suppl 3:441-8. PMID: 12718007. Review. Japanese. No abstract available.
