A simple, fast, and accurate method of phylogenomic inference

Martin Wu et al. Genome Biol. 2008 Oct 13;9(10):R151.
doi: 10.1186/gb-2008-9-10-r151.

Abstract

The explosive growth of genomic data provides an opportunity to make increased use of protein markers for phylogenetic inference. We have developed an automated pipeline for phylogenomic analysis (AMPHORA) that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference. We demonstrated its high-throughput capabilities and high-quality results by constructing a genome tree of 578 bacterial species and by assigning phylotypes to 18,607 protein markers identified in metagenomic data collected from the Sargasso Sea.


Figures

Figure 1
A flowchart illustrating the major components of AMPHORA. The marker protein sequences from representative genomes are retrieved, aligned, and masked. Profile hidden Markov models (HMMs) are then built from those 'seed' alignments. New sequences of interest are rapidly and accurately aligned to the trusted seed alignments through HMMs. Predefined masks embedded within the 'seed' alignment are then applied to trim off regions of ambiguity before phylogenetic inference. Alignment columns marked with '1' or '0' were included or excluded, respectively, during further phylogenetic analysis.
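The masking step described in Figure 1 can be illustrated with a short Python sketch. It assumes the query sequences have already been aligned to the seed alignment (AMPHORA does this with profile HMMs; the sketch does not) and that the mask is a string of '1'/'0' flags, one per alignment column. The function name `apply_mask` and the toy sequences are hypothetical.

```python
def apply_mask(alignment: dict[str, str], mask: str) -> dict[str, str]:
    """Return a trimmed copy of `alignment` keeping only columns flagged '1'.

    Hypothetical helper: '1' = include the column in phylogenetic analysis,
    '0' = exclude it as an ambiguous region.
    """
    for name, seq in alignment.items():
        if len(seq) != len(mask):
            raise ValueError(f"{name}: length {len(seq)} != mask length {len(mask)}")
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment (already aligned, equal-length rows); not real data.
    aligned = {
        "seqA": "MK-LVAG",
        "seqB": "MKRLV-G",
        "query": "MK-IVAG",
    }
    mask = "1101101"  # drops columns 3 and 6 (1-based)
    for name, seq in apply_mask(aligned, mask).items():
        print(name, seq)  # seqA MKLVG / seqB MKLVG / query MKIVG
```

Because the mask is stored with the trusted seed alignment, every newly aligned sequence is trimmed at exactly the same columns.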
Figure 2
An unrooted maximum likelihood bacterial genome tree. The tree was constructed from concatenated protein sequence alignments derived from 31 housekeeping genes. All major phyla are separated into their monophyletic groups and are highlighted by color. The branches with bootstrap support of over 80 (out of 100 replicates) are indicated with black dots. Although the relationships among the phyla are not strongly supported, those below the phylum level show very respectable support. The radial tree was generated using iTOL [42].
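The genome tree in Figure 2 is built from concatenated per-gene protein alignments. A minimal Python sketch of that concatenation follows. It assumes each marker alignment is a mapping from taxon name to an aligned sequence of uniform length, and it pads taxa that lack a marker with gap characters. That padding choice is an assumption; other pipelines drop such taxa instead. The marker and taxon names are hypothetical.

```python
def concatenate(markers: dict[str, dict[str, str]]) -> dict[str, str]:
    """Join per-marker alignments into one supermatrix row per taxon.

    Markers are concatenated in sorted name order; a taxon missing a marker
    gets a run of '-' as wide as that marker's alignment (assumed policy).
    """
    taxa = sorted({taxon for aln in markers.values() for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for marker in sorted(markers):
        aln = markers[marker]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"marker {marker}: empty or unequal-length alignment")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    toy = {  # hypothetical markers and taxa
        "markerB": {"taxon1": "MKV", "taxon2": "MRV"},
        "markerA": {"taxon1": "AIDE", "taxon3": "SLDE"},
    }
    for taxon, row in concatenate(toy).items():
        print(taxon, row)  # taxon1 AIDEMKV / taxon2 ----MRV / taxon3 SLDE---
```

The resulting rows would then be passed to a maximum likelihood tree program, with bootstrap replicates resampling columns of this supermatrix.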
Figure 3
Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.
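Figure 3 compares diversity profiles obtained from individual markers. The following Python sketch shows one way such per-marker profiles could be tabulated so they can be compared side by side. It assumes the phylotyping output is available as (marker, assigned group) pairs; that input format, the function name `profile`, and the toy data are hypothetical.

```python
from collections import Counter


def profile(assignments: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Map each marker to the fraction of its sequences assigned to each group."""
    counts: dict[str, Counter] = {}
    for marker, group in assignments:
        counts.setdefault(marker, Counter())[group] += 1
    return {
        marker: {group: n / sum(c.values()) for group, n in c.items()}
        for marker, c in counts.items()
    }


if __name__ == "__main__":
    toy = [  # hypothetical (marker, assigned group) pairs
        ("marker1", "Alphaproteobacteria"),
        ("marker1", "Alphaproteobacteria"),
        ("marker1", "Cyanobacteria"),
        ("marker1", "Cyanobacteria"),
        ("marker2", "Alphaproteobacteria"),
        ("marker2", "Cyanobacteria"),
    ]
    for marker, fractions in profile(toy).items():
        print(marker, fractions)
```

Normalizing to fractions rather than raw counts matters here, because markers differ in length and copy number and are therefore sampled at different rates in shotgun data.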
Figure 4
Comparison of the phylotyping performance by AMPHORA and MEGAN. The sensitivity and specificity of the phylotyping methods were measured across taxonomic ranks using simulated Sanger shotgun sequences of 31 genes from 100 representative bacterial genomes. The figure shows that AMPHORA significantly outperforms MEGAN in sensitivity without sacrificing specificity.
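The Python sketch below computes sensitivity and specificity at one taxonomic rank for a set of simulated reads with known origin. The caption does not define the two measures, so the definitions here are assumptions: sensitivity is correct assignments divided by all queries, and specificity is correct assignments divided by the queries that received any assignment at that rank (under this definition, "specificity" behaves like what is often called precision). The function name and toy labels are hypothetical.

```python
from typing import Optional


def rank_metrics(truth: list[str], predicted: list[Optional[str]]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for one taxonomic rank.

    Assumed definitions: sensitivity = correct / all queries;
    specificity = correct / queries that received an assignment.
    A prediction of None means no assignment was made at this rank.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    assigned = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    correct = sum(1 for t, p in assigned if t == p)
    sensitivity = correct / len(truth) if truth else 0.0
    specificity = correct / len(assigned) if assigned else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    truth = ["Firmicutes", "Firmicutes", "Proteobacteria", "Actinobacteria"]
    predicted = ["Firmicutes", None, "Proteobacteria", "Firmicutes"]
    sens, spec = rank_metrics(truth, predicted)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.50 specificity=0.67
```

Repeating the calculation at each rank (phylum, class, order, and so on) yields curves of the kind compared in the figure; a method that declines to assign uncertain reads lowers sensitivity but need not lower specificity.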
Figure 5
A tree-based bracketing algorithm for phylotyping a query sequence. To assign a phylotype to the query sequence, its immediate ancestor n0 and the first internal node n1 with ≥70% bootstrap support were identified. The known descendant leaf nodes of n1, namely A through D, are used to infer the taxonomy of the query, in conjunction with the normalized branch length information. The dashed timelines delimiting various taxonomic ranks were inferred from a clock that had been calibrated from the bacterial genome tree.
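A simplified Python sketch of the bracketing idea follows. Starting at the query's parent n0 (counted as n1 if it already meets the threshold, which is an assumption), it walks toward the root until it reaches a node with at least 70% bootstrap support, then assigns the deepest lineage shared by all reference leaves under that node. It deliberately omits the normalized branch lengths and the calibrated clock that the caption describes, so it only gives the conservative consensus of the bracket. The `Node` class, function names, and toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    name: str
    bootstrap: Optional[float] = None  # support value (0-100) on internal nodes
    lineage: Optional[tuple[str, ...]] = None  # taxonomy, reference leaves only
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf at or below `node`."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def bracket(query: Node, min_support: float = 70.0) -> tuple[str, ...]:
    """Return the deepest lineage shared by all reference leaves under n1."""
    n0 = query.parent  # immediate ancestor of the query
    if n0 is None:
        raise ValueError("query leaf is not attached to a tree")
    # Walk upward from n0 to the first node with sufficient bootstrap support.
    n1: Optional[Node] = n0
    while n1 is not None and (n1.bootstrap is None or n1.bootstrap < min_support):
        n1 = n1.parent
    if n1 is None:
        return ()  # no well-supported ancestor: leave the query unassigned
    lineages = [leaf.lineage for leaf in leaves(n1) if leaf is not query and leaf.lineage]
    shared: list[str] = []
    for ranks in zip(*lineages):  # compare rank by rank, highest rank first
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


def build_example() -> Node:
    """Build a toy tree and return its query leaf (not real data)."""
    root = Node("root")
    a = root.add(Node("a", bootstrap=95))
    a.add(Node("ref1", lineage=("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales")))
    b = a.add(Node("b", bootstrap=40))
    b.add(Node("ref2", lineage=("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales")))
    query = b.add(Node("query"))
    root.add(Node("ref3", lineage=("Bacteria", "Firmicutes", "Bacilli", "Bacillales")))
    return query


if __name__ == "__main__":
    print(bracket(build_example()))  # ('Bacteria', 'Proteobacteria', 'Gammaproteobacteria')
```

In the toy tree, n0 ("b") has only 40% support, so the walk continues to "a" (95%); the two reference leaves under "a" agree down to class, and the query is assigned at that rank.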

References

    1. Woese CR, Achenbach L, Rouviere P, Mandelco L. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. Syst Appl Microbiol. 1991;14:364–371.
    2. Hasegawa M, Hashimoto T. Ribosomal RNA trees misleading? Nature. 1993;361:23. doi: 10.1038/361023b0.
    3. Ludwig W, Klenk H-P. Overview: A phylogenetic backbone and taxonomic framework for procaryotic systematics. In: Boone DR, Castenholz RW, Garrity GM, editors. Bergey's Manual of Systematic Bacteriology. 2nd ed. Vol. 1. New York, NY: Springer-Verlag; 2000. pp. 49–65.
    4. Loomis WF, Smith DW. Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc Natl Acad Sci USA. 1990;87:9093–9097. doi: 10.1073/pnas.87.23.9093.
    5. Lockhart PJ, Howe CJ, Bryant DA, Beanland TJ, Larkum AW. Substitutional bias confounds inference of cyanelle origins from sequence data. J Mol Evol. 1992;34:153–162. doi: 10.1007/BF00182392.
