Nat Methods. 2009 Sep;6(9):673-6.
doi: 10.1038/nmeth.1358. Epub 2009 Aug 2.

Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models


Arthur Brady et al. Nat Methods. 2009 Sep.

Abstract

Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
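To make the idea of composition-based classification concrete, the following is a minimal sketch and not Phymm's implementation. It assumes a simplified fixed-order Markov chain with add-one smoothing in place of the interpolated Markov models the paper uses, and every name in it (`train_markov_model`, `score_read`, `classify_read`, the `order` parameter, the toy taxon labels) is hypothetical. One model is trained per reference genome, and a read is assigned to the genome whose model gives it the highest log-likelihood.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_model(genome, order=3):
    """Estimate log P(next base | previous `order` bases) from one genome.

    Assumption: a fixed-order chain with add-one smoothing, used here as a
    stand-in for an interpolated Markov model.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(genome)):
        context = genome[i - order:i]
        counts[context][genome[i]] += 1

    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values()) + len(ALPHABET)
        model[context] = {
            base: math.log((nexts.get(base, 0) + 1) / total)
            for base in ALPHABET
        }
    return model


def score_read(read, model, order=3):
    """Sum the log-probabilities of each base in the read given its context.

    Contexts or bases the model has never seen fall back to a uniform
    probability of 1/4.
    """
    uniform = math.log(1 / len(ALPHABET))
    score = 0.0
    for i in range(order, len(read)):
        context = read[i - order:i]
        score += model.get(context, {}).get(read[i], uniform)
    return score


def classify_read(read, models, order=3):
    """Return the label of the model that scores the read highest."""
    return max(models, key=lambda label: score_read(read, models[label], order))


# Toy reference "genomes" (hypothetical data, for illustration only).
models = {
    "taxon_a": train_markov_model("AT" * 200),
    "taxon_b": train_markov_model("GC" * 200),
}
predicted = classify_read("ATATATATATAT", models)
```

In this toy setup the read shares its oligonucleotide composition with the first reference only, so `predicted` is `"taxon_a"`. A real classifier would train on complete genomes and, as the abstract notes, could be combined with alignment-based evidence.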

Figures

Figure 1. Percent accuracy of Phymm, with species-level matches masked, for read lengths from 100–1000 bp. Colored dots show classification accuracy reported for PhyloPythia at 1000 bp for genus- through phylum-level predictions, and for CARMA at 100 bp (as a percentage of the entire input data set) for genus- and phylum-level predictions.
Figure 2. PhymmBL’s phylum-level population characterization of the AMD data using (A) the RefSeq-generated IMMs plus IMMs generated from the draft genomes of the three dominant species in the AMD set, and (B) the RefSeq-generated IMMs on their own.
Figure 3. PhymmBL’s species-level population characterization of the AMD data using the RefSeq-generated IMMs plus IMMs generated from the draft genomes of the three dominant species in the AMD set.

