Accurate phylogenetic classification of variable-length DNA fragments
- PMID: 17179938
- DOI: 10.1038/nmeth976
Abstract
Metagenome studies have retrieved vast amounts of sequence data from a variety of environments leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.
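The abstract describes PhyloPythia only as a "composition-based" classifier. To illustrate what composition-based classification generally means, the following minimal sketch assumes each fragment is represented by a normalized k-mer (oligonucleotide) frequency vector and is assigned to the clade with the nearest centroid profile. The nearest-centroid rule is a deliberately simple stand-in chosen for illustration, not the model PhyloPythia uses, and the function names `kmer_profile`, `centroid`, and `classify` as well as the toy training sequences are hypothetical.

```python
from itertools import product
import math


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized k-mer frequency vector over ACGT (4**k entries, lexicographic order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other ambiguous bases
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def centroid(profiles: list[list[float]]) -> list[float]:
    """Mean profile of a clade's training fragments (hypothetical 'clade model')."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def classify(fragment: str, models: dict[str, list[float]], k: int = 4) -> str:
    """Assign a fragment to the clade whose centroid is closest in Euclidean distance.

    Illustrative stand-in only; not PhyloPythia's actual classifier.
    """
    p = kmer_profile(fragment, k)
    return min(models, key=lambda clade: math.dist(p, models[clade]))


if __name__ == "__main__":
    # Toy, made-up training data for two hypothetical clades.
    models = {
        "gc_rich": centroid([kmer_profile("GCGGCCGC" * 50, 2)]),
        "at_rich": centroid([kmer_profile("ATTAATAT" * 50, 2)]),
    }
    print(classify("GGCCGCGCGC" * 10, models, k=2))  # → gc_rich
```

In practice the choice of k trades resolution against data requirements: 4**k features grow quickly, so short fragments yield sparse, noisy profiles, which is consistent with the abstract's focus on fragments of at least 1 kb.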