Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
- PMID: 17468765
- DOI: 10.1038/nmeth1043
Abstract
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
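The design described in the abstract can be sketched as follows: pool reads from known genomes, keep each read's true origin, and score a processing step against that truth. This is a minimal illustration under stated assumptions, not the authors' pipeline. The study combined real sequencing reads drawn from 113 isolate genome projects. The hypothetical `simulate_metagenome` function below instead cuts error-free fragments from genome sequences, choosing each source genome in proportion to an assumed relative abundance. Because every fragment keeps its genome of origin, a binning result can be compared with ground truth. The hypothetical `bin_purity` helper scores that comparison with majority-label purity, which is one common measure and may differ from the fidelity metrics the paper reports.

```python
import random
from collections import Counter, defaultdict


def simulate_metagenome(genomes, abundances, n_reads, read_len, seed=0):
    """Hypothetical helper: sample error-free fragments from isolate genomes.

    Assumption: fragments stand in for real sequencing reads. Each fragment's
    source genome is chosen with probability proportional to abundances[id],
    and the source id is kept as the ground-truth label.
    """
    rng = random.Random(seed)
    ids = list(genomes)
    weights = [abundances[g] for g in ids]
    for g in ids:
        if len(genomes[g]) < read_len:
            raise ValueError(f"genome {g!r} is shorter than read_len")
    reads = []
    for _ in range(n_reads):
        # Pick a source genome weighted by its (assumed) relative abundance.
        gid = rng.choices(ids, weights=weights, k=1)[0]
        seq = genomes[gid]
        # Pick a uniformly random start so the fragment fits inside the genome.
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((seq[start:start + read_len], gid))
    return reads


def bin_purity(true_labels, predicted_bins):
    """Hypothetical metric: fraction of reads whose bin's majority source matches.

    Reads are grouped by predicted bin; each bin contributes the count of its
    most common true source, and the total is divided by the number of reads.
    """
    members = defaultdict(list)
    for truth, b in zip(true_labels, predicted_bins):
        members[b].append(truth)
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(true_labels)


if __name__ == "__main__":
    # Toy community: two made-up "genomes" with an assumed 80:20 abundance split.
    genomes = {"A": "ACGT" * 50, "B": "GGCC" * 50}
    abundances = {"A": 0.8, "B": 0.2}
    reads = simulate_metagenome(genomes, abundances, n_reads=1000, read_len=20)
    truth = [gid for _, gid in reads]
    # A binner that lumps every read into one bin scores the majority fraction.
    print(bin_purity(truth, [0] * len(truth)))
```

With a perfect binner (predicted bins equal to the true sources), `bin_purity` returns 1.0. Lumping all reads into a single bin returns the share of the most abundant genome. This is one reason purity alone can flatter a coarse binning of a community dominated by a few organisms.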
Comment in
- Interpreting the unculturable majority. Nat Methods. 2007 Jun;4(6):479-80. doi: 10.1038/nmeth0607-479. PMID: 17538628. No abstract available.
Similar articles
- SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics. 2009 Jul 15;25(14):1722-30. doi: 10.1093/bioinformatics/btp317. Epub 2009 May 13. PMID: 19439565.
- Metagenomics: read length matters. Appl Environ Microbiol. 2008 Mar;74(5):1453-63. doi: 10.1128/AEM.02181-07. Epub 2008 Jan 11. PMID: 18192407. Free PMC article.
- nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms. In Silico Biol. 2007;7(2):195-200. PMID: 17688445.
- Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol. 2007 Dec;158(10):724-36. doi: 10.1016/j.resmic.2007.09.009. Epub 2007 Oct 6. PMID: 18031997. Review.
- Get the most out of your metagenome: computational analysis of environmental sequence data. Curr Opin Microbiol. 2007 Oct;10(5):490-8. doi: 10.1016/j.mib.2007.09.001. Epub 2007 Oct 23. PMID: 17936679. Review.
Cited by
- From bacterial genomics to metagenomics: concept, tools and recent advances. Indian J Microbiol. 2008 Jun;48(2):173-94. doi: 10.1007/s12088-008-0031-4. Epub 2008 Jul 27. PMID: 23100712. Free PMC article.
- Computational tools for viral metagenomics and their application in clinical research. Virology. 2012 Dec 20;434(2):162-74. doi: 10.1016/j.virol.2012.09.025. Epub 2012 Oct 11. PMID: 23062738. Free PMC article. Review.
- Alignment and clustering of phylogenetic markers--implications for microbial diversity studies. BMC Bioinformatics. 2010 Mar 24;11:152. doi: 10.1186/1471-2105-11-152. PMID: 20334679. Free PMC article.
- An iterative workflow for mining the human intestinal metaproteome. BMC Genomics. 2011 Jan 5;12:6. doi: 10.1186/1471-2164-12-6. PMID: 21208423. Free PMC article.
- Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS One. 2011;6(5):e19984. doi: 10.1371/journal.pone.0019984. Epub 2011 May 23. PMID: 21625384. Free PMC article.