A comparative evaluation of sequence classification programs

Adam L Bazinet¹, Michael P Cummings

Affiliation

¹ Laboratory of Molecular Evolution, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20874, USA. adam.bazinet@umiacs.umd.edu

PMID: 22574964
PMCID: PMC3428669
DOI: 10.1186/1471-2105-13-92

Comparative Study

Adam L Bazinet et al. BMC Bioinformatics. 2012 May 10;13:92.

Abstract

Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis.

Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known.

Conclusions: We found significant variability in the classification accuracy, precision, and resource consumption of sequence classification programs when they were used to analyze various metagenomics data sets. However, we observed some general trends and patterns that will be useful to researchers who use sequence classification programs.
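
To illustrate the kind of evaluation the Results describe (scoring a program on reads whose true composition is known), here is a minimal Python sketch. It rests on assumptions not stated in the abstract: sensitivity is taken as correct assignments divided by all reads, precision as correct assignments divided by reads the program actually classified, and an unclassified read is represented by `None`. The function name, read identifiers, and labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def sensitivity_and_precision(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Score predicted labels against known labels.

    Assumed definitions (not taken from the abstract):
      sensitivity = correct / total reads
      precision   = correct / reads that received any label
    """
    total = len(truth)
    classified = 0
    correct = 0
    for read_id, true_label in truth.items():
        label = predicted.get(read_id)  # None means "left unclassified"
        if label is None:
            continue
        classified += 1
        if label == true_label:
            correct += 1
    sensitivity = correct / total if total else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision


# Hypothetical toy data: four reads with known genus-level labels.
truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Bacillus", "r4": "Vibrio"}
predicted = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Vibrio", "r4": None}

sens, prec = sensitivity_and_precision(truth, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Under these definitions, a program that leaves difficult reads unclassified can score high precision with low sensitivity, which is one reason to report the two measures separately.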

Figures

**Figure 1**
**Program clustering.** A neighbor-joining tree that clusters the classification programs based on their similar attributes.
