PLoS One. 2012;7(4):e34030.
doi: 10.1371/journal.pone.0034030. Epub 2012 Apr 4.

Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin

Vineet K Sharma et al. PLoS One. 2012.

Abstract

Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which current methods mainly use either composition-based or homology-based approaches. Although homology-based methods are more sensitive and accurate, they are limited primarily by the time needed to generate the Blast alignments. We developed the MetaBin program and web server for improved homology-based taxonomic assignment using an ORF-based approach. By using Blat as a faster alignment method in place of Blastx, the analysis time has been reduced severalfold. MetaBin was benchmarked using both simulated and real metagenomic datasets, and it can be used for both single-end and paired-end sequence reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can perform taxonomic binning of short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out taxonomic analysis by submitting either reads or Blastx output. It provides several options, including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. The MetaBin web server and a standalone version for high-throughput analysis are freely available at http://metabin.riken.jp/.
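The abstract mentions creating a composition chart from the taxonomic bins. As a minimal sketch, assuming each read has already been assigned to at most one taxon (the `composition` function and the dict-of-labels input are hypothetical, not MetaBin's interface), the counts and percentages behind such a chart could be tabulated in Python as follows:

```python
from collections import Counter


def composition(assignments):
    """Return (taxon, read_count, percent) tuples, largest bin first.

    `assignments` maps read IDs to a taxon label; None marks an unassigned read.
    """
    counts = Counter(t if t is not None else "unassigned" for t in assignments.values())
    total = sum(counts.values())
    return [(taxon, n, 100.0 * n / total) for taxon, n in counts.most_common()]


# Hypothetical per-read assignments, for illustration only.
reads = {"r1": "Escherichia coli", "r2": "Escherichia coli", "r3": "Bacillus subtilis", "r4": None}
for taxon, n, pct in composition(reads):
    print(f"{taxon}\t{n}\t{pct:.1f}%")
```

Reporting unassigned reads as their own bin keeps the percentages summing to 100% over all submitted reads; whether MetaBin's chart does the same is not stated in this abstract.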

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. ORF-based approach for the taxonomic assignment of reads of different lengths derived from different regions of the genomic DNA.
Read derived from an intergenic region (A), read containing a small 5′ region of an ORF (B), read containing two partial ORFs at the 5′ and 3′ ends and a complete ORF in the middle (C), read containing only a single complete ORF (D), read containing a long partial ORF at one end (E), read obtained from within an ORF (F), and read with a sequencing error that splits a single ORF into two smaller ORFs (G). X, Y, Z, K, L, and M are the genomes to which the ORFs showed matches. The taxonomic IDs of the species of these genomes are used to make the taxonomic assignments and to create the taxonomic bins.
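Figure 1 shows that a read may contain complete ORFs, partial ORFs truncated at either end, or no ORF at all. As a rough sketch of how candidate ORFs might be pulled from a read, the Python below translates all six reading frames with the standard genetic code and keeps every stop-free stretch of at least `min_aa` residues, including the open-ended stretches at the read ends. Not requiring a start codon is an assumption made here so that partial ORFs like those in cases B, C, E, and F are retained; the function names and the 15-residue cutoff are hypothetical and are not taken from MetaBin.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def read_orfs(read: str, min_aa: int = 15):
    """Return (strand, frame, peptide) for every stop-free stretch >= min_aa residues."""
    read = read.upper()
    orfs = []
    for strand, seq in (("+", read), ("-", revcomp(read))):
        for frame in range(3):
            # Splitting on '*' keeps the open-ended pieces at both read ends,
            # which correspond to partial ORFs.
            for piece in translate(seq[frame:]).split("*"):
                if len(piece) >= min_aa:
                    orfs.append((strand, frame, piece))
    return orfs


read = "ATG" + "GCT" * 20 + "TAA"  # hypothetical 66 bp read
for strand, frame, peptide in read_orfs(read):
    print(strand, frame, peptide)
```

Because a read is short, several frames can yield long stop-free stretches by chance; in an approach like MetaBin's, it is the subsequent alignment of these peptides against reference proteins that decides which ones are informative.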
Figure 2. Flowchart of MetaBin algorithm.
ID and POS refer to %Identity and %Positives, respectively, as provided in the Blastx or Blat output. COV refers to the percentage of the query covered by the hit (reference protein).
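Figure 2 filters alignment hits by ID, POS, and COV. A minimal sketch of such a filter is shown below, assuming that query coordinates and query length are given in the same units and treating COV as the aligned query span divided by the full query length. The `Hit` fields, the `passes` and `taxa_passing` helpers, and the threshold defaults are hypothetical placeholders, not MetaBin's actual data structures or cutoffs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str
    subject_taxid: int  # taxonomic ID of the species of the reference protein
    identity: float     # ID: %identity reported by Blastx or Blat
    positives: float    # POS: %positives reported by Blastx or Blat
    q_start: int        # 1-based alignment start on the query
    q_end: int          # 1-based alignment end on the query (inclusive; may be < q_start)
    q_len: int          # full query length, in the same units as q_start/q_end

    @property
    def coverage(self) -> float:
        """COV: percentage of the query spanned by the alignment."""
        aligned = abs(self.q_end - self.q_start) + 1
        return 100.0 * aligned / self.q_len


def passes(hit: Hit, min_id: float, min_pos: float, min_cov: float) -> bool:
    return hit.identity >= min_id and hit.positives >= min_pos and hit.coverage >= min_cov


def taxa_passing(hits, min_id=90.0, min_pos=90.0, min_cov=70.0):
    """Map each query ID to the set of taxids whose hits pass all three filters.

    The default thresholds are placeholders, not values from the MetaBin paper.
    """
    kept = defaultdict(set)
    for hit in hits:
        if passes(hit, min_id, min_pos, min_cov):
            kept[hit.query_id].add(hit.subject_taxid)
    return dict(kept)


hits = [
    Hit("orf1", 562, 98.0, 99.0, 1, 40, 50),    # COV = 80%: kept
    Hit("orf1", 1423, 60.0, 75.0, 1, 50, 50),   # ID too low: dropped
    Hit("orf2", 1423, 95.0, 97.0, 30, 1, 100),  # COV = 30%: dropped
]
print(taxa_passing(hits))  # {'orf1': {562}}
```

Taking the absolute coordinate difference lets the same coverage formula handle hits reported on the reverse strand, where the query start can exceed the query end.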