Sci Rep. 2017 Aug 25;7(1):9440.
doi: 10.1038/s41598-017-09947-y.

EnSVMB: Metagenomics Fragments Classification using Ensemble SVM and BLAST

Yuan Jiang et al. Sci Rep. 2017.

Abstract

Metagenomics brings new discoveries and insights into the uncultured microbial world. One fundamental task in metagenomic analysis is to determine the taxonomy of raw sequence fragments. Modern sequencing technologies produce relatively short fragments in far greater numbers, which makes taxonomic classification considerably more difficult than before. Fast and accurate techniques are therefore needed to classify fragments at large scale. We propose EnSVM (Ensemble Support Vector Machine) and its extension, EnSVMB (EnSVM with BLAST), to classify fragments accurately. EnSVM divides fragments into a large confident set and a small diffident set, according to whether the fragments receive consistent or inconsistent predictions from linear SVMs trained with different k-mers. An empirical study shows that the sensitivity and specificity of EnSVM on the confident set are higher than 90% and 97%, respectively, but on the diffident set they are lower than 60% and 75%. To improve performance on the diffident set, EnSVMB uses the best BLAST hits to reclassify the fragments in that set. Experimental results show that EnSVM divides fragments into confident and diffident sets efficiently and effectively, and that EnSVMB achieves higher accuracy, higher sensitivity and more true positives than related state-of-the-art methods, while its specificity is comparable to that of the best of them.
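The confident/diffident split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_profile` and `split_by_votes` are hypothetical, the per-classifier predictions are taken as given rather than produced by trained SVMs, and the agreement rule (a fragment is confident when at least `vote` classifiers predict the same label) is an assumption, since the abstract does not spell out the exact rule.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Normalized k-mer frequency vector over the alphabet ACGT.

    Illustrative feature extraction only; the paper's exact
    featurization is not given in the abstract.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def split_by_votes(predictions, vote):
    """Split fragments into confident and diffident sets.

    predictions: dict mapping fragment id -> list of labels, one per
                 classifier (e.g. one per k-mer size).
    vote:        assumed threshold; a fragment is confident when its
                 most frequent label is predicted at least `vote` times.
    """
    confident, diffident = {}, []
    for frag_id, labels in predictions.items():
        label, n_agree = Counter(labels).most_common(1)[0]
        if n_agree >= vote:
            confident[frag_id] = label
        else:
            diffident.append(frag_id)
    return confident, diffident
```

With five classifiers and `vote=4`, a fragment labeled identically by all five lands in the confident set, while a fragment whose votes are split 2/2/1 is deferred to the diffident set.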


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
The performance of six methods under different fragment lengths. EnSVMB(vote = 3), EnSVMB(vote = 4) and EnSVMB(vote = 5) mean that the voting threshold of EnSVMB is set to 3, 4 and 5, respectively.
Figure 2
The performance of six methods on the large-scale dataset.
Figure 3
Abundance profiles identified by BWA, BLAST, EnSVMB and NBC. ‘Providers’ means that the abundance profiles are taken from EBI (https://www.ebi.ac.uk/metagenomics/).
Figure 4
Five linear SVMs are integrated into an ensemble classifier (EnSVM). EnSVM then divides the fragments in the validation set into confident and diffident sets based on the aggregated predictions of these SVMs. The voting threshold (labeled as vote) is adjustable. EnSVMB further applies BLAST to reclassify the fragments in the diffident set, and tags as unknown any fragment that cannot be retrieved from the reference set with a confident e-value.
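The reclassification step in Figure 4 might be sketched as follows. This is a hedged illustration, not the published pipeline: `reclassify_with_blast` is a hypothetical helper, the best hits are assumed to have been parsed already into a dict of `(label, e_value)` pairs, and the cutoff `max_evalue=1e-5` is a placeholder because the caption does not state what counts as a confident e-value.

```python
def reclassify_with_blast(diffident_ids, best_hits, max_evalue=1e-5):
    """Assign labels to diffident fragments from their best BLAST hit.

    diffident_ids: iterable of fragment ids left unresolved by the ensemble.
    best_hits:     dict mapping fragment id -> (label, e_value) for the
                   best hit; fragments without any hit are absent.
    max_evalue:    assumed cutoff (placeholder value); hits with a larger
                   e-value are not trusted.
    """
    result = {}
    for frag_id in diffident_ids:
        hit = best_hits.get(frag_id)
        if hit is not None and hit[1] <= max_evalue:
            result[frag_id] = hit[0]
        else:
            # No hit, or the hit is too weak to trust.
            result[frag_id] = "unknown"
    return result
```

Keeping an explicit `"unknown"` label, rather than forcing every fragment into a known taxon, mirrors the caption's handling of fragments that have no confident match in the reference set.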

