Bioinformatics. 2024 Dec 26;41(1):btae743.
doi: 10.1093/bioinformatics/btae743.

The Naïve Bayes classifier++ for metagenomic taxonomic classification-query evaluation

Haozhe Neil Duan et al. Bioinformatics. 2024.

Abstract

Motivation: This study examines how the query performance of the NBC++ (Incremental Naive Bayes Classifier) program varies with canonicality, k-mer size, database, and input sample size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.

Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training than Kraken2 and can use a fraction of the memory, but at the cost of long query times. Major NBC++ enhancements include support for canonical k-mer storage (leading to significant storage savings) and adaptable, optimized memory allocation that accelerates query analysis and enables the software to run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, giving users valuable confidence information.
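
To make the two ideas in the Results concrete (canonical k-mer storage and a log-likelihood per training genome), the following is a minimal Python sketch. It is not the NBC++ implementation, which is a separate program; the function names, the add-one (Laplace) smoothing, and the toy sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq, k, use_canonical=True):
    """Count k-mers in seq, skipping windows with non-ACGT characters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


def canonical_vocab_size(k):
    """Number of distinct canonical k-mers (even k has palindromic k-mers)."""
    if k % 2 == 1:
        return 4 ** k // 2
    return (4 ** k + 4 ** (k // 2)) // 2


def log_likelihood(read_counts, genome_counts, vocab_size):
    """Naive Bayes log-likelihood of a read given one genome's k-mer counts.

    Assumption: add-one smoothing so unseen k-mers get a non-zero probability.
    """
    total = sum(genome_counts.values())
    score = 0.0
    for kmer, n in read_counts.items():
        prob = (genome_counts.get(kmer, 0) + 1) / (total + vocab_size)
        score += n * math.log(prob)
    return score


def classify(read, genome_profiles, k):
    """Return the best-scoring genome and the log-likelihood for every genome."""
    read_counts = count_kmers(read, k)
    vocab = canonical_vocab_size(k)
    scores = {
        name: log_likelihood(read_counts, profile, vocab)
        for name, profile in genome_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Toy "training genomes" (hypothetical sequences, for illustration only).
K = 3
genomes = {"g1": "AAAAAAAAAACCCCCCCCCC", "g2": "ATATATATATATATATATAT"}
profiles = {name: count_kmers(seq, K) for name, seq in genomes.items()}
best, scores = classify("AAAAACCCC", profiles, K)
```

Storing only canonical k-mers merges each k-mer with its reverse complement, which roughly halves the number of distinct keys. Returning the score for every genome, rather than only the winner, mirrors the kind of per-genome confidence information the abstract describes.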

Availability and implementation: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.

Figures

Figure 1.
The phylum- and order-level composition of the human sample is shown (taxa above 2% relative abundance). Results from the Standard and Extended databases agree more closely with each other than results from the Basic and Standard databases do.
