Bioinformatics. 2022 Nov 30;38(23):5315-5316.
doi: 10.1093/bioinformatics/btac672.

GTDB-Tk v2: memory friendly classification with the genome taxonomy database

Pierre-Alain Chaumeil et al. Bioinformatics.

Abstract

Summary: The Genome Taxonomy Database (GTDB) and associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB), which limits its adoption and ease of use. Here, we present an update to GTDB-Tk that uses a divide-and-conquer approach: user genomes are first placed into a bacterial reference tree of family-level representatives, and then placed into the appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification.
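The two-stage idea in the summary can be illustrated with a toy sketch. This is not GTDB-Tk's implementation: real placement works on phylogenetic trees and marker-gene alignments, whereas here a Hamming distance over short made-up strings stands in for placement, and every name (`Reference`, `build_backbone`, `place_two_stage`, the taxa and sequences) is hypothetical. The point is only that each stage searches a small set of references instead of all of them at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Toy species representative (all fields are illustrative)."""
    name: str
    taxon_class: str
    family: str
    profile: str  # stand-in for an aligned marker sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query: str, refs: list[Reference]) -> Reference:
    """Toy 'placement': pick the reference with the smallest distance."""
    return min(refs, key=lambda r: hamming(query, r.profile))


def build_backbone(refs: list[Reference]) -> list[Reference]:
    """Keep one representative per family (the first one seen)."""
    per_family: dict[str, Reference] = {}
    for ref in refs:
        per_family.setdefault(ref.family, ref)
    return list(per_family.values())


def build_class_subtrees(refs: list[Reference]) -> dict[str, list[Reference]]:
    """Group all species representatives by class."""
    subtrees: dict[str, list[Reference]] = {}
    for ref in refs:
        subtrees.setdefault(ref.taxon_class, []).append(ref)
    return subtrees


def place_two_stage(query, backbone, subtrees):
    """Stage 1 picks a class via the backbone; stage 2 searches that class only."""
    coarse = nearest(query, backbone)
    fine = nearest(query, subtrees[coarse.taxon_class])
    return coarse.taxon_class, fine.name


# Made-up reference set: two classes, three families, five species.
REFS = [
    Reference("sp_A1", "ClassA", "FamA1", "AAAAAAAA"),
    Reference("sp_A2", "ClassA", "FamA1", "AAAAAATT"),
    Reference("sp_A3", "ClassA", "FamA2", "AACCCCCC"),
    Reference("sp_B1", "ClassB", "FamB1", "GGGGGGGG"),
    Reference("sp_B2", "ClassB", "FamB1", "GGGGGGTT"),
]

backbone = build_backbone(REFS)
subtrees = build_class_subtrees(REFS)
query = "AAAAACTT"

two_stage = place_two_stage(query, backbone, subtrees)
single_stage = nearest(query, REFS).name
largest_working_set = max(len(backbone), *(len(s) for s in subtrees.values()))
```

In this toy data the two-stage result agrees with a search over the full reference set, while the largest set examined at any one time (`largest_working_set`) is smaller than `len(REFS)`. At the scale of a real reference tree, that difference is where the memory saving would come from.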

Availability and implementation: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.

Supplementary information: Supplementary data are available at Bioinformatics online.
