Bioinformatics. 2019 Nov 15;36(6):1925-1927.
doi: 10.1093/bioinformatics/btz848. Online ahead of print.

GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

Pierre-Alain Chaumeil et al. Bioinformatics.

Abstract

Summary: The GTDB Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10,156 bacterial and archaeal metagenome-assembled genomes.

Availability: GTDB-Tk is implemented in Python and licensed under the GNU General Public License v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Illustrative examples of GTDB-Tk taxonomic assignments. (a) The position of the query genome in the reference tree alone may be sufficient to dictate its taxonomic assignment as in this example where it is necessarily a novel phylum. (b) Query genome represents a novel class within the phylum Actinobacteria. (c) Query genome will be classified as either a novel, basal Escherichia species or a novel genus in the family Enterobacteriaceae depending on its RED value. (d) Aerophobia is the only class within the Aerophobetota phylum and as such the query genome may be classified as the most basal order in Aerophobia, a novel class within the Aerophobetota, or a novel phylum depending on its RED value. (e) ANI is calculated between the query genome and the representative genomes for all Staphylococcus species. The query genome is assigned to the closest Staphylococcus species if the ANI is above the species ANI circumscription radius or is otherwise classified as a novel species.
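
The species-level rule in panel (e) can be illustrated with a minimal Python sketch. This is not GTDB-Tk's implementation: the function name `assign_species`, the dictionary layout, and the ANI and radius values are all hypothetical and chosen only for illustration, and the strict "greater than" comparison simply follows the caption's wording ("above").

```python
from typing import Optional


def assign_species(ani_to_reps: dict[str, float],
                   radius: dict[str, float]) -> Optional[str]:
    """Return the closest species if the query's ANI exceeds that species'
    circumscription radius; return None to signal a novel species.

    ani_to_reps: ANI (%) between the query and each species representative.
    radius: ANI circumscription radius (%) for each species.
    Both mappings are hypothetical inputs, not GTDB-Tk data structures.
    """
    if not ani_to_reps:
        return None
    # The closest species is the representative with the highest ANI.
    closest = max(ani_to_reps, key=ani_to_reps.get)
    # Assign only if the ANI is above the radius (strict, per the caption).
    if ani_to_reps[closest] > radius[closest]:
        return closest
    return None


# Made-up numbers for illustration only.
ani = {"Staphylococcus aureus": 97.8, "Staphylococcus epidermidis": 81.2}
radii = {"Staphylococcus aureus": 95.0, "Staphylococcus epidermidis": 95.0}
print(assign_species(ani, radii) or "novel species")
```

With these illustrative numbers the query falls inside the radius of its closest representative and takes that species name. Lowering its best ANI below the radius would make the function return `None`, which corresponds to the "novel species" outcome in the caption.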
