[Preprint]. 2024 Mar 27:2024.03.22.586347.
doi: 10.1101/2024.03.22.586347.

Taxonize-gb: A tool for filtering GenBank non-redundant databases based on taxonomy

Mohamed S Sarhan et al. bioRxiv.

Abstract

Taxonomic identification and diversity analysis of diverse ecological samples have become routine in various research and industrial fields. While DNA barcoding marker-gene approaches were once prevalent, the decreasing costs of next-generation sequencing have made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing offers possibilities for in-depth characterization of structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool that extracts the portions of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers analyzing large metagenomic datasets on a regular basis. Taxonize-gb is an open-source, Python-based command-line tool freely available for installation at https://pypi.org/project/taxonize-gb/ and on GitHub at https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
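The general idea of taxonomy-based filtering can be illustrated with a minimal Python sketch. This is not taxonize-gb's actual code or interface, which the abstract does not describe. It assumes a taxonomy given as NCBI nodes.dmp-style lines (TaxID and parent TaxID as the first two pipe-separated fields) and an accession-to-TaxID mapping held in a dictionary. The function names, the toy TaxIDs, and the toy accessions are all hypothetical.

```python
from collections import defaultdict, deque


def parse_nodes(lines):
    """Build a parent -> children map from nodes.dmp-style lines (assumed format)."""
    children = defaultdict(set)
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue
        tax_id, parent_id = int(fields[0]), int(fields[1])
        if tax_id != parent_id:  # the root node lists itself as its own parent
            children[parent_id].add(tax_id)
    return children


def collect_descendants(children, root_ids):
    """Return the input TaxIDs plus every TaxID below them (breadth-first)."""
    seen = set(root_ids)
    queue = deque(root_ids)
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_accessions(acc2taxid, wanted_taxids):
    """Keep only accessions whose TaxID falls inside the wanted set."""
    return {acc for acc, tax_id in acc2taxid.items() if tax_id in wanted_taxids}


# Toy taxonomy and accessions (hypothetical values, not real NCBI records).
TOY_NODES = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tclade\t|",
    "11\t|\t10\t|\tgenus\t|",
    "12\t|\t11\t|\tspecies\t|",
    "20\t|\t1\t|\tclade\t|",
    "21\t|\t20\t|\tspecies\t|",
]
TOY_ACC2TAXID = {"ACC_A.1": 12, "ACC_B.1": 11, "ACC_C.1": 21, "ACC_D.1": 10}

children = parse_nodes(TOY_NODES)
wanted = collect_descendants(children, [10])
kept = filter_accessions(TOY_ACC2TAXID, wanted)
print(sorted(wanted))
print(sorted(kept))
```

In this sketch, the retained accession set would then be used to select the matching sequence records from the full database. Expanding the input TaxIDs to all of their descendants first means that sequences annotated at the species or strain level are still captured when a higher-rank TaxID is requested.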

Conflict of interest statement

The authors declare there is no conflict of interest.

Figures

Figure 1. Visual workflow for the “taxonize_gb” module for filtering the NCBI non-redundant protein and nucleotide databases.
Figure 2. Performance comparison of DIAMOND search runtimes (h) against the taxonized NCBI-nr vs. the complete NCBI-nr with the restricted TaxID search option. For further information on the metagenomic samples used, please refer to Maixner et al. (2021). The data are publicly available at ENA: PRJEB44507.
