Bioinformatics. 2019 Oct 15;35(20):4162-4164.
doi: 10.1093/bioinformatics/btz188.

GToTree: a user-friendly workflow for phylogenomics

Michael D Lee. Bioinformatics.

Abstract

Summary: Genome-level evolutionary inference (i.e. phylogenomics) is becoming an increasingly essential step in many biologists' work. Accordingly, several tools are available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required (such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering, and stitching different tools together) can be prohibitive. Here I introduce GToTree, a command-line tool that takes any combination of fasta files, GenBank files and/or NCBI assembly accessions as input and outputs an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on a specified single-copy gene (SCG) set. Although GToTree can work with any custom hidden Markov models (HMMs), it also includes 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built based on searches of ∼12 000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees.

Availability and implementation: GToTree is open-source and freely available for download from: github.com/AstrobioMike/GToTree. It is implemented primarily in Bash with helper scripts written in Python.

Supplementary information: Supplementary data are available at Bioinformatics online.
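
Illustrative sketch: the abstract mentions per-genome estimates of completeness and redundancy relative to an SCG set. One common way to compute such estimates, assumed here and not taken from GToTree's source, is to report the percentage of target SCGs detected at least once (completeness) and the percentage detected more than once (redundancy). The function name and the gene names in the usage example are hypothetical placeholders.

```python
from collections import Counter


def completeness_and_redundancy(hits, scg_set):
    """Estimate completeness and redundancy for one genome.

    hits    -- iterable of SCG names, one entry per detected gene copy
    scg_set -- set of target SCG names (assumed definition of the SCG set)

    Returns (completeness %, redundancy %). This is an assumed, simplified
    definition, not necessarily the exact calculation GToTree performs.
    """
    if not scg_set:
        raise ValueError("scg_set must contain at least one target gene")

    # Count copies only for genes that belong to the target set.
    counts = Counter(h for h in hits if h in scg_set)
    total = len(scg_set)

    found = sum(1 for gene in scg_set if counts[gene] >= 1)
    multi = sum(1 for gene in scg_set if counts[gene] > 1)

    return 100.0 * found / total, 100.0 * multi / total


if __name__ == "__main__":
    # Hypothetical four-gene SCG set; one gene is missing, one is duplicated.
    targets = {"rpsB", "rplC", "rpoB", "recA"}
    detected = ["rpsB", "rplC", "rplC", "rpoB"]
    print(completeness_and_redundancy(detected, targets))
```

Under this definition, a genome missing many target genes scores low on completeness, while a high redundancy value can hint at contamination or a mixed assembly.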

Figures

Fig. 1. Overview of general workflow and an example Tree of Life made with GToTree encompassing ∼1700 genomes from NCBI’s RefSeq using a universal SCG-set (Hug et al., 2016)

References

    1. Berube P.M. et al. (2018) Single cell genomes of Prochlorococcus, Synechococcus, and sympatric microbes from diverse marine environments. Sci. Data, 5, 180154.
    2. Braakman R. et al. (2017) Metabolic evolution and the self-organization of ecosystems. Proc. Natl. Acad. Sci. USA, 114, E3091–E3100.
    3. Eddy S.R. (2011) Accelerated profile HMM searches. PLoS Comput. Biol., 7.
    4. Edgar R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.
    5. El-Gebali S. et al. (2019) The Pfam protein families database in 2019. Nucleic Acids Res., 47, D427–D432.
