F1000Res. 2020 Jun 4:9:511.
doi: 10.12688/f1000research.23790.2. eCollection 2020.

How to build phylogenetic species trees with OMA

David Dylus et al. F1000Res.

Abstract

Knowledge of species phylogeny is critical to many fields of biology. In an era of genome data availability, the most common way to make a phylogenetic species tree is by using multiple protein-coding genes, conserved in multiple species. This methodology is composed of several steps: orthology inference, multiple sequence alignment and inference of the phylogeny with dedicated tools. This can be a difficult task, and orthology inference, in particular, is usually computationally intensive and error-prone if done ad hoc. This tutorial provides protocols to make use of OMA Orthologous Groups, a set of genes all orthologous to each other, to infer a phylogenetic species tree. It is designed to be user-friendly and computationally inexpensive, by providing two options: (1) using only precomputed groups with species available on the OMA Browser, or (2) computing orthologs using OMA Standalone for additional species, with the option of using precomputed orthology relations for those present in OMA. A protocol for downstream analyses is provided as well, including creating a supermatrix, tree inference, and visualization. All protocols use publicly available software, and we provide scripts and code snippets to facilitate data handling. The protocols are accompanied by practical examples.
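
The supermatrix step named in the abstract can be illustrated with a minimal Python sketch. It assumes that each orthologous group has already been aligned and is held as a dictionary mapping a species code to its aligned sequence. The function name `build_supermatrix` and the toy sequences are hypothetical and are not taken from the scripts that accompany the tutorial. Species missing from a group are padded with gap characters so that every row of the concatenated matrix has the same length.

```python
from typing import Dict, List


def build_supermatrix(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate per-group alignments into one supermatrix.

    Each alignment maps a species code to its aligned sequence. Species
    absent from a group receive a block of gap characters for that group.
    """
    # Union of all species seen in any group, in a stable order.
    species = sorted({sp for aln in alignments for sp in aln})
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(blocks) for sp, blocks in rows.items()}


# Toy example with two hypothetical orthologous groups (not real OMA data).
og1 = {"YEAST": "MKT-A", "CANAL": "MKTLA", "SCHPO": "MRT-A"}
og2 = {"YEAST": "GHW", "SCHPO": "GHF"}  # CANAL has no member in this group
supermatrix = build_supermatrix([og1, og2])
for code, seq in supermatrix.items():
    print(f">{code}\n{seq}")
```

The resulting FASTA-style output is the kind of concatenated alignment that a tree inference tool would take as input; real analyses would typically rely on the tutorial's own scripts or an established alignment library instead of this sketch.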

Keywords: OMA; Orthologous Matrix; phylogenetics; phylogenomics; species tree.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Exporting data from OMA for building a species tree.
A) Choose which type of data to export from the Download tab on the right-hand side of the home page. B) Select your proteomes from those in the OMA database by using the interactive species tree, which is based on the NCBI taxonomy.
Figure 2. Tree organization of the tarball downloaded through the OMA Browser after exporting an all-against-all of selected species.
The important files and folders are colored. Executable files mentioned in the course of the tutorial are in green. Files and folders that will need to be modified are in blue. Other files and folders (in black) will not be used in the course of the tutorial. Files and folders not shown are represented by three dots.
Figure 3. Comparison of phylogenetic trees computed by IQ-TREE, using an LG substitution model (left), and RAxML, using an LG substitution model, a discrete Gamma model of rate heterogeneity with 8 categories, and empirical amino-acid frequencies (right).
Trees were computed with 20 yeast species present in OMA. The leaves of the trees are the UniProt 5-letter species codes. The following export options were used: Minimum species coverage: 1, Maximum nr of markers: -1 (uncapped). 168 marker genes were exported. Visualization was done with phylo.io; different shades of blue show variations in topology. Bootstrap values are reported in red for each bipartition with a bootstrap <100.
Figure 4. Comparison of phylogenetic trees, using additional species, computed by IQ-TREE, under an LG substitution model (left), and RAxML, under an LG substitution model, a discrete Gamma model of rate heterogeneity with 8 categories, and empirical amino-acid frequencies (right).
Trees were computed with 18 yeast species present in OMA, plus two additional proteomes (YEAST and FOMPI). The leaves of the trees are the UniProt 5-letter species codes. Genes used to compute the tree had to be shared by at least 90% of the species (minimum species coverage: 0.9, maximum number of markers: -1). This represents 880 OGs. Visualization was done with phylo.io; different shades of blue show variations in topology (in this case both trees have identical topology). Bootstrap values are reported in red for each bipartition with a bootstrap <100.
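
The two export options quoted in the captions, minimum species coverage and maximum number of markers, can be read as a simple filter over orthologous groups. The Python sketch below is one possible interpretation, not the OMA implementation: the function name `select_markers`, the placeholder species codes, and the rule of preferring the best-covered groups when a cap is set are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def select_markers(groups: Dict[str, Set[str]], n_species: int,
                   min_coverage: float, max_markers: int = -1) -> List[str]:
    """Return the IDs of groups covering at least `min_coverage` of the species.

    `groups` maps a group ID to the set of species with a member in it.
    A negative `max_markers` means no cap, mirroring the -1 in the captions.
    """
    # Small tolerance guards against floating-point error in the product.
    threshold = min_coverage * n_species - 1e-9
    kept = [og for og, members in groups.items() if len(members) >= threshold]
    # Assumption: when capped, prefer the best-covered groups (ties by ID).
    kept.sort(key=lambda og: (-len(groups[og]), og))
    if max_markers >= 0:
        kept = kept[:max_markers]
    return kept


# Hypothetical data: 20 placeholder species and three toy groups.
all_species = [f"SP{i:02d}" for i in range(20)]
toy_groups = {
    "OG_A": set(all_species),       # present in all 20 species
    "OG_B": set(all_species[:18]),  # present in 18 of 20 (90%)
    "OG_C": set(all_species[:17]),  # present in 17 of 20 (85%)
}
strict = select_markers(toy_groups, 20, min_coverage=1.0)
relaxed = select_markers(toy_groups, 20, min_coverage=0.9)
capped = select_markers(toy_groups, 20, min_coverage=0.9, max_markers=1)
print(strict, relaxed, capped)
```

Lowering the coverage threshold from 1 to 0.9 admits groups that are missing from a few species, which is consistent with the larger marker count reported for Figure 4 than for Figure 3.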
