PhyloFisher: A phylogenomic package for resolving eukaryotic relationships

Alexander K Tice¹,², David Žihala³, Tomáš Pánek¹,³, Robert E Jones¹,², Eric D Salomaki⁴, Serafim Nenarokov⁴, Fabien Burki⁵,⁶, Marek Eliáš³, Laura Eme⁷, Andrew J Roger⁸, Antonis Rokas⁹, Xing-Xing Shen¹⁰, Jürgen F H Strassert⁵,¹¹, Martin Kolísko⁴,¹², Matthew W Brown¹,²

Affiliations

¹ Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, United States of America.
² Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America.
³ Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic.
⁴ Institute of Parasitology, Biology Centre Czech Academy of Sciences, České Budějovice, Czech Republic.
⁵ Department of Organismal Biology, Uppsala University, Uppsala, Sweden.
⁶ Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
⁷ Unité d'Ecologie, Systématique et Evolution, CNRS, Université Paris-Saclay, Paris, France.
⁸ Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada.
⁹ Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America.
¹⁰ State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China.
¹¹ Leibniz Institute of Freshwater Ecology and Inland Fisheries, Ecosystem Research, Berlin, Germany.
¹² Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic.

PMID: 34358228
PMCID: PMC8345874
DOI: 10.1371/journal.pbio.3001365

Alexander K Tice et al. PLoS Biol. 2021 Aug 6;19(8):e3001365. doi: 10.1371/journal.pbio.3001365. eCollection 2021 Aug.

Abstract

Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of the PhyloFisher workflow and package contents.**
The PhyloFisher package consists of a manually curated starting database of 240 protein-coding genes and their paralogs from 304 eukaryotic taxa; a series of tools to perform the essential steps of phylogenomic dataset construction (homolog collection, single-protein tree construction, removal of paralogs and contaminants, and matrix concatenation); and many pre- and post-construction analyses necessary for a publication-quality phylogenomic study.

**Fig 2. Flowchart of homolog collection performed by the PhyloFisher Python script *fisher*.py.**
Briefly, each predicted proteome of a new taxon to be added is processed through either a default route or a phylogenetically aware route that utilizes the manually curated orthologs from closely related taxa chosen by the user (and present in the starting database) as search queries against the proteome of the new taxon. Up to a user-defined number of collected sequences are reprioritized or eliminated based on a set of criteria designed to maximize correct demarcation of the desired ortholog and related paralogs while avoiding contaminating sequences. See Supporting information Materials and methods for a detailed description of the logic, third-party software, and associated parameters utilized.

**Fig 3. Phylogenetic tree for 304 eukaryotes, inferred from 240 proteins.**
The tree was inferred using ML (LG+G4+F+C60-PMSF model, with an LG+G4+F+C20 ML tree as a PMSF guide input tree) in IQ-TREE v1.6.7.1 [14]. Single-protein alignments were processed with the PhyloFisher utility *matrix_constructor*.py. The numbers on branches show support values from 350 ML bootstrap replicates. All nodes are fully supported (100% MLBS) unless otherwise shown. Highly supported clades of high taxonomic level have been collapsed; the full ML tree is available as Fig A in S1 Text. Taxon details are available in S1 Table. This tree was inferred from the full concatenated alignment (72,632 sites). Further details of the methodology may be found in the Materials and methods and S1 Text. ML, maximum likelihood; MLBS, maximum likelihood bootstrap support; PMSF, posterior mean site frequency.
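The concatenation step mentioned in this caption can be illustrated with a minimal Python sketch. This is not the code of *matrix_constructor*.py; the function name `concatenate_alignments`, the dictionary-based alignment representation, and the use of `-` to pad taxa missing from a gene are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical representation: one alignment maps taxon name -> aligned sequence.
Alignment = Dict[str, str]


def concatenate_alignments(alignments: List[Alignment], missing: str = "-") -> Alignment:
    """Join per-gene alignments into one supermatrix.

    Taxa absent from a gene are padded with the `missing` character
    (an assumption of this sketch) so every row has the same length.
    """
    # Collect every taxon seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in alignments:
        # All sequences within one alignment must share a single length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with invented taxa and sequences.
gene_a = {"taxonA": "MK-L", "taxonB": "MKAL"}
gene_b = {"taxonA": "GG", "taxonC": "GA"}
matrix = concatenate_alignments([gene_a, gene_b])
for name, row in matrix.items():
    print(name, row)
```

In this toy case each row of the resulting matrix has six columns (four from the first gene plus two from the second), in the same way that the 72,632 sites of the full matrix are the sum of the lengths of the individual protein alignments.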

**Fig 4. Phylogenetic reconstruction of the tree of Saccharomycetaceae using 4 different datasets.**
ML trees (top row) were collected from [20] in A and built using LG+G4+F+C60-PMSF model, with an LG+G4+F+C20 ML tree as a PMSF guide input tree in IQ-TREE v1.6.7.1 [36] for B, C, and D. Gene tree coalescence trees (bottom row) were collected from [20] in A and built using *astral_runner*.py, which employs ASTRAL-III [9], for B, C, and D. The dataset from which each column of trees is derived is shown across the top of the figure. Sub-clades that make up the Saccharomycetaceae are shown in dark blue (comprising the AEKL, SNKN, TYV, and ZTZ clades), while the outgroup clades of the Saccharomycodaceae and the Phaffomycetaceae are shown in dark green and cyan, respectively (labeled S and P, respectively). To the right of each Saccharomycetaceae clade is an abbreviation made up of the first letter of each genus included in the clade. Full genus names are written out to the right of the upper left ML tree. Nodes are maximally supported (100 MLBS or 1.0 LPP) unless otherwise shown. The full tree of the PhyloFisher 208 dataset is available in the Supporting information (Fig Y in S1 Text). LPP, local posterior probability; ML, maximum likelihood; MLBS, maximum likelihood bootstrap support; PMSF, posterior mean site frequency.

References

1. Leipe DD, Gunderson JH, Nerad TA, Sogin ML. Small subunit ribosomal RNA+ of Hexamita inflata and the quest for the first branch in the eukaryotic tree. Mol Biochem Parasitol. 1993;59:41–48. doi: 10.1016/0166-6851(93)90005-i
2. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. A Kingdom-Level Phylogeny of Eukaryotes Based on Combined Protein Data. Science. 2000;290:972. doi: 10.1126/science.290.5493.972
3. Brown MW, Heiss AA, Kamikawa R, Inagaki Y, Yabuki A, Tice AK, et al. Phylogenomics Places Orphan Protistan Lineages in a Novel Eukaryotic Super-Group. Genome Biol Evol. 2018;10:427–433. doi: 10.1093/gbe/evy014
4. Strassert JFH, Jamy M, Mylnikov AP, Tikhonenkov DV, Burki F. New Phylogenomic Analysis of the Enigmatic Phylum Telonemia Further Resolves the Eukaryote Tree of Life. Mol Biol Evol. 2019;36:757–765. doi: 10.1093/molbev/msz012
5. Lax G, Eglit Y, Eme L, Bertrand EM, Roger AJ, Simpson AGB. Hemimastigophora is a novel supra-kingdom-level lineage of eukaryotes. Nature. 2018;564:410–414. doi: 10.1038/s41586-018-0708-8
