PLoS Curr. 2014 Apr 2;6:ecurrents.tol.c24b6054aebf3602748ac042ccc8f2e9.
doi: 10.1371/currents.tol.c24b6054aebf3602748ac042ccc8f2e9.

Building a phylogenomic pipeline for the eukaryotic tree of life - addressing deep phylogenies with genome-scale data

Jessica R Grant et al. PLoS Curr. 2014.

Abstract

Background: Understanding the evolutionary relationships of all eukaryotes on Earth remains a paramount goal of modern biology, yet analyzing homologous sequences across 1.8 billion years of eukaryotic evolution is challenging. Many existing tools for identifying gene orthologs are inadequate when working with heterogeneous rates of evolution and endosymbiotic/lateral gene transfer. Moreover, genome-scale sequencing, which was once the domain of large sequencing centers, has advanced to the point where small laboratories can now generate the data needed for phylogenomic studies. This has opened the door to increased taxonomic sampling, as individual research groups can now conduct genome-scale projects on their favorite non-model organism.

Results: Here we present some of the tools developed, and insights gained, as we created a pipeline that combines data mining from public databases with our own transcriptome data to study the eukaryotic tree of life. The first steps of a phylogenomic pipeline involve choosing taxa and loci, and making decisions about how to handle alleles, paralogs and non-overlapping sequences. Next, orthologs are aligned for analyses including gene tree reconstruction and concatenation for supermatrix approaches. To build our pipeline, we created scripts written in Python that integrate third-party tools with custom methods. As a test case, we present the placement of five amoebae on the eukaryotic tree of life based on analyses of transcriptome data. Our scripts are available on GitHub and may be used as-is for automated analyses of large-scale phylogenomics, or adapted for use in other types of studies.

Conclusion: Analyses on the scale of all eukaryotes present challenges not necessarily found in studies of more closely related organisms. Our approach will be of relevance to others for whom existing third-party tools fail to fully answer desired phylogenetic questions.

Figures

Flowchart showing major steps in the pipeline
First, the scripts relating to Taxon Objects output orthologs for each gene of interest, starting from sequence data from target taxa and FASTA files of orthologs from OrthoMCL. The orthologs are then combined into Gene Objects and a series of refinement steps is performed, including removal of ingroup paralogs, alignment with Guidance (Penn et al., 2010) and generation of single gene trees. The outputs are alignments and trees with 1) all paralogs and 2) paralogs removed in preparation for concatenation.
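The concatenation step that follows paralog removal can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `concatenate_alignments`, the dictionary-based alignment format, and the example taxon names are all hypothetical. It assumes each gene alignment holds at most one sequence per taxon (that is, paralogs have already been removed) and that taxa missing from a gene are padded with gap characters.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: List[Dict[str, str]], missing: str = "-"
) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Each alignment maps a taxon name to its aligned sequence. Taxa absent
    from a gene are filled with `missing` characters so that every row of
    the supermatrix has the same length.
    """
    # Collect every taxon seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        if not aln:
            continue  # an empty alignment contributes no columns
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this gene with gap characters.
            parts[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with made-up taxa and sequences.
genes = [
    {"Amoeba_A": "MKV-", "Amoeba_B": "MRVL"},
    {"Amoeba_A": "GG", "Amoeba_C": "GA"},
]
supermatrix = concatenate_alignments(genes)
for taxon, row in supermatrix.items():
    print(taxon, row)
```

Padding with gaps keeps taxa that are missing some genes in the matrix, which is one common way to handle uneven gene sampling; whether the published pipeline treats missing data this way is not stated here.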
Most likely tree of concatenated post-pipeline alignments
Most likely tree reconstructed using RAxML 7.3.2 with 247 taxa and 15,650 characters (SSU + 238 protein genes). Bold branches have 100% bootstrap support. Tree with support values and branches labeled can be found in the supplemental data (Figure S1).
Details of Amoebozoa clade with new taxa in bold
Close-up from the phylogenetic analysis showing the placement of the newly added taxa (in bold) within the Amoebozoa. Nodes are labeled with bootstrap support values, except for bold branches, which have 100% bootstrap support. See Figure 2 for further notes.
Most likely tree reconstructed using RAxML 7.3.2 with 247 taxa and 15,650 characters (SSU + 238 protein genes). Bold branches have 100% bootstrap support; other support values are labeled.
