Evol Bioinform Online. 2013 Oct 29:9:429-35.
doi: 10.4137/EBO.S12813. eCollection 2013.

PhyloTreePruner: A Phylogenetic Tree-Based Approach for Selection of Orthologous Sequences for Phylogenomics

Kevin M Kocot et al. Evol Bioinform Online.

Abstract

Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.

Keywords: gene tree; orthology; paralogy; phylogenomic.

Figures

Figure 1
Illustration of the PhyloTreePruner tree-pruning algorithm. (A) PhyloTreePruner reads the single-gene tree and corresponding alignment file. (B) Nodes in the single-gene tree with support values below the user-defined threshold are identified (red box) and (C) collapsed into polytomies (green box). (D) PhyloTreePruner identifies the maximally inclusive subtree in which all taxa are represented by exactly one sequence, or, if there is more than one sequence from a taxon, these sequences form a monophyletic clade or are part of the same polytomy (green box). In this example, PhyloTreePruner identifies a potential paralogy issue with the Ixodes sequences (red box). This example shows the necessity of correct single-gene tree rooting. (E) PhyloTreePruner deletes sequences inferred to be paralogs from the tree and the corresponding sequence alignment file (red boxes). (F) In cases where more than one sequence remains from the same taxon, PhyloTreePruner selects the longest sequence and deletes all others (green boxes). This step can be skipped if preferred and another method (eg, SCaFoS) can be used to select the best sequence for each taxon.
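Steps (D) through (F) can be illustrated with a minimal Python sketch. This is not PhyloTreePruner's own code: the nested-tuple tree encoding, the `Taxon|sequence_id` label format, the function names, the example tree, and the choice to measure sequence length as the number of non-gap characters are all assumptions made for illustration. The sketch assumes the tree is already rooted and that weakly supported nodes have already been collapsed.

```python
from collections import defaultdict

# Assumed encoding: a leaf is a string "Taxon|seq_id", an internal node is a
# tuple of children. A tuple with more than two children is a polytomy.

def leaf_names(tree):
    """Return all leaf labels under a node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]

def taxon_of(label):
    """Extract the taxon name from a hypothetical 'Taxon|seq_id' label."""
    return label.split("|")[0]

def mrca(tree, targets):
    """Return the deepest node whose leaves include every label in targets."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if targets <= set(leaf_names(child)):
            return mrca(child, targets)
    return tree

def is_paralogy_free(tree):
    """True if every multi-sequence taxon is a clade or sits in one polytomy."""
    by_taxon = defaultdict(set)
    for label in leaf_names(tree):
        by_taxon[taxon_of(label)].add(label)
    for labels in by_taxon.values():
        if len(labels) == 1:
            continue
        ancestor = mrca(tree, labels)
        monophyletic = set(leaf_names(ancestor)) == labels
        same_polytomy = all(label in ancestor for label in labels)
        if not (monophyletic or same_polytomy):
            return False
    return True

def all_subtrees(tree):
    """Yield every node in preorder, so ancestors come before descendants."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from all_subtrees(child)

def best_subtree(tree):
    """Step (D): pick the paralogy-free subtree covering the most taxa."""
    candidates = [t for t in all_subtrees(tree) if is_paralogy_free(t)]
    return max(candidates, key=lambda t: len({taxon_of(n) for n in leaf_names(t)}))

def keep_longest(labels, seqs):
    """Step (F): keep one label per taxon, the one with most non-gap sites."""
    best = {}
    for label in labels:
        length = len(seqs[label].replace("-", ""))
        taxon = taxon_of(label)
        if taxon not in best or length > best[taxon][0]:
            best[taxon] = (length, label)
    return sorted(label for _, label in best.values())

# Hypothetical rooted gene tree: the two Ixodes sequences are neither a clade
# nor members of one polytomy, so the full tree fails the check.
tree = ("Ixodes|1",
        (("Daphnia|1", "Ixodes|2"), ("Apis|1", "Apis|2"), "Tribolium|1"))
seqs = {"Ixodes|1": "MKTAL", "Ixodes|2": "MKTAL", "Daphnia|1": "MKSAL",
        "Apis|1": "MK--L", "Apis|2": "MKTAL", "Tribolium|1": "MRTAL"}

kept = best_subtree(tree)                    # step (E): Ixodes|1 is dropped
final = keep_longest(leaf_names(kept), seqs)
print(final)
```

In this toy tree the two Apis sequences form a clade and are tolerated, whereas Ixodes|1 falls outside the largest paralogy-free subtree and is removed. Because candidate subtrees are enumerated from the root downward, a different rooting can change which sequences are retained.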
Figure 2
Example of a single-gene tree showing a weakly supported node (red box) that incorrectly recovers two sequences from the same taxon as paralogs. PhyloTreePruner collapses nodes with support values below a user-defined threshold and allows sequences from multiple taxa to be part of the same polytomy. Thus, PhyloTreePruner would “rescue” this group from being discarded if a minimum support value above 21 was used.
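The collapsing step can be sketched in a few lines of Python. This is an illustrative sketch rather than the tool's implementation: the `Node` class, the `collapse_low_support` function, the taxon labels, and the example tree are hypothetical, and the only value taken from the caption is the support of 21 on the weak node.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # support value of an internal node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def collapse_low_support(node: Node, threshold: float) -> Node:
    """Replace each internal child whose support is below threshold with
    its own children, turning the parent into a polytomy."""
    new_children = []
    for child in node.children:
        collapse_low_support(child, threshold)
        weak = (not child.is_leaf()
                and child.support is not None
                and child.support < threshold)
        if weak:
            new_children.extend(child.children)  # splice grandchildren in
        else:
            new_children.append(child)
    node.children = new_children
    return node

def example_tree() -> Node:
    """Hypothetical tree: a node with support 21 separates TaxonA|1
    from TaxonA|2, suggesting paralogy on weak evidence."""
    weak_node = Node(support=21, children=[Node("TaxonA|1"), Node("TaxonB|1")])
    return Node(children=[weak_node, Node("TaxonA|2"), Node("TaxonC|1")])

collapsed = collapse_low_support(example_tree(), threshold=50)
print([child.name for child in collapsed.children])

untouched = collapse_low_support(example_tree(), threshold=20)
print(len(untouched.children))
```

With a threshold of 50 the weak node is dissolved and both TaxonA sequences become members of the same polytomy, so the group would be retained. With a threshold of 20 the node survives and the two TaxonA sequences still appear to be paralogs.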
Figure 3
Phylogram of the most likely tree recovered in the RAxML analysis of the concatenated data matrix. The tick Ixodes was used to root the tree. Bootstrap support values above 50 are shown at each node. Scale bar = 0.05 substitutions per site. Notably, bootstrap support for Paraneoptera (Acyrthosiphon + Pediculus) was weak, consistent with the results of other phylogenomic studies.