Syst Biol. 2016 Nov;65(6):997-1008.
doi: 10.1093/sysbio/syw037. Epub 2016 Apr 26.

Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices

Olga Chernomor et al. Syst Biol. 2016 Nov.

Abstract

In phylogenomics, the analysis of concatenated gene alignments, the so-called supermatrix, is commonly accompanied by the assumption of partition models. Under such models, each gene, or more generally each partition, is allowed to evolve under its own evolutionary model. Although partition models provide a more comprehensive analysis of supermatrices, missing data may hamper tree search algorithms due to the existence of phylogenetic (partial) terraces. Here, we introduce the phylogenetic terrace aware (PTA) data structure for efficient analysis under partition models. In the presence of missing data, PTA exploits (partial) terraces and induced partition trees to save computation time. We show that an implementation of PTA in IQ-TREE leads to a substantial speedup of up to 4.5 and 8 times compared with the standard IQ-TREE and RAxML implementations, respectively. PTA is generally applicable to all types of partition models and common topological rearrangements, and thus can be employed by all phylogenomic inference software.

Keywords: Maximum likelihood; partial terraces; partition models; phylogenetic terraces; phylogenomic inference.
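
As an illustration of the idea of induced partition trees, the following minimal Python sketch restricts a species tree to the taxa present in a partition and checks whether an NNI changes the resulting partition tree. It is not the PTA implementation from IQ-TREE: the nested-tuple tree encoding, the function names `leaves` and `induced_splits`, and the toy taxon names are all assumptions made for this example. The sketch compares trees by their sets of nontrivial splits, which is one standard way to test topological equality of unrooted trees.

```python
from itertools import chain

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))

def induced_splits(tree, taxa):
    """Nontrivial splits of the tree restricted to `taxa`, i.e. of the induced partition tree."""
    taxa = frozenset(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node) & taxa   # taxa of this partition below the edge
        other = taxa - side          # taxa of this partition on the other side
        if len(side) >= 2 and len(other) >= 2:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

# Toy species tree with four subtrees A, B, C, D around a central edge,
# and one NNI neighbor in which subtrees B and C are swapped.
A, B, C, D = ("a1", "a2"), "b1", "c1", ("d1", "d2")
T = ((A, B), (C, D))
T_NNI1 = ((A, C), (B, D))

# Partition Y1 lacks taxon b1 (missing data); partition Y2 contains all taxa.
Y1 = {"a1", "a2", "c1", "d1", "d2"}
Y2 = {"a1", "a2", "b1", "c1", "d1", "d2"}

same_on_Y1 = induced_splits(T, Y1) == induced_splits(T_NNI1, Y1)
same_on_Y2 = induced_splits(T, Y2) == induced_splits(T_NNI1, Y2)
print(same_on_Y1, same_on_Y2)
```

In this toy case the NNI leaves the partition tree for Y1 unchanged but alters the one for Y2. A terrace-aware tree search could, in principle, skip recomputation for partitions like Y1 and re-evaluate only those whose induced tree actually changed.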

Figures

Figure 1.
The species tree T and the two NNI neighboring trees T_NNI1 and T_NNI2, obtained by NNIs around the central edge e. NNI, nearest neighbor interchange.
Figure 2.
(a) Three adjacent edges on species tree T and (b) their corresponding edges on partition tree T|Y_i.
Figure 3.
CPU time comparisons of different implementations under (a) EUL, (b) EL-equal, and (c) EL-proportional partition models. Each boxplot shows the distribution of the runtime ratios for 10 runs between a compared program and the mean runtime of IQ-TREE_PTA. Boxes below the horizontal line indicate instances where the corresponding program is slower than IQ-TREE_PTA. EUL, Edge-Unlinked; EL, Edge-Linked.
Figure 4.
Log-likelihood comparisons of different implementations under (a) EUL, (b) EL-equal, and (c) EL-proportional partition models. Each boxplot shows the distribution of the log-likelihood differences for 10 runs between a compared program and the mean log-likelihood of standard RAxML (panels a and b) or standard IQ-TREE (panel c). Boxes below the horizontal line indicate that the corresponding program has a smaller log-likelihood than the RAxML mean log-likelihood (panels a and b) or the IQ-TREE mean log-likelihood (panel c). RAxML and RAxML_optU have identical log-likelihoods given the same starting tree; RAxML_optU is therefore omitted from the plot.
Figure A.1.
Cases that do not change the topology of the partition tree under the EL models. Each tree is a species tree: before NNI (T, first row) and after NNI (T_NNI, second and third rows). The edges and the species sets leading to them are the same as in Figure 1 (e.g., on T the upper left edge is e1 with the species set A, and so on). Here, f_i(.) of gray colored edges is equal to ϵ, and gray triangles correspond to taxa sets that are absent on the considered partition tree: T|Y_i, T_NNI1|Y_i, or T_NNI2|Y_i. The black colored parts correspond to the topologies of these induced partition trees. The arrows show edges that were swapped during NNI around the central branch e on T.
Figure A.2.
General representations of (a) SPR and (b) TBR, where the triangles denote the subtrees below the corresponding edges. SPR, subtree pruning and regrafting; TBR, tree bisection and reconnection.
