Syst Biol. 2016 Jul;65(4):628-39.
doi: 10.1093/sysbio/syw019. Epub 2016 Mar 11.

Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

Tanja Stadler et al. Syst Biol. 2016 Jul.

Abstract

Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference between empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBASE are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion.

Keywords: Birth–death process; genealogy; multispecies coalescent; phylogeny.

Figures

Figure 1.
Mean Colless statistic of gene trees divided by mean Colless statistic of species trees ($\bar{C}_g/\bar{C}_s$). Solid lines correspond to complete species sampling ρ=1, dashed lines to sampling probability ρ=0.75, and dot-dashed lines to sampling probability ρ=0.5. Plots are obtained based on 100,000 simulated species tree–gene tree pairs at each choice of parameter values, taking means separately for the gene trees and the species trees.
Figure 2.
Mean γ statistic of gene trees minus mean γ statistic of species trees ($\bar{\gamma}_g - \bar{\gamma}_s$). Solid lines correspond to complete species sampling ρ=1, dashed lines to sampling probability ρ=0.75, and dot-dashed lines to sampling probability ρ=0.5. Plots are obtained based on 100,000 simulated species tree–gene tree pairs at each choice of parameter values, taking means separately for the gene trees and the species trees.
Figure 3.
Distributions of $1 + (C_g - C_s)/\bar{C}_s$ and $C_g/C_s$ and the joint distribution of $C_g$ and $C_s$. All plots are for the birth process only with no extinction and are based on 10,000 independent gene tree–species tree pairs simulated in Hybrid-Lambda (Zhu et al. 2015). Gray lines in the scatterplots represent the line $C_g = C_s$; above the line, based on the Colless statistic, the gene tree has less balance than the species tree.
Figure 4.
The Colless statistic for empirical trees from TreeBASE. Each black dot represents a tree. We normalized each empirical Colless value by dividing it by the expected species tree Colless value. The expected species tree Colless value is independent of speciation rate λ, turnover μ/λ, and species sampling ρ. The red line represents the mean of the normalized Colless statistic for each fixed tree size.
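
The statistics named in the captions above can be illustrated with a short Python sketch. This is not the authors' code, and every function name here is hypothetical. It assumes a rooted binary tree is stored as nested 2-tuples with strings as tips, that the Colless statistic is the usual sum over internal nodes of the absolute difference in tip counts between the two child clades, and that γ follows the standard Pybus and Harvey definition based on internode intervals (the page itself does not spell out either formula). The expected species tree Colless value is computed here under the assumption that the root split of an n-tip tree is uniform over sizes 1 to n-1, as in a pure-birth (Yule) tree; this would be consistent with the caption's statement that the expectation does not depend on λ, μ/λ, or ρ, but it remains an assumption of this sketch.

```python
from math import sqrt


def colless(tree):
    """Return (tip count, Colless index) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0  # a single tip contributes no imbalance
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def expected_colless_yule(n):
    """Expected Colless index for n tips, assuming a uniform root split (Yule)."""
    e = [0.0] * (n + 1)
    for m in range(3, n + 1):
        # Average over the m - 1 equally likely sizes i of the left clade.
        e[m] = sum(abs(m - 2 * i) + e[i] + e[m - i] for i in range(1, m)) / (m - 1)
    return e[n]


def normalized_colless(tree):
    """Colless index divided by its assumed Yule expectation (as in Figure 4)."""
    n, c = colless(tree)
    return c / expected_colless_yule(n)  # requires n >= 3


def ratio_of_means(gene_trees, species_trees):
    """Mean gene tree Colless over mean species tree Colless (as in Figure 1)."""
    mean_g = sum(colless(t)[1] for t in gene_trees) / len(gene_trees)
    mean_s = sum(colless(t)[1] for t in species_trees) / len(species_trees)
    return mean_g / mean_s


def gamma_statistic(intervals):
    """Gamma statistic from internode intervals.

    intervals[k - 2] is the time during which the tree has k lineages,
    for k = 2..n. Negative values indicate branching times near the root.
    """
    n = len(intervals) + 1
    if n < 3:
        raise ValueError("gamma needs at least 3 tips")
    weighted = [k * g for k, g in enumerate(intervals, start=2)]
    total = sum(weighted)
    running = 0.0
    cumulative = 0.0
    for w in weighted[:-1]:  # partial sums for i = 2..n-1
        running += w
        cumulative += running
    return (cumulative / (n - 2) - total / 2) / (total * sqrt(1 / (12 * (n - 2))))


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    caterpillar = ((("a", "b"), "c"), "d")
    print(colless(balanced), colless(caterpillar))
    print(normalized_colless(caterpillar))
    print(gamma_statistic([0.1, 0.1, 0.1, 5.0]))
```

Taking the ratio of means rather than the mean of per-pair ratios mirrors the captions' note that means are taken separately for gene trees and species trees; it also avoids dividing by zero when an individual species tree is perfectly balanced.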

References

    1. Agapow P.M., Purvis A. 2002. Power of eight tree shape statistics to detect nonrandom diversification: a comparison by simulation of two models of cladogenesis. Syst. Biol. 51: 866–872.
    2. Aldous D., Pemantle R., editors. 1996. Random discrete structures, vol. 76 of The IMA volumes in mathematics and its applications. Springer, New York; p. 1–18.
    3. Aldous D., Popovic L. 2005. A critical branching process model for biodiversity. Adv. Appl. Prob. 37: 1094–1115.
    4. Aldous D.J. 2001. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statist. Sci. 16: 23–34.
    5. Blum M.G.B., François O. 2006. Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst. Biol. 55: 685–691.