Syst Biol. 2021 Jun 16;70(4):660-680.
doi: 10.1093/sysbio/syab009.

Of Traits and Trees: Probabilistic Distances under Continuous Trait Models for Dissecting the Interplay among Phylogeny, Model, and Data

Richard H Adams et al. Syst Biol. 2021.

Abstract

Stochastic models of character trait evolution have become a cornerstone of evolutionary biology in an array of contexts. While probabilistic models have been used extensively for statistical inference, they have largely been ignored for the purpose of measuring distances between phylogeny-aware models. Recent contributions to the problem of phylogenetic distance computation have highlighted the importance of explicitly considering evolutionary model parameters and their impacts on molecular sequence data when quantifying dissimilarity between trees. By comparing two phylogenies in terms of their induced probability distributions that are functions of many model parameters, these distances can be more informative than traditional approaches that rely strictly on differences in topology or branch lengths alone. Currently, however, these approaches are designed for comparing models of nucleotide substitution and gene tree distributions, and thus, are unable to address other classes of traits and associated models that may be of interest to evolutionary biologists. Here, we expand the principles of probabilistic phylogenetic distances to compute tree distances under models of continuous trait evolution along a phylogeny. By explicitly considering both the degree of relatedness among species and the evolutionary processes that collectively give rise to character traits, these distances provide a foundation for comparing models and their predictions, and for quantifying the impacts of assuming one phylogenetic background over another while studying the evolution of a particular trait. We demonstrate the properties of these approaches using theory, simulations, and several empirical data sets that highlight potential uses of probabilistic distances in many scenarios. 
We also introduce PRDATR, an open-source R package that allows the scientific community to compute phylogenetic distances under models of character trait evolution. [Brownian motion; comparative methods; phylogeny; quantitative traits.]
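As an illustration of what comparing two "induced probability distributions" can look like for continuous traits, the block below writes out the standard closed form of the squared Hellinger distance between two multivariate normal distributions. This is a textbook identity and not a formula quoted from the article; the symbols (mean vectors \mu_1, \mu_2 and covariance matrices \Sigma_1, \Sigma_2 of the tip trait values under each model) are assumed notation introduced here.

```latex
% Squared Hellinger distance between N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2)
% (assumed notation; \bar{\Sigma} is the average of the two covariance matrices)
\[
  H^2(P_1, P_2)
  = 1 - \frac{|\Sigma_1|^{1/4}\,|\Sigma_2|^{1/4}}{|\bar{\Sigma}|^{1/2}}
    \exp\!\left( -\tfrac{1}{8}\,(\mu_1 - \mu_2)^{\top}\,
      \bar{\Sigma}^{-1}\,(\mu_1 - \mu_2) \right),
  \qquad
  \bar{\Sigma} = \tfrac{1}{2}\left(\Sigma_1 + \Sigma_2\right).
\]
```

The distance is 0 when both models induce the same distribution and approaches 1 as the distributions stop overlapping, which is what makes it usable as a bounded measure of dissimilarity between tree-plus-model combinations.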

Figures

Figure 1
Conceptual schematic depicting an example set of distance computations for a simple phylogenetic model with formula image taxa (top left). Coupled with a particular model (i.e., BM, OU, or EB), this phylogenetic tree model provides a variance–covariance matrix that is scaled by model parameters. In this example, the first model formula image (lower left) represents a standard BM model with formula image, and there are three alternative models possible for formula image: formula image), formula image), or formula image). For each model under this phylogenetic scenario, the probability distribution of trait values formula image sampled at the tips can be formulated as a bivariate (i.e., formula image) normal distribution, which is depicted by each respective model as a heatmap overlaid by a contour plot, with darker colors representing higher probabilities. Distances are computed by comparing these bivariate normal distributions with one another (arrows from formula image to each formula image indicate pairs of model distances to be computed).
Figure 2
Probabilistic phylogenetic distances under models of continuous trait evolution computed across a range of scaling values. a) Symmetric topology phylogenetic tree with formula image taxa that continuous trait evolutionary models are conditioned on. b) Hellinger distances (formula image) computed using the tree in (a) for BM, OU, EB, L, K, and D continuous trait models, with the first model representing a standard BM model with formula image, and the respective parameters of the second model scaled by formula image. See Table 1 for description of each model and scaled parameters. c) Hellinger distances computed using the tree in (a) for BM, OU, EB, L, K, and D models, with the first model representing a standard BM model with formula image, and the respective parameters of the second model scaled by formula image.
Figure 3
Hellinger distance (formula image) and Kullback–Leibler divergence (formula image) between a pair of hybridization networks (a) shown in (b), or between a bifurcating tree and a hybridization network (c) shown in (d), which were computed across a range of values for either the evolutionary rate parameter formula image (i.e., formula image) or the migration proportion formula image (i.e., formula image) for two BM models.
Figure 4
The synergistic influence of tree shape, taxa number, and evolutionary model parameter on probabilistic distances. Results shown for the Hellinger distance (formula image) computed between a BM model and either the OU (a–c), EB (d–f), or D (g–i) model for simulations using different numbers of taxa on three different tree shapes: “balanced” (left column), “left unbalanced” (center column), and “star” (right column). Branch lengths are chosen such that the total tree height is scaled to 1.0. For each plot, the particular parameter values are indicated with arrows pointing to the specific lines, such that each line represents a different parameter value on a log-scale from 0.01 to 10.0.
Figure 5
Investigating the relationship between model distances and the significance of likelihood ratio tests between fitted BM and OU models (traits simulated under an OU model). Results shown for three different tree shapes: “balanced” (left panels), “left unbalanced” (center), and trees simulated under a Yule model with the birth rate formula image (right) with equal branch lengths that are scaled to give a total tree height of 1.0. formula image values for a likelihood ratio test comparing the OU and BM models as a function of their Hellinger distance (formula image) are shown for three different tree sizes: 128 (a–c), 512 (d–f), and 1024 (g–i) tips. Points show the mean (circle) and standard deviation (bars) of the distribution of 10 replicate formula image values (subtracted from one). Each simulation replicate was computed by incrementally increasing the formula image parameter of the OU model from formula image to formula image (from left to right in each panel, colored in the blue scale shown), at increments of 0.01.
Figure 6
Computing probabilistic Hellinger distances (formula image) between the BM, OU, EB, L, K, and D continuous trait models that were fit to the amphibian genome size data set of formula image taxa. Graphical network showing the six models (BM, OU, EB, L, K, and D) as nodes connected by edges, with the widths of edges scaled by their respective probabilistic distances (shown beside each edge).
Figure 7
Multidimensional scaling (MDS) based on pairwise Hellinger distances (formula image) estimated assuming a BM model (a) or an OU model (c) for a set of 6144 avian gene trees that comprise 2136 exons (dark gray), 329 introns (black), and 3679 UCEs (light gray). Analogously, plots (b) and (d) depict pairwise distances projected using MDS for the 31 avian species trees assuming BM (b) or OU (d) models, respectively.
Figure 8
Applying the Hellinger distance (formula image) to multivariate models of continuous trait evolution. Results for simulation analyses using the phylogeny depicted in (a) are shown in (b), where formula image is the covariance between the two traits. Phylogeny depicting “Felsenstein’s worst case” scenario is shown in (c), which was used to simulate data sets in which an instantaneous shift occurs on one of the ancestral branches (location of shift depicted as a tick mark on the tree in (c)), and results are shown in (d) with logformula image ratio of the shift to BM variance (formula image-axis) and the Hellinger distance (formula image-axis) computed between the unconstrained and constrained models that have been fit to the simulated data. Color of points in (b) and (d) indicates the 1-formula image value of the likelihood ratio test between an unconstrained model (i.e., formula image is estimated) and a constrained model (i.e., formula image such that traits are assumed to be independent) that have been fit to the data.
Figure 9
Investigating identifiability of mixed OU models using the Hellinger distance (formula image). Asterisks (*) indicate the location of shift points for OU model parameters in the tree pairs shown in (a), (c), and (e). Heatmap shown in (b) represents the Hellinger distance computed between the left and right tree models displayed in (a) across a range of values for the ancestral state formula image and the background optimum formula image using a shift optimum formula image of the right tree, while using formula image, formula image, and formula image for the left tree in (a). d) The distance between the two tree models shown in (c) across a range of formula image and formula image parameter values of the OU model marked with a gray asterisk in the left tree of (c). Similarly, results for the Hellinger distance between the two tree models displayed in (e) are shown in (f), with a range of formula image and formula image parameter values for the OU model represented by a gray asterisk in the right tree of (e).
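
The captions above describe a common recipe: a tree plus a trait model yields a variance–covariance matrix, the tip trait values then follow a multivariate normal distribution, and two models are compared through the distance between those distributions. The following is a minimal sketch of that recipe in plain Python, not the PRDATR implementation. It assumes the standard closed-form Hellinger distance between two multivariate normals, a BM covariance equal to the rate times the shared root-to-ancestor path length, and an OU covariance for an ultrametric tree with a fixed root state. All function names, the example tree, and the parameter values are hypothetical.

```python
import math

def bm_covariance(shared, sigma2):
    """BM: Cov(i, j) = sigma2 * (path length shared by tips i and j)."""
    return [[sigma2 * s for s in row] for row in shared]

def ou_covariance(shared, height, sigma2, alpha):
    """OU on an ultrametric tree, assuming a fixed (non-random) root state."""
    n = len(shared)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = shared[i][j]              # time shared since the root
            d = 2.0 * (height - s)        # tip-to-tip path length
            cov[i][j] = (sigma2 / (2.0 * alpha)) * math.exp(-alpha * d) \
                * (1.0 - math.exp(-2.0 * alpha * s))
    return cov

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def log_det(a):
    """Log-determinant via the Cholesky factor."""
    low = cholesky(a)
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(a)))

def quad_form_inv(a, v):
    """Compute v^T a^{-1} v by forward substitution on the Cholesky factor."""
    low = cholesky(a)
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return sum(t * t for t in y)

def hellinger(mu1, cov1, mu2, cov2):
    """Hellinger distance between N(mu1, cov1) and N(mu2, cov2), in [0, 1]."""
    n = len(mu1)
    avg = [[0.5 * (cov1[i][j] + cov2[i][j]) for j in range(n)] for i in range(n)]
    diff = [a - b for a, b in zip(mu1, mu2)]
    # Log of the Bhattacharyya coefficient, kept in log space for stability.
    log_bc = (0.25 * log_det(cov1) + 0.25 * log_det(cov2)
              - 0.5 * log_det(avg) - 0.125 * quad_form_inv(avg, diff))
    return math.sqrt(max(0.0, 1.0 - math.exp(log_bc)))

# Hypothetical two-taxon tree of height 1.0 whose tips share half their history.
shared = [[1.0, 0.5], [0.5, 1.0]]
mu = [0.0, 0.0]                       # same expected tip values under both models

bm_slow = bm_covariance(shared, sigma2=1.0)
bm_fast = bm_covariance(shared, sigma2=4.0)
ou = ou_covariance(shared, height=1.0, sigma2=1.0, alpha=2.0)

print(hellinger(mu, bm_slow, mu, bm_fast))   # sqrt(0.2), about 0.447
print(hellinger(mu, bm_slow, mu, ou))
```

Working in log space avoids underflow of the determinants when the number of taxa is large. Under the assumed OU covariance, letting alpha shrink toward zero recovers the BM covariance, so the distance between the two models goes to zero.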
