Syst Biol. 2021 Jun 16;70(4):660-680.

doi: 10.1093/sysbio/syab009.

Of Traits and Trees: Probabilistic Distances under Continuous Trait Models for Dissecting the Interplay among Phylogeny, Model, and Data

Richard H Adams¹, Heath Blackmon², Michael DeGiorgio¹

Affiliations

¹ Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA.
² Department of Biology, Texas A&M University, College Station, TX 77843, USA.

PMID: 33587145
PMCID: PMC8208806
DOI: 10.1093/sysbio/syab009

Richard H Adams et al. Syst Biol. 2021.

Abstract

Stochastic models of character trait evolution have become a cornerstone of evolutionary biology in an array of contexts. While probabilistic models have been used extensively for statistical inference, they have largely been ignored for the purpose of measuring distances between phylogeny-aware models. Recent contributions to the problem of phylogenetic distance computation have highlighted the importance of explicitly considering evolutionary model parameters and their impacts on molecular sequence data when quantifying dissimilarity between trees. By comparing two phylogenies in terms of their induced probability distributions that are functions of many model parameters, these distances can be more informative than traditional approaches that rely strictly on differences in topology or branch lengths alone. Currently, however, these approaches are designed for comparing models of nucleotide substitution and gene tree distributions, and thus, are unable to address other classes of traits and associated models that may be of interest to evolutionary biologists. Here, we expand the principles of probabilistic phylogenetic distances to compute tree distances under models of continuous trait evolution along a phylogeny. By explicitly considering both the degree of relatedness among species and the evolutionary processes that collectively give rise to character traits, these distances provide a foundation for comparing models and their predictions, and for quantifying the impacts of assuming one phylogenetic background over another while studying the evolution of a particular trait. We demonstrate the properties of these approaches using theory, simulations, and several empirical data sets that highlight potential uses of probabilistic distances in many scenarios. 
We also introduce an open-source R package named PRDATR for easy application by the scientific community for computing phylogenetic distances under models of character trait evolution. [Brownian motion; comparative methods; phylogeny; quantitative traits.]
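
The core idea of the abstract, comparing two tree-plus-model combinations through the trait distributions they induce, can be sketched for the simplest case. The following is a minimal illustration and not the PRDATR implementation. It assumes a two-taxon tree, a BM model whose tip covariance equals the rate multiplied by the shared branch length, a common root state as the mean, and the convention that the squared Hellinger distance is one minus the Bhattacharyya coefficient. The function names `bm_covariance` and `hellinger_bivariate_normal` are hypothetical.

```python
import math


def bm_covariance(sigma2, tip_height, shared_height):
    """Assumed BM covariance for two tips: sigma2 * [[T, s], [s, T]].

    T is the root-to-tip height and s is the length of the branch
    shared by the two tips (both are illustrative parameters).
    """
    return [
        [sigma2 * tip_height, sigma2 * shared_height],
        [sigma2 * shared_height, sigma2 * tip_height],
    ]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def hellinger_bivariate_normal(mu1, cov1, mu2, cov2):
    """Hellinger distance between two bivariate normal distributions.

    Uses H^2 = 1 - BC, where BC is the Bhattacharyya coefficient.
    """
    # Average of the two covariance matrices.
    avg = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    d_avg = det2(avg)
    # Closed-form inverse of the 2x2 average covariance.
    inv = [
        [avg[1][1] / d_avg, -avg[0][1] / d_avg],
        [-avg[1][0] / d_avg, avg[0][0] / d_avg],
    ]
    # Quadratic form in the difference of the mean vectors.
    dm = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    quad = sum(dm[i] * inv[i][j] * dm[j] for i in range(2) for j in range(2))
    bc = (det2(cov1) * det2(cov2)) ** 0.25 / math.sqrt(d_avg) * math.exp(-quad / 8.0)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    root_state = [0.0, 0.0]                      # same expected tip value for both taxa
    slow = bm_covariance(1.0, 1.0, 0.5)          # BM with rate 1
    fast = bm_covariance(2.0, 1.0, 0.5)          # same tree, rate doubled
    late_split = bm_covariance(1.0, 1.0, 0.9)    # same rate, longer shared branch
    print("rate change:    ", hellinger_bivariate_normal(root_state, slow, root_state, fast))
    print("topology change:", hellinger_bivariate_normal(root_state, slow, root_state, late_split))
```

In this sketch a change in the rate parameter and a change in the tree both move the induced distribution, so both register as a nonzero distance on the same scale, which is the property the abstract emphasizes over purely topological measures.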

Figures

**Figure 1**
Conceptual schematic depicting an example set of distance computations for a simple phylogenetic model with two taxa (top left). Coupled with a particular model (i.e., BM, OU, or EB), this phylogenetic tree model provides a variance–covariance matrix that is scaled by model parameters. In this example, the first model (lower left) represents a standard BM model with […], and there are three alternative models possible for the second model: […], […], or […]. For each model under this phylogenetic scenario, the probability distribution of trait values sampled at the tips can be formulated as a bivariate (i.e., two-dimensional) normal distribution, which is depicted for each respective model as a heatmap overlaid by a contour plot, with darker colors representing higher probabilities. Distances are computed by comparing these bivariate normal distributions with one another (arrows from the first model to each alternative indicate the pairs of model distances to be computed).

**Figure 2**
Probabilistic phylogenetic distances under models of continuous trait evolution computed across a range of scaling values. a) Symmetric-topology phylogenetic tree with […] taxa that the continuous trait evolutionary models are conditioned on. b) Hellinger distances computed using the tree in (a) for BM, OU, EB, L, K, and D continuous trait models, with the first model representing a standard BM model with […], and the respective parameters of the second model scaled by […]. See Table 1 for a description of each model and its scaled parameters. c) Hellinger distances computed using the tree in (a) for BM, OU, EB, L, K, and D models, with the first model representing a standard BM model with […], and the respective parameters of the second model scaled by […].

**Figure 3**
Hellinger distance and Kullback–Leibler divergence between a pair of hybridization networks (networks in (a), results in (b)), or between a bifurcating tree and a hybridization network (models in (c), results in (d)), computed across a range of values for either the evolutionary rate parameter or the migration proportion for two BM models.

**Figure 4**
The synergistic influence of tree shape, taxa number, and evolutionary model parameter on probabilistic distances. Results are shown for the Hellinger distance computed between a BM model and either the OU (a–c), EB (d–f), or D (g–i) model for simulations using different numbers of taxa on three different tree shapes: “balanced” (left column), “left unbalanced” (center column), and “star” (right column). Branch lengths are chosen such that the total tree height is scaled to 1.0. In each plot, arrows pointing to specific lines indicate the parameter values, with each line representing a different parameter value on a log scale from 0.01 to 10.0.

**Figure 5**
Investigating the relationship between model distances and the significance of likelihood ratio tests between fitted BM and OU models (traits simulated under an OU model). Results are shown for three different tree shapes: “balanced” (left panels), “left unbalanced” (center), and trees simulated under a Yule model with the birth rate […] (right), with equal branch lengths that are scaled to give a total tree height of 1.0. P values for a likelihood ratio test comparing the OU and BM models as a function of their Hellinger distance are shown for three different tree sizes: 128 (a–c), 512 (d–f), and 1024 (g–i) tips. Points show the mean (circle) and standard deviation (bars) of the distribution of 10 replicate P values (subtracted from one). Each simulation replicate was computed by incrementally increasing the parameter […] of the OU model from […] to […] (from left to right in each panel, colored in the blue scale shown), at increments of 0.01.

**Figure 6**
Computing probabilistic Hellinger distances between the BM, OU, EB, L, K, and D continuous trait models that were fit to the amphibian genome size data set of […] taxa. The graphical network shows the six models (BM, OU, EB, L, K, and D) as nodes connected by edges, with edge widths scaled by their respective probabilistic distances (shown beside each edge).

**Figure 7**
Multidimensional scaling (MDS) based on pairwise Hellinger distances estimated assuming a BM model (a) or an OU model (c) for a set of 6144 avian gene trees that comprise 2136 exons (dark gray), 329 introns (black), and 3679 UCEs (light gray). Analogously, plots (b) and (d) depict pairwise distances projected using MDS for the 31 avian species trees assuming BM (b) or OU (d) models, respectively.

**Figure 8**
Applying the Hellinger distance to multivariate models of continuous trait evolution. Results for simulation analyses using the phylogeny depicted in (a) are shown in (b), where […] is the covariance between the two traits. A phylogeny depicting “Felsenstein’s worst case” scenario is shown in (c), which was used to simulate data sets in which an instantaneous shift occurs on one of the ancestral branches (location of the shift depicted as a tick mark on the tree in (c)). Results are shown in (d), with the log ratio of the shift to BM variance (x-axis) and the Hellinger distance (y-axis) computed between the unconstrained and constrained models that have been fit to the simulated data. The color of points in (b) and (d) indicates the 1 − P value of the likelihood ratio test between an unconstrained model (i.e., the trait covariance is estimated) and a constrained model (i.e., the trait covariance is fixed at zero, such that traits are assumed to be independent) that have been fit to the data.

**Figure 9**
Investigating identifiability of mixed OU models using the Hellinger distance. Asterisks (*) indicate the location of shift points for OU model parameters in the tree pairs shown in (a), (c), and (e). The heatmap in (b) represents the Hellinger distance computed between the left and right tree models displayed in (a) across a range of values for the ancestral state […] and the background optimum […], using a shift optimum […] for the right tree and […], […], and […] for the left tree in (a). The heatmap in (d) shows the distance between the two tree models shown in (c) across a range of […] and […] parameter values of the OU model marked with a gray asterisk in the left tree of (c). Similarly, results for the Hellinger distance between the two tree models displayed in (e) are shown in (f), with a range of […] and […] parameter values for the OU model represented by a gray asterisk in the right tree of (e).
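
For reference, the distances named in the captions above have standard closed forms when both models induce multivariate normal distributions over the tip values. The block below states these textbook expressions and is not quoted from the article. The symbols are assumptions introduced here: $\mu_k$ and $\Sigma_k$ are the mean vector and variance–covariance matrix of model $k$, $n$ is the number of taxa, $C$ is a tree-derived covariance matrix shared by both models, and the squared Hellinger distance is taken to be one minus the Bhattacharyya coefficient.

```latex
% Hellinger distance between N(mu_1, Sigma_1) and N(mu_2, Sigma_2)
\[
  H^2 = 1 -
  \frac{|\Sigma_1|^{1/4}\,|\Sigma_2|^{1/4}}{|\bar\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{8}\,\Delta\mu^{\top}\bar\Sigma^{-1}\Delta\mu\right),
  \qquad
  \bar\Sigma = \tfrac{1}{2}(\Sigma_1 + \Sigma_2),
  \quad
  \Delta\mu = \mu_1 - \mu_2 .
\]

% Kullback--Leibler divergence from model 1 to model 2 (not symmetric)
\[
  D_{\mathrm{KL}}(P_1 \,\|\, P_2) = \tfrac{1}{2}\left[
    \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
    + \Delta\mu^{\top}\Sigma_2^{-1}\Delta\mu
    - n
    + \ln\frac{|\Sigma_2|}{|\Sigma_1|}
  \right].
\]

% Special case: same tree, equal means, BM rates differ,
% Sigma_k = sigma_k^2 C and r = sigma_1^2 / sigma_2^2
\[
  H^2 = 1 - \left(\frac{2\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}\right)^{n/2},
  \qquad
  D_{\mathrm{KL}}(P_1 \,\|\, P_2) = \frac{n}{2}\left(r - 1 - \ln r\right).
\]
```

Under these assumptions only, a pure BM rate change on a shared tree gives distances that depend on the number of taxa and the rate ratio but not on $C$, because the determinant of $C$ cancels. Comparisons that change the covariance structure itself, such as BM against OU or EB, do not simplify this way.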

Cited by

A Tale of Too Many Trees: A Conundrum for Phylogenetic Regression.
Adams R, Lozano JR, Duncan M, Green J, Assis R, DeGiorgio M. Mol Biol Evol. 2025 Mar 5;42(3):msaf032. doi: 10.1093/molbev/msaf032. PMID: 39930867. Free PMC article.
TraitTrainR: accelerating large-scale simulation under models of continuous trait evolution.
Roa Lozano J, Duncan M, McKenna DD, Castoe TA, DeGiorgio M, Adams R. Bioinform Adv. 2024 Dec 9;5(1):vbae196. doi: 10.1093/bioadv/vbae196. eCollection 2025. PMID: 39758830. Free PMC article.
Piikun: an information theoretic toolkit for analysis and visualization of species delimitation metric space.
Sukumaran J, Meila M. BMC Bioinformatics. 2024 Dec 18;25(1):385. doi: 10.1186/s12859-024-05997-y. PMID: 39695946. Free PMC article.
New generalized metric based on branch length distance to compare B cell lineage trees.
Farnia M, Tahiri N. Algorithms Mol Biol. 2024 Oct 5;19(1):22. doi: 10.1186/s13015-024-00267-1. PMID: 39369262. Free PMC article.
Discriminating models of trait evolution.
Lozano JR, DeGiorgio M, Assis R, Adams R. bioRxiv [Preprint]. 2025 Jun 13:2025.06.12.659377. doi: 10.1101/2025.06.12.659377. PMID: 40661575. Free PMC article. Preprint.

Associated data

Dryad/10.5061/dryad.m0cfxpp36

Grants and funding

R35 GM128590/GM/NIGMS NIH HHS/United States
