Phylogenetic stochastic mapping without matrix exponentiation

Jan Irvahn¹, Vladimir N Minin

Affiliations

PMID: 24918812
PMCID: PMC4148059
DOI: 10.1089/cmb.2014.0062

Phylogenetic stochastic mapping without matrix exponentiation

Jan Irvahn et al. J Comput Biol. 2014 Sep.

. 2014 Sep;21(9):676-90.

doi: 10.1089/cmb.2014.0062. Epub 2014 Jun 11.

Authors

Jan Irvahn¹, Vladimir N Minin

Affiliation

¹ 1 Department of Statistics, University of Washington , Seattle, Washington.

PMID: 24918812
PMCID: PMC4148059
DOI: 10.1089/cmb.2014.0062

Abstract

Phylogenetic stochastic mapping is a method for reconstructing the history of trait changes on a phylogenetic tree relating species/organism carrying the trait. State-of-the-art methods assume that the trait evolves according to a continuous-time Markov chain (CTMC) and works well for small state spaces. The computations slow down considerably for larger state spaces (e.g., space of codons), because current methodology relies on exponentiating CTMC infinitesimal rate matrices-an operation whose computational complexity grows as the size of the CTMC state space cubed. In this work, we introduce a new approach, based on a CTMC technique called uniformization, which does not use matrix exponentiation for phylogenetic stochastic mapping. Our method is based on a new Markov chain Monte Carlo (MCMC) algorithm that targets the distribution of trait histories conditional on the trait data observed at the tips of the tree. The computational complexity of our MCMC method grows as the size of the CTMC state space squared. Moreover, in contrast to competing matrix exponentiation methods, if the rate matrix is sparse, we can leverage this sparsity and increase the computational efficiency of our algorithm further. Using simulated data, we illustrate advantages of our MCMC algorithm and investigate how large the state space needs to be for our method to outperform matrix exponentiation approaches. We show that even on the moderately large state space of codons our MCMC method can be significantly faster than currently used matrix exponentiation methods.

Keywords: MCMC; codon models; data augmentation; evolution; uniformization.

PubMed Disclaimer

Figures

<b>FIG. 1.</b> — **FIG. 1.**
An example of applying the two Markov kernels of our MCMC sampler to an augmented substitution history. The diamonds represent virtual transitions. **(A)** shows an initial augmented substitution history; **(B)** shows the substitution history after resampling states on the phylogeny conditional on tip node states and the transition points (both real and virtual); **(C)** shows the substitution history seen in **(B)** with no virtual jumps; and **(D)** shows the augmented substitution history after resampling virtual jumps conditional on the substitution history seen in **(C)**. The transition from **(A)** to **(B)** shows the effect of the first Markov kernel, sampling from . The transition from **(C)** to **(D)** shows the effect of the second Markov kernel, sampling from .

formula image — **FIG. 1.**
An example of applying the two Markov kernels of our MCMC sampler to an augmented substitution history. The diamonds represent virtual transitions. **(A)** shows an initial augmented substitution history; **(B)** shows the substitution history after resampling states on the phylogeny conditional on tip node states and the transition points (both real and virtual); **(C)** shows the substitution history seen in **(B)** with no virtual jumps; and **(D)** shows the augmented substitution history after resampling virtual jumps conditional on the substitution history seen in **(C)**. The transition from **(A)** to **(B)** shows the effect of the first Markov kernel, sampling from . The transition from **(C)** to **(D)** shows the effect of the second Markov kernel, sampling from .

<b>FIG. 2.</b> — **FIG. 2.**
State space effect. All four plots show the amount of time required to obtain 10,000 effective samples as a function of the size of the state space for two methods, matrix exponentiation in purple squares and our MCMC sampler in black circles. The two plots in the top row show results for a randomly generated tree with 50 tips. The two plots in the bottom row show results for a randomly generated tree with 100 tips. The two plots in the left column show results for a rate matrix that was scaled to produce two expected transitions while the two plots in the right column show results for a rate matrix that was scaled to produce six expected transitions.

<b>FIG. 3.</b> — **FIG. 3.**
Time to obtain 10,000 effective samples as a function of the dominating Poisson process rate, Ω. All four plots show results of our MCMC sampler in black. Timing results for the matrix exponentiation method are represented by a purple horizontal line because the matrix exponentiation result does not vary as a function of Ω. The two plots in the top row show results for a randomly generated tree with 50 tips. The two plots in the bottom row show results for a randomly generated tree with 100 tips. The rate matrix for the plots in the left column had four states. The rate matrix for the plots in the right column had 60 states.

<b>FIG. 4.</b> — **FIG. 4.**
Time to obtain 10,000 effective samples as a function of the size of the state space. All four plots show results for three different implementations, our MCMC sampler in black, a sparse version of our MCMC sampler in red, and a matrix exponentiation approach that only exponentiates the rate matrix once per branch in blue. The rate matrix is tridiagonal and scaled to produce two expected transitions per tree (in the left column) or six expected transitions per tree (in the right column). The two plots in the top row show results for a randomly generated tree with 50 tips. The two plots in the bottom row show results for a randomly generated tree with 100 tips. The dominating Poisson process rate, Ω, is 0.2.

<b>FIG. 5.</b> — **FIG. 5.**
Time to obtain 10,000 effective samples as a function of the dominating Poisson process rate, Ω, for the GY94 codon rate matrix. All four plots show results for three different implementations: our MCMC sampler in black, a sparse version of our MCMC sampler in red, and a matrix exponentiation approach in purple. The GY94 rate matrix was scaled to produce 2 expected transitions per tree (in the left column) or 6 expected transitions per tree (in the right column). The two plots in the top row show results for a randomly generated tree with 50 tips. The two plots in the bottom row show results for a randomly generated tree with 100 tips.

See this image and copyright information in PMC

Cited by

Inferring Long-Term Effective Population Size with Mutation-Selection Models.
Latrille T, Lanore V, Lartillot N. Latrille T, et al. Mol Biol Evol. 2021 Sep 27;38(10):4573-4587. doi: 10.1093/molbev/msab160. Mol Biol Evol. 2021. PMID: 34191010 Free PMC article.
chromploid: An R package for chromosome number evolution across the plant tree of life.
Zenil-Ferguson R, Burleigh JG, Ponciano JM. Zenil-Ferguson R, et al. Appl Plant Sci. 2018 Apr 11;6(3):e1037. doi: 10.1002/aps3.1037. eCollection 2018 Mar. Appl Plant Sci. 2018. PMID: 29732267 Free PMC article.

References

1. Bollback J.P.2006. SIMMAP: stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics 7, 88. - PMC - PubMed
1. de Koning A.J., Gu W., and Pollock D.D.2010. Rapid likelihood analysis on large phylogenies using partial sampling of substitution histories. Mol. Biol. Evol. 27, 249–265 - PMC - PubMed
1. Drummond A.J., Suchard M.A., Xie D., and Rambaut A.2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 - PMC - PubMed
1. Eddelbuettel D., and François R.2011. Rcpp: Seamless R and C++ integration. Journal of Statistical Software 40, 1–18
1. Eddelbuettel D., and Sanderson C.2014. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis 71, 1054–1063

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Phylogenetic stochastic mapping without matrix exponentiation

Affiliation

Phylogenetic stochastic mapping without matrix exponentiation

Authors

Affiliation

Abstract

Figures

Similar articles

Cited by

References

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources