The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples
- PMID: 22844205
- PMCID: PMC3405733
- DOI: 10.1111/j.1467-9868.2011.01018.x
Abstract
It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree, we develop L^p Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the L^2 case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent χ² random variables with one degree of freedom.
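The "readily computable integral over the tree" can be illustrated with a minimal sketch. It assumes that all probability mass sits on tree nodes (the abstract also allows uncertain sample locations, which this sketch does not cover), so the integral reduces to a sum over branches of branch length times the absolute difference, raised to the power p, between the mass each sample places below that branch. The function name `kr_distance`, the dictionary-based tree encoding, and the final 1/p exponent (assumed here for p >= 1) are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def kr_distance(parent, length, p_mass, q_mass, p=1.0):
    """Sketch of an L^p Kantorovich-Rubinstein distance on a rooted tree.

    parent: dict mapping each non-root node to its parent.
    length: dict mapping each non-root node to the length of the branch above it.
    p_mass, q_mass: dicts mapping nodes to probability mass (missing nodes = 0).
    Assumption: p >= 1 and mass is located on nodes only.
    """
    nodes = set(parent) | set(parent.values())
    # Signed mass difference at each node; later accumulated over subtrees.
    diff = {v: p_mass.get(v, 0.0) - q_mass.get(v, 0.0) for v in nodes}

    # Count children so that each node is processed after all of its children.
    n_children = defaultdict(int)
    for par in parent.values():
        n_children[par] += 1
    ready = [v for v in nodes if n_children[v] == 0]

    total = 0.0
    while ready:
        v = ready.pop()
        if v not in parent:  # the root has no branch above it
            continue
        # diff[v] now holds the mass difference for the whole subtree below v.
        total += length[v] * abs(diff[v]) ** p
        par = parent[v]
        diff[par] += diff[v]
        n_children[par] -= 1
        if n_children[par] == 0:
            ready.append(par)
    return total ** (1.0 / p)


# Hypothetical toy tree: root R with leaves A (branch length 1) and B (branch length 2).
toy_parent = {"A": "R", "B": "R"}
toy_length = {"A": 1.0, "B": 2.0}
sample_1 = {"A": 1.0}  # all mass on A
sample_2 = {"B": 1.0}  # all mass on B
d1 = kr_distance(toy_parent, toy_length, sample_1, sample_2)  # 3.0
```

With p = 1 the toy result is 3.0, which matches the earth mover's reading: one unit of mass is moved along the path from A to B, whose length is 1 + 2.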