J R Stat Soc Series B Stat Methodol. 2012 Jun 1;74(3):569-592.
doi: 10.1111/j.1467-9868.2011.01018.x. Epub 2012 Feb 15.

The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples


Steven N Evans et al. J R Stat Soc Series B Stat Methodol. 2012.

Abstract

It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree, we develop L^p Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the L^2 case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent [Formula: see text] random variables.
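The "readily computable integral over the tree" can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that each sample is a set of point masses sitting exactly on tree nodes, so the integral reduces to a sum over edges of (edge length) times |difference in mass below that edge|^p. The function name kr_distance, the dictionary-based tree encoding, and the restriction to p >= 1 are all hypothetical choices made for this illustration.

```python
from collections import defaultdict


def kr_distance(parent, length, p_mass, q_mass, p=1.0):
    """Z_p-style distance between two point-mass distributions on a rooted tree.

    parent : dict mapping each non-root node to its parent
    length : dict mapping each non-root node to the length of the edge
             joining it to its parent
    p_mass, q_mass : dicts mapping nodes to probability mass (each sums to 1)
    p : exponent, assumed >= 1 in this sketch; p = 1 gives the earth mover's case
    """
    # Signed difference in mass at each node (P minus Q).
    diff = defaultdict(float)
    for node, mass in p_mass.items():
        diff[node] += mass
    for node, mass in q_mass.items():
        diff[node] -= mass

    def depth(node):
        # Number of edges between the node and the root.
        d = 0
        while node in parent:
            node = parent[node]
            d += 1
        return d

    total = 0.0
    # Visit deepest nodes first so every subtree is summed before its parent edge.
    for node in sorted(parent, key=depth, reverse=True):
        # diff[node] now holds the net mass difference in the subtree below this edge.
        total += length[node] * abs(diff[node]) ** p
        diff[parent[node]] += diff[node]
    return total ** (1.0 / p)


# Toy tree: root r has children a and z; a has children x and y.
parent = {"a": "r", "z": "r", "x": "a", "y": "a"}
length = {"a": 1.0, "z": 2.0, "x": 1.0, "y": 1.0}
P = {"x": 0.5, "y": 0.5}   # first sample: mass split between leaves x and y
Q = {"z": 1.0}             # second sample: all mass on leaf z

print(kr_distance(parent, length, P, Q))        # 4.0
print(kr_distance(parent, length, P, Q, p=2))   # square root of 3.5
```

In the toy tree every unit of mass must travel a path of length 4 from x or y to z, so the p = 1 value of 4.0 agrees with the earth mover's interpretation. Placements with uncertainty, or mass located partway along an edge, would need the edge to be split at those points before summing.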

Figures

Fig. 1. Tree with branches thickened as a linear function of the number of placements in the control sample placed on that branch
Fig. 2. Tree as in Fig. 1, but for the DMSP-treated sample
Fig. 3. Comparison of the distribution of (a) Z1- and (b) Z2-distances obtained by shuffling (solid line), Gaussian approximation (dashed line) and the observed value (×) for the example data set
Fig. 4. Dendrogram with barycenters marked: ●, control sample; ★, sample treated with DMSP
Fig. 5. Plot showing sample (○) and randomized ranges (formula image): outliers have been eliminated for clarity; for each p, the distribution was rescaled by subtracting the mean and dividing by the standard deviation
Fig. 6. Tree displaying the optimal movement of mass for the KR metric: when moving from the first probability distribution to the second, branches marked in gray have mass moving towards the root, whereas those marked in black have mass moving towards the leaves; thickness shows the quantity of mass moving through that branch
