ISME J. 2013 Jan;7(1):173-83.
doi: 10.1038/ismej.2012.88. Epub 2012 Aug 2.

Measures of phylogenetic differentiation provide robust and complementary insights into microbial communities

Donovan H Parks et al. ISME J. 2013 Jan.

Abstract

High-throughput sequencing techniques have made large-scale spatial and temporal surveys of microbial communities routine. Gaining insight into microbial diversity requires methods for effectively analyzing and visualizing these extensive data sets. Phylogenetic β-diversity measures address this challenge by allowing the relationship between large numbers of environmental samples to be explored using standard multivariate analysis techniques. Despite the success and widespread use of phylogenetic β-diversity measures, an extensive comparative analysis of these measures has not been performed. Here, we compare 39 measures of phylogenetic β diversity in order to establish the relative similarity of these measures along with key properties and performance characteristics. While many measures are highly correlated, those commonly used within microbial ecology were found to be distinct from those popular within classical ecology, and from the recently recommended Gower and Canberra measures. Many of the measures are surprisingly robust to different rootings of the gene tree, the choice of similarity threshold used to define operational taxonomic units, and the presence of outlying basal lineages. Measures differ considerably in their sensitivity to rare organisms, and the effectiveness of measures can vary substantially under alternative models of differentiation. Consequently, the depth of sequencing required to reveal underlying patterns of relationships between environmental samples depends on the selected measure. Our results demonstrate that using complementary measures of phylogenetic β diversity can further our understanding of how communities are phylogenetically differentiated. Open-source software implementing the phylogenetic β-diversity measures evaluated in this manuscript is available at http://kiwi.cs.dal.ca/Software/ExpressBetaDiversity.

Figures

Figure 1
Similarity of phylogenetic β-diversity measures. Branch lengths are transformed Pearson's r values, d = 1 − r, averaged over 100 random subsets of 10 samples drawn from each of the four empirical data sets. The hierarchical relationship between measures was obtained using the UPGMA clustering algorithm. Branches supported by at least 70% of the trials are indicated with asterisks. The five most highly correlated and consistently clustered groups of measures are highlighted in different colors. These clusterings are nearly perfectly recovered on all four data sets (Supplementary Figure S2). Phylogenetic measures commonly used within microbial ecology are shown in bold, while measures popular in classical ecology are underlined. Measures are specified by their common quantitative name, and qualitative counterparts are indicated by prefixing a ‘u’ (for ‘unweighted’). Each measure is classified as an MRCA, CL, or CT measure.
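The procedure behind Figure 1 (correlate measures, convert each correlation to a distance of one minus r, then cluster with UPGMA) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the three measure names and their dissimilarity values are invented toy data, the helper names `pearson_r` and `upgma` are hypothetical, and the sketch omits the averaging over random subsets and the support values described in the caption. It is not the ExpressBetaDiversity implementation.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    names: leaf labels; dist: dict mapping frozenset({a, b}) -> distance.
    Returns a nested-tuple tree and the list of merge heights.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, leaf count)
    dist = dict(dist)  # work on a copy so the caller's dict is untouched
    heights = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d_ab = dist[frozenset((a, b))]
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted update keeps each distance equal to the mean over leaf pairs.
        for c in clusters:
            d_ac, d_bc = dist[frozenset((a, c))], dist[frozenset((b, c))]
            dist[frozenset((merged, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
        heights.append(d_ab / 2)
    (tree, _), = clusters.values()
    return tree, heights


# Toy data (assumption): dissimilarities that three hypothetical measures
# assign to the same six sample pairs.
profiles = {
    "measure_A": [0.10, 0.40, 0.35, 0.80, 0.75, 0.20],
    "measure_B": [0.12, 0.38, 0.36, 0.82, 0.70, 0.22],
    "measure_C": [0.90, 0.20, 0.60, 0.10, 0.50, 0.70],
}
names = list(profiles)
# Transform each correlation into a distance, d = 1 - r.
dist = {
    frozenset((a, b)): 1 - pearson_r(profiles[a], profiles[b])
    for a, b in combinations(names, 2)
}
tree, heights = upgma(names, dist)
print(tree)
print(heights)
```

In this toy input the two strongly correlated measures join first, and the negatively correlated one attaches last at a distance above 1, which is possible because d = 1 − r ranges from 0 to 2.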
Figure 2
Influence of sequence clustering on phylogenetic β-diversity measures. (a, c) Mean correlation across all quantitative (a) and qualitative (c) measures on 100 randomly selected subsets of 10 samples from each empirical data set. (b, d) Correlation of select quantitative (b) and qualitative (d) measures averaged over all four empirical data sets. (e, f) Ordination plots obtained by applying the qualitative Soergel measure to the keyboard data set with sequences clustered at 100% (e) and 85% (f) sequence similarity. (g, h) Ordination plots for the qualitative MPD measure with sequences clustered at 100% (g) and 85% (h) sequence similarity. Principal coordinate analysis was used to generate the ordination plots. The percentage of total variance explained by each axis is shown in parentheses. Each data point represents a sample taken from one of three individuals. Pearson's correlation coefficient, r, between dissimilarity values measured before and after clustering is given in the bottom-left corner of each plot.
Figure 3
Recovery of clusters is influenced by a measure's robustness to outlying basal lineages. The quantitative Bray-Curtis (a–c), qualitative Soergel (d–f), and qualitative Pearson dissimilarity (g–i) measures were applied to the human data set. (a, d, g) All three methods revealed three clusters: a stool cluster, an oral cluster and a mixed navel and hair cluster. The addition of an outlying basal lineage to half the samples did not substantially affect the Bray-Curtis (b: 5% of sequences assigned to the outlying lineage) or uSoergel (e) measures, but obscured the underlying biological clusters for the uPearson dissimilarity (h) measure. For qualitative measures, a single sequence is sufficient to include the outlying lineage and the addition of further sequences does not influence these measures. Each data point in the scatter plots (c, f, i) indicates the dissimilarity measured between a pair of samples before (x-axis) and after (y-axis) adding sequences to the outlying lineage. For all measures, the addition of the outlying lineage caused pairs of samples where both contained the outlying lineage to become more similar (outlier–outlier) and pairs of samples where only one sample contained the outlying lineage to become less similar (outlier–original). Pairs of samples where neither contained the outlying lineage were unaffected (original–original). However, the degree to which the outlier–outlier and outlier–original pairs were affected depended on the measure used. Pearson's correlation coefficient, r, between dissimilarity values measured before and after addition of the outlying lineage is given in the upper-left corner of each scatter plot.
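To make the outlier experiment in Figure 3 concrete, the sketch below computes branch-weighted Bray-Curtis and Soergel dissimilarities before and after one sample gains an outlying lineage. It rests on assumptions that this page does not confirm: the formulas are a commonly used branch-weighted generalization (each branch contributes its length times the difference in the proportion of sequences descending from it), the four-branch tree and all proportions are invented, and the function names are hypothetical. The paper's exact definitions may differ.

```python
def bray_curtis(lengths, p_a, p_b):
    """Branch-weighted Bray-Curtis: sum L*|a - b| / sum L*(a + b)."""
    num = sum(l * abs(a - b) for l, a, b in zip(lengths, p_a, p_b))
    den = sum(l * (a + b) for l, a, b in zip(lengths, p_a, p_b))
    return num / den


def soergel(lengths, p_a, p_b):
    """Branch-weighted Soergel: sum L*|a - b| / sum L*max(a, b)."""
    num = sum(l * abs(a - b) for l, a, b in zip(lengths, p_a, p_b))
    den = sum(l * max(a, b) for l, a, b in zip(lengths, p_a, p_b))
    return num / den


def presence(p):
    """Qualitative ('u') form: keep only presence or absence on each branch."""
    return [1.0 if v > 0 else 0.0 for v in p]


# Toy tree (assumption): three ordinary branches of length 1 and one long
# outlying basal branch of length 10, listed last.
lengths = [1.0, 1.0, 1.0, 10.0]

sample_b = [0.0, 0.5, 0.5, 0.0]
a_before = [0.5, 0.5, 0.0, 0.0]
a_5pct = [0.475, 0.475, 0.0, 0.05]  # 5% of sequences moved to the outlier
a_20pct = [0.4, 0.4, 0.0, 0.2]      # 20% of sequences moved to the outlier

for label, a in [("before", a_before), ("5% outlier", a_5pct), ("20% outlier", a_20pct)]:
    bc = bray_curtis(lengths, a, sample_b)
    us = soergel(lengths, presence(a), presence(sample_b))
    print(f"{label:12s} Bray-Curtis={bc:.3f}  uSoergel={us:.3f}")
```

In this toy setting the qualitative value is identical at 5% and 20%, since only presence on the outlying branch matters, whereas the quantitative Bray-Curtis value keeps growing with the share of sequences assigned to the outlier. The example covers only an outlier–original pair and is not meant to reproduce the magnitudes reported in the figure.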
Figure 4
Effectiveness of measures depends on the mechanism of phylogenetic differentiation and sequencing depth. The Bray-Curtis (a–c) and Canberra (d–f) measures were applied to clusters obtained under the equal-perturbation model at sequencing depths of 100, 1000 or 10 000 sequences per sample. These measures were also applied to clusters generated under the dominant-pair model (g–l). The k-medoids score (KMS) is given in the upper-right corner of each ordination plot.
