PLoS One. 2012;7(11):e48996.
doi: 10.1371/journal.pone.0048996. Epub 2012 Nov 9.

Statistical object data analysis of taxonomic trees from human microbiome data

Patricio S La Rosa et al. PLoS One. 2012.

Abstract

Human microbiome research characterizes the microbial content of samples from human habitats to learn how interactions between bacteria and their host might affect human health. In this work, we propose a novel parametric statistical inference method, based on object-oriented data analysis (OODA), for analyzing Human Microbiome Project (HMP) data. OODA is an emerging area of statistical inference whose goal is to apply statistical methods to objects such as functions, images, graphs, or trees. The data objects in this work are taxonomic trees of bacteria built from the analysis of 16S rRNA gene sequences (e.g., using the Ribosomal Database Project (RDP)); there is one such tree for each biological sample analyzed. Our goal is to model and formally compare sets of trees. The contribution of our work is threefold: first, we introduce a weighted tree structure for analyzing RDP data; second, using a probability measure to model a set of taxonomic trees, we introduce an approximate maximum likelihood estimation (MLE) procedure for estimating the model parameters and derive likelihood ratio test (LRT) statistics for comparing the distributions of two metagenomic populations; and third, we analyze the Jumpstart HMP data using the proposed model, providing novel insights and future directions of analysis.
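The MLE and LRT steps named in the abstract can be illustrated with a toy model. The abstract does not state the authors' probability measure on trees, so this is a minimal sketch under an assumption of our own: each tree is encoded as a vector of branch weights x in R^p (ignoring that weights are nonnegative), and trees follow a Gibbs-type density proportional to exp(-τ‖x - μ‖²), i.e. an isotropic Gaussian with central tree μ and concentration τ. Under that assumption the MLEs are closed-form (μ̂ is the sample mean and τ̂ = np / (2 Σᵢ‖xᵢ - μ̂‖²)), and for two populations sharing τ the statistic is -2 log Λ = np log(S₀/S₁), where S₀ is the scatter around the pooled mean and S₁ the within-group scatter; under H₀ it is asymptotically χ² with p degrees of freedom. The function names `fit_isotropic` and `lrt_equal_centers` are hypothetical, and this is not the authors' approximate MLE procedure.

```python
import numpy as np


def fit_isotropic(X: np.ndarray):
    """MLE of center mu and concentration tau for the ASSUMED density
    f(x) proportional to exp(-tau * ||x - mu||^2), with X of shape (n, p)."""
    n, p = X.shape
    mu = X.mean(axis=0)                 # central-tree estimate: componentwise mean
    scatter = ((X - mu) ** 2).sum()     # sum of squared Euclidean distances to mu
    tau = n * p / (2.0 * scatter)       # solves d/dtau log L = 0
    return mu, tau


def lrt_equal_centers(X1: np.ndarray, X2: np.ndarray):
    """-2 log Lambda for H0: mu1 == mu2 versus H1: mu1 != mu2,
    with a single shared tau under both hypotheses (assumed model)."""
    X = np.vstack([X1, X2])
    n, p = X.shape
    s0 = ((X - X.mean(axis=0)) ** 2).sum()            # scatter around pooled mean
    s1 = (((X1 - X1.mean(axis=0)) ** 2).sum()
          + ((X2 - X2.mean(axis=0)) ** 2).sum())      # within-group scatter
    stat = n * p * np.log(s0 / s1)
    return stat, p                                    # compare stat to chi^2 with p df
```

Two identical groups give a statistic of 0, while well-separated groups give large values; the closed form is a consequence of the Gaussian assumption, which is why a more realistic tree model may need an approximate procedure instead.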

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Example of a bacterial taxonomic tree built by adding the RDP classifications of three sequences, as shown in Table 1.
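The tree construction in Figure 1 can be sketched in a few lines: each classified sequence contributes its root-to-genus lineage, and each branch weight is the sum of the confidence values passing through it (the "sum of confidence" weights mentioned for Figure 2). This is a minimal sketch assuming each RDP classification arrives as a list of (taxon, confidence) pairs from domain down to genus; that input layout, the name `build_weighted_tree`, and the confidence values in the example are hypothetical and not taken from Table 1 or from RDP's output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input layout: one lineage per classified sequence, ordered from
# root (domain) to leaf (genus); each entry is a (taxon name, confidence) pair.
Lineage = List[Tuple[str, float]]
NodePath = Tuple[str, ...]


def build_weighted_tree(lineages: Iterable[Lineage]) -> Dict[NodePath, float]:
    """Merge lineages into one weighted tree; each node's weight is the sum of
    the confidence values of all sequences classified through that node."""
    weights: Dict[NodePath, float] = defaultdict(float)
    for lineage in lineages:
        path: NodePath = ()
        for taxon, confidence in lineage:
            path = path + (taxon,)       # identify a node by its full root-to-node path
            weights[path] += confidence  # accumulate confidence on that branch
    return dict(weights)


tree = build_weighted_tree([
    [("Bacteria", 1.0), ("Firmicutes", 0.75), ("Clostridia", 0.5)],
    [("Bacteria", 1.0), ("Firmicutes", 0.5), ("Bacilli", 0.25)],
    [("Bacteria", 1.0), ("Bacteroidetes", 1.0)],
])
print(tree[("Bacteria",)])               # 3.0
print(tree[("Bacteria", "Firmicutes")])  # 1.25
```

Keying nodes by their full path, rather than by taxon name alone, keeps every genus at the same position in every sample's tree, which is what allows trees from different samples to be compared branch by branch.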
Figure 2. MDS plot showing the distribution of the taxonomic trees corresponding to stool samples sequenced at region V3–V5.
The MLE tree of all samples is marked MLE (black dot) in the MDS plot. Individual taxonomic trees for samples 2, 3, 5, 7, 16, and 18 are shown around the MDS plot to illustrate how the tree structure varies. The tree branches are color-coded by their weight values (sum of confidence values) according to the reference table at the bottom left of the plot: blue denotes the branches with the highest confidence and red denotes the branches with the lowest confidence. The tip of each branch represents a genus, and each genus occupies the same location on all trees.
Figure 3. Illustration of the MLE tree for stool samples, region V3–V5.
The individual taxonomic trees shown in Figure 2 (samples 2, 3, 5, 7, 16, and 18) are displayed around the MLE tree to illustrate some of the tree structures that the MLE tree represents.
Figure 4. Analysis of stool samples for 24 subjects sequenced at variable regions V1–V3 and V3–V5, mapped to the RDP database.
In panel (a), a pairwise distance matrix was computed using the Euclidean distance, and multidimensional scaling was used to display the distribution of these 48 trees, showing that the V1–V3 (red) and V3–V5 (blue) samples overlap. Panel (b) shows the MLE tree for all 48 trees, and panels (c) and (d) show the MLE trees for the V1–V3 and V3–V5 regions, respectively.
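The distance-and-MDS step described in the Figure 4 caption can be sketched as follows: trees are aligned on the union of their nodes (a node absent from a tree gets weight 0), pairwise Euclidean distances are computed, and the points are embedded in two dimensions. This is a minimal sketch assuming trees are dictionaries mapping node paths to weights; the function names are hypothetical, and classical (Torgerson) MDS is used as one standard variant, since the caption does not say which MDS algorithm produced the plots.

```python
import numpy as np


def trees_to_matrix(trees):
    """Stack weighted trees (dicts: node path -> weight) into an (n, p) matrix
    over the union of node paths; a node missing from a tree gets weight 0."""
    nodes = sorted(set().union(*trees))
    X = np.array([[t.get(node, 0.0) for node in nodes] for t in trees])
    return X, nodes


def pairwise_euclidean(X):
    """Matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Classical MDS: double-center the squared distances and keep the
    top-k eigenpairs of the resulting Gram matrix as coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centered points
    evals, evecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]            # indices of the k largest
    scale = np.sqrt(np.clip(evals[idx], 0, None))
    return evecs[:, idx] * scale                 # (n, k) embedding
```

When the distances are Euclidean and the points truly lie in k dimensions, this embedding reproduces the input distances exactly; otherwise it gives the best rank-k approximation of the centered Gram matrix, which is why overlap or separation in the 2-D plot is only a visual summary.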
Figure 5. Analysis of saliva and stool samples for 24 subjects sequenced at variable regions V3–V5, mapped to the RDP database.
In panel (a), a pairwise distance matrix was computed using the Euclidean distance, and multidimensional scaling was used to display the distribution of these 48 trees, showing that the stool (red) and saliva (blue) samples do not overlap. Panel (b) shows the MLE tree for the combined samples, and panels (c) and (d) show the MLE trees for the stool and saliva samples, respectively.
