Comparative Study
Proc Natl Acad Sci U S A. 2011 Aug 30;108(35):14637-42.
doi: 10.1073/pnas.1111435108. Epub 2011 Aug 22.

Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

Woo Jun Sul et al. Proc Natl Acad Sci U S A.

Abstract

High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples.
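
As a rough illustration of the taxonomy-supervised step described in the abstract, the following Python sketch bins already-classified reads into a community-by-taxonomy count matrix at a chosen confidence threshold. The input layout (a lineage of (taxon, confidence) pairs per read), the function names, the "unclassified_" bin naming, and the example reads are all assumptions made for this sketch; it does not call the RDP classifier and is not the authors' implementation.

```python
from collections import Counter, defaultdict


def assign_bin(lineage, threshold):
    """Pick the taxonomy bin for one read.

    lineage   -- list of (taxon, confidence) pairs ordered from the highest
                 rank (e.g. domain) down to the lowest (e.g. genus);
                 this layout is an assumption of the sketch.
    threshold -- minimum confidence (0.0 to 1.0) required to accept a rank.
    """
    deepest = "root"
    for taxon, confidence in lineage:
        if confidence < threshold:
            # Ranks below the threshold fall into an "unclassified" bin under
            # the deepest taxon that did pass (hypothetical naming scheme).
            return "unclassified_" + deepest
        deepest = taxon
    return deepest


def build_matrix(reads, threshold):
    """Count reads per (sample, bin); reads is an iterable of (sample, lineage)."""
    matrix = defaultdict(Counter)
    for sample, lineage in reads:
        matrix[sample][assign_bin(lineage, threshold)] += 1
    return matrix


# Made-up reads, for illustration only.
reads = [
    ("soil", [("Bacteria", 1.0), ("Firmicutes", 0.95), ("Bacillus", 0.85)]),
    ("soil", [("Bacteria", 1.0), ("Firmicutes", 0.90), ("Bacillus", 0.40)]),
    ("gut", [("Bacteria", 1.0), ("Bacteroidetes", 0.99), ("Bacteroides", 0.97)]),
]

strict = build_matrix(reads, threshold=0.8)
lenient = build_matrix(reads, threshold=0.0)
```

In this sketch a threshold of 0.0 places every read in its lowest-rank bin, while a higher threshold moves low-confidence reads into "unclassified" bins under their last confidently assigned ancestor. No alignment or clustering is needed to build the matrix.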

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Qualified sequence classification percentages at different confidence thresholds determined by the RDP classifier for the indicated taxonomic levels.
Fig. 2.
Scatter plot of β-diversity distance ranks (the highest β-diversity is rank 1) calculated using the community-by-taxonomy bins matrix (x axis; RDP classifier at 0% threshold) and the community-by-OTU matrix (y axis). In this plot, β-diversity distance is 1 − Chao's adjusted Sørensen similarity index.
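
The rank comparison in the Fig. 2 caption can be sketched as follows: rank the pairwise distances from each matrix so that the largest distance is rank 1, then correlate the two rank lists. The function names and the example values are assumptions for illustration, and the rank (Spearman-style) correlation is only one reasonable way to quantify the agreement; it is not necessarily the statistic the authors used.

```python
import math


def descending_ranks(values):
    """Rank values so the largest gets rank 1; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # 1-based average position of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_correlation(distances_a, distances_b):
    """Correlate the rank orders of two lists of pairwise distances."""
    return pearson(descending_ranks(distances_a), descending_ranks(distances_b))


# Made-up pairwise distances for the same three community pairs.
taxonomy_based = [0.9, 0.1, 0.5]
otu_based = [0.8, 0.2, 0.6]
agreement = rank_correlation(taxonomy_based, otu_based)
```

Both lists must hold the distances for the same community pairs in the same order.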
Fig. 3.
Comparison of NMDS plots based on abundance-based distance (1 − Chao's adjusted Sørensen similarity index). The habitat groups, defined by using the ontology of Habitat-Lite, are indicated by points of different color and shape (key below). Panels: community-by-taxonomy bins at (A) 80%, (B) 50%, and (C) 0% thresholds, and (D) community-by-OTU.
Fig. 4.
Comparison of NMDS plots based on occurrence-based distance (1 − Jaccard similarity index). The habitat groups, defined by using the ontology of Habitat-Lite, are indicated by points of different color and shape (key below). Panels: community-by-taxonomy bins at (A) 80%, (B) 50%, and (C) 0% thresholds, and (D) community-by-OTU.
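
The two distances named in the captions of Fig. 3 and Fig. 4 can be sketched for a pair of communities as below. The occurrence-based function is the standard 1 − Jaccard on presence/absence. The abundance-based function uses the plain abundance-based Sørensen form 2UV/(U+V), where U and V are the fractions of each sample's reads that fall in shared bins; it deliberately omits the correction for unseen shared taxa, so it is a simplification and not the adjusted index used for the figures. Function names and example counts are assumptions.

```python
def jaccard_distance(counts_a, counts_b):
    """1 - Jaccard similarity, using only presence/absence of each bin."""
    present_a = {name for name, count in counts_a.items() if count > 0}
    present_b = {name for name, count in counts_b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


def sorensen_abundance_distance(counts_a, counts_b):
    """1 - abundance-based Sorensen similarity (no unseen-taxa adjustment)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both communities need at least one read")
    shared = [name for name, count in counts_a.items()
              if count > 0 and counts_b.get(name, 0) > 0]
    u = sum(counts_a[name] for name in shared) / total_a  # shared fraction in A
    v = sum(counts_b[name] for name in shared) / total_b  # shared fraction in B
    if u + v == 0:
        return 1.0  # nothing shared: maximal distance
    return 1.0 - 2 * u * v / (u + v)


# Made-up bin counts for two communities.
community_a = {"Bacillus": 5, "Bacteroides": 5}
community_b = {"Bacteroides": 5, "Pseudomonas": 5}

occurrence = jaccard_distance(community_a, community_b)
abundance = sorensen_abundance_distance(community_a, community_b)
```

Either function can be applied to every pair of rows in a community-by-taxonomy or community-by-OTU matrix to obtain the distance matrix that an ordination such as NMDS takes as input.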
Fig. 5.
Simulation of “sequencing errors.” The similarity indices measure the differences between the original parent library and the library altered by simulated sequencing errors. Distance: mean of the nucleotide substitution rates (%) across all query sequences, with substitutions applied in a randomized manner.
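
A minimal sketch of the kind of simulation the Fig. 5 caption describes: each base of each query sequence is replaced by a different base with a fixed probability, producing an altered library that can then be compared with the parent library. The function names, the uniform choice among the other three bases, and the example sequences are assumptions; the authors' actual error model is not given here.

```python
import random

BASES = "ACGT"


def simulate_errors(sequence, rate, rng):
    """Return a copy of sequence with random substitutions.

    rate -- per-base substitution probability (0.0 to 1.0).
    rng  -- a random.Random instance, so runs can be made reproducible.
    """
    mutated = []
    for base in sequence:
        if rng.random() < rate:
            # Substitute with a base that differs from the original one.
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


def substitution_rate(original, altered):
    """Observed fraction of positions that differ between two equal-length reads."""
    differences = sum(1 for a, b in zip(original, altered) if a != b)
    return differences / len(original)


# Made-up parent library, for illustration only.
rng = random.Random(42)
parent_library = ["ACGTACGTACGT", "TTGACCGATAGC"]
altered_library = [simulate_errors(seq, 0.03, rng) for seq in parent_library]
mean_rate = sum(
    substitution_rate(p, a) for p, a in zip(parent_library, altered_library)
) / len(parent_library)
```

Reclassifying or reclustering the altered library and comparing it with the parent library would then show how each approach responds as the substitution rate grows.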
