Editorial
2016 Apr 20:2:16004.
doi: 10.1038/npjbiofilms.2016.4. eCollection 2016.

A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity

Nam-Phuong Nguyen et al. NPJ Biofilms Microbiomes.

Abstract

The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97%) into 'Operational Taxonomic Units' (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequences within that OTU. This perspective paper discusses the known shortcomings of this standard approach using results obtained from the Human Microbiome Project. In particular, we show that the traditional approach of using pairwise sequence alignments to compute sequence similarity can result in poorly clustered OTUs. Because OTUs are typically annotated based on a single representative sequence, poorly clustered OTUs can have a significant impact on downstream analyses. These results suggest that we need to move beyond simple clustering techniques for 16S analysis.
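The pipeline described in the abstract (cluster at a similarity threshold, pick a representative, propagate its annotation) can be illustrated with a minimal sketch. This is not the authors' method or any specific tool's algorithm: it uses a simple greedy scheme in which the first sequence of each cluster acts as the representative, and it assumes the sequences are already aligned to equal length so that identity is a plain column-wise match fraction. The function names `identity`, `greedy_otus`, and `annotate` are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching columns between two pre-aligned, equal-length sequences.

    Assumption: inputs are already aligned; no pairwise alignment is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: Dict[str, str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Greedy clustering: each sequence joins the first representative it matches
    at or above the threshold; otherwise it founds a new OTU and becomes its
    representative. Keys of the result are representative names."""
    otus: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in otus:
            if identity(seqs[rep], seq) >= threshold:
                otus[rep].append(name)
                break
        else:
            otus[name] = [name]
    return otus


def annotate(otus: Dict[str, List[str]], rep_labels: Dict[str, str]) -> Dict[str, str]:
    """Apply each representative's annotation to every member of its OTU."""
    return {
        member: rep_labels[rep]
        for rep, members in otus.items()
        for member in members
    }


if __name__ == "__main__":
    # Toy data: made-up sequences and labels, for illustration only.
    toy = {
        "s1": "A" * 100,
        "s2": "A" * 98 + "CC",
        "s3": "A" * 90 + "C" * 10,
        "s4": "A" * 88 + "C" * 12,
    }
    clusters = greedy_otus(toy)
    print(clusters)
    print(annotate(clusters, {"s1": "GenusA", "s3": "GenusB"}))
```

Note that every member inherits the representative's label regardless of how far it sits from the representative, which is why a poorly clustered OTU affects downstream analyses.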

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Box plots of the mean MSA-based sequence dissimilarity (MSA) and mean phylogenetic branch length distance (BLD) across the different phyla for the V1V3 OTUs. We report the distribution of the mean MSA-based sequence dissimilarity and mean phylogenetic branch length distance between the representative sequence and all other sequences within an OTU for the V1V3 OTUs. We group the distributions according to the OTUs’ phylum-level annotations. The red line delineates a distance of 0.03. We report the total number of OTUs that belong to each phylum in parentheses.
Figure 2
Box plots of the mean MSA-based sequence dissimilarity (MSA) and mean phylogenetic branch length distance (BLD) across the different phyla for the V3V5 OTUs. We report the distribution of the mean MSA-based sequence dissimilarity and mean phylogenetic branch length distance between the representative sequence and all other sequences within an OTU for the V3V5 OTUs. We group the distributions according to the OTUs’ phylum-level annotations. The red line delineates a distance of 0.03. We report the total number of OTUs that belong to each phylum in parentheses.
Figure 3
Box plots of the mean MSA-based sequence dissimilarity (MSA) and mean phylogenetic branch length distance (BLD) across the different genera within the Firmicutes, Bacteroidetes, and Fusobacteria phyla for the V1V3 OTUs. We report the distribution of the mean MSA-based sequence dissimilarity and mean phylogenetic branch length distance between the representative sequence and all other sequences within an OTU for the V1V3 OTUs. We group the distributions according to the OTUs’ genus-level annotations. The red line delineates a distance of 0.03. We report the total number of OTUs that belong to each genus in parentheses.
Figure 4
ML trees for the V1V3 OTUs classified as the genus Sneathia. We show the ML trees estimated on the V1V3 OTU alignments that belong to the genus Sneathia. The OTUs are (a) OTU 1429, (b) OTU 33700, (c) OTU 442, and (d) OTU 311. The representative sequence for each OTU is highlighted in yellow. All trees are drawn on the same scale.
Figure 5
Distribution of the MSA-based sequence dissimilarity and BLD for the V1V3 OTUs 33700, 10405, and 7767. We show the (a) pairwise phylogenetic branch length distributions and (b) MSA-based sequence dissimilarity distributions for three V1V3 OTUs. OTU 33700 belongs to the genus Sneathia, OTU 7767 belongs to the genus Lactobacillus, and OTU 10405 belongs to the phylum TM7.
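
The captions above report, for each OTU, the mean MSA-based sequence dissimilarity between the representative sequence and all other sequences, compared against a distance of 0.03. A minimal sketch of that summary statistic follows. It assumes the rows come from one multiple sequence alignment, and it ignores columns where either sequence has a gap; the source does not state how gaps were treated, so that choice, like the function names, is an assumption made for illustration.

```python
from statistics import mean
from typing import List

GAP = "-"


def msa_dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatching columns between two rows of the same MSA.

    Assumption: columns where either row has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not compared:
        raise ValueError("no shared ungapped columns")
    return sum(x != y for x, y in compared) / len(compared)


def mean_distance_to_representative(rep: str, members: List[str]) -> float:
    """Mean dissimilarity between the representative row and every other member row."""
    return mean(msa_dissimilarity(rep, m) for m in members)


def exceeds_threshold(rep: str, members: List[str], threshold: float = 0.03) -> bool:
    """True if the OTU's mean distance to its representative is above the threshold."""
    return mean_distance_to_representative(rep, members) > threshold


if __name__ == "__main__":
    # Toy alignment rows, made up for illustration.
    representative = "ACGTACGTAC"
    others = ["ACGTACGTAC", "ACGTACGTAA", "ACGT-CGTAC"]
    print(mean_distance_to_representative(representative, others))
    print(exceeds_threshold(representative, others))
```

An OTU clustered at 97% similarity would be expected to sit at or below 0.03 on this measure, so a mean above the line suggests the representative is a poor stand-in for the other members.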
