PLoS One. 2013 Aug 13;8(8):e70837.
doi: 10.1371/journal.pone.0070837. eCollection 2013.

A comparison of methods for clustering 16S rRNA sequences into OTUs

Wei Chen et al. PLoS One.

Abstract

Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, relatively little guidance is available on their relative performance or on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than others, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an under-estimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing thresholds for both dissimilarity and abundance in OTU inference.
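
To make the two parameters discussed in the abstract concrete, the following is a minimal Python sketch of threshold-based OTU clustering. It is a toy greedy, seed-based scheme and is not claimed to be any of the ten methods evaluated in the paper. The function names (`dissimilarity`, `greedy_otus`), the mismatch-fraction distance on pre-aligned, equal-length reads, the toy reads, and the choice to apply the abundance cut-off to unique sequences before clustering are all assumptions made for illustration.

```python
from collections import Counter


def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.03, min_abundance=1):
    """Toy greedy clustering: each unique sequence joins the first OTU whose
    seed is within `threshold`, otherwise it seeds a new OTU.

    `min_abundance` (hypothetical parameter) discards unique sequences seen
    fewer than that many times before clustering.
    """
    counts = Counter(reads)  # dereplicate and count each unique sequence
    otus = []  # each OTU is {"seed": str, "members": Counter}
    # Visit unique sequences from most to least abundant (ties broken by sequence).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_abundance:
            continue
        for otu in otus:
            if dissimilarity(seq, otu["seed"]) <= threshold:
                otu["members"][seq] += n
                break
        else:
            otus.append({"seed": seq, "members": Counter({seq: n})})
    return otus


# Toy 20-base reads: one abundant sequence, a one-mismatch variant (5% away),
# and a singleton that differs at three positions (15% away).
base = "ACGTACGTACGTACGTACGT"
variant = "ACGTACGTACGTACGTACGA"
distant = "TTTTACGTACGTACGTACGT"
reads = [base] * 5 + [variant] * 2 + [distant]

for t in (0.03, 0.10):
    print(f"threshold={t:.2f}: {len(greedy_otus(reads, threshold=t))} OTUs")
print("with min_abundance=2:", len(greedy_otus(reads, threshold=0.10, min_abundance=2)), "OTUs")
```

On these toy reads, raising the dissimilarity threshold merges the variant into the abundant seed, and raising the abundance cut-off removes the singleton, so the OTU count depends on both settings. Real pipelines compute distances from alignments or k-mer profiles rather than from a position-by-position comparison.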

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. NID scores of ten algorithms based on the data set simclone15_1 and simclone_200.
Figure 2. A Precision versus Recall plot generated from data set simclone15_1.
Figure 3. The results of OTUs estimated with different frequency thresholds at different dissimilarity levels, from the data set Clone43.
