Microbiome. 2015 May 20:3:20.
doi: 10.1186/s40168-015-0081-x. eCollection 2015.

Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity

Yan He et al. Microbiome.

Erratum in

Abstract

Background: The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses.

Results: Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded.
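The merging behaviour described above can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline or any real clustering tool: "sequences" are integers, distance is the absolute difference, and `greedy_cluster` is a hypothetical, simplified abundance-ordered greedy clusterer written only for this illustration.

```python
from collections import Counter

def greedy_cluster(reads, threshold):
    """Toy abundance-ordered greedy clustering on 1-D 'sequences' (assumption:
    integers stand in for sequences, abs difference stands in for distance)."""
    abundance = Counter(reads)
    # Process the most abundant unique sequences first; break ties by value.
    ordered = sorted(abundance, key=lambda s: (-abundance[s], s))
    centroids = []
    assignment = {}
    for seq in ordered:
        for c in centroids:
            if abs(seq - c) <= threshold:
                assignment[seq] = c      # join the first centroid within threshold
                break
        else:
            centroids.append(seq)        # no centroid close enough: start a new OTU
            assignment[seq] = seq
    return assignment

small = [0, 0, 0, 2, 4, 4]               # smaller subsample
large = small + [2, 2, 2, 2]             # same reads plus four extra copies of 2

print(greedy_cluster(small, 3))          # 0 and 4 end up in different OTUs
print(greedy_cluster(large, 3))          # 0, 2 and 4 all merge under centroid 2
```

In the small dataset, 0 and 4 are both centroids and sit in different OTUs. Adding reads changes the abundance ordering, so 2 becomes the first centroid and absorbs both, which is the kind of merge the abstract describes.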

Conclusions: As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
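The two-step structure of an open-reference method can be sketched as follows. This is an illustrative toy under stated assumptions, not the implementation used in the paper: sequences are integers, distance is the absolute difference, the reference collection is a non-empty list, and the function names (`closed_reference`, `de_novo`, `open_reference`) are hypothetical.

```python
from collections import Counter

def closed_reference(reads, references, threshold):
    """Assign each read to its nearest reference if within threshold."""
    matched, unmatched = {}, []
    for seq in reads:
        best = min(references, key=lambda r: abs(seq - r))
        if abs(seq - best) <= threshold:
            matched[seq] = ("ref", best)
        else:
            unmatched.append(seq)        # a pure closed-reference method discards these
    return matched, unmatched

def de_novo(reads, threshold):
    """Toy abundance-ordered greedy clustering for reads with no reference hit."""
    abundance = Counter(reads)
    ordered = sorted(abundance, key=lambda s: (-abundance[s], s))
    centroids, assignment = [], {}
    for seq in ordered:
        hit = next((c for c in centroids if abs(seq - c) <= threshold), None)
        if hit is None:
            centroids.append(seq)
            hit = seq
        assignment[seq] = ("denovo", hit)
    return assignment

def open_reference(reads, references, threshold):
    """Closed-reference step first, then de novo clustering of the leftovers."""
    matched, unmatched = closed_reference(reads, references, threshold)
    matched.update(de_novo(unmatched, threshold))
    return matched

result = open_reference([0, 1, 10, 9, 50, 50, 51], references=[0, 10], threshold=1)
print(result)
```

Assignments to reference sequences do not depend on which other reads are present, so that part stays stable as reads are added; only the de novo remainder can shift.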

Figures

Figure 1
Rarefaction curves, principles underlying unstable complete linkage (CL) clustering, and PCoA based on the Bray-Curtis distance. (a) Rarefaction curves generated with CL clustering at five different depths. Point A is the number of OTUs at 30,000 sequences from the 100% dataset, and point B is the number of OTUs at 30,000 sequences from the 60% dataset. (b) Principles underlying unstable CL clustering at two sampling depths. White circles indicate individual sequences that were included in both the small and the large subsamples, and dark circles indicate sequences that were added only in the large subsample. Lines indicate pairs of sequences with distances equal to or smaller than the threshold, which could therefore be linked into a single OTU. Large circles in red or blue indicate OTUs in the small and the large subsamples, respectively. (c) PCoA based on the Bray-Curtis distance, comparing 60% subsamples with the full datasets using CL. All of the subsamples were rarefied to 30,000 sequences per sample to be included in this analysis.
Figure 2
Principles underlying unstable single linkage (SL) clustering, rarefaction curves, and PCoA based on the Bray-Curtis distance. (a) Principles underlying unstable SL clustering at two sampling depths. White circles indicate individual sequences that were included in both the small and the large subsamples, and dark circles indicate sequences that are added only in the large subsample. Lines indicate pairs of sequences with distances equal to or smaller than the threshold, which could therefore be linked into a single OTU. Large circles in red or blue indicate OTUs in the small and the large subsamples, respectively. (b, d) Rarefaction curves generated with SL (b) and average linkage (AL) (d) clustering at five different depths. (c, e) PCoA based on the Bray-Curtis distance, comparing 60% subsamples with the full datasets using SL (c) and AL (e). All of the subsamples were rarefied to 30,000 sequences per sample to be included in this analysis.
Figure 3
Principles underlying unstable distance-based greedy clustering (DGC) and abundance-based greedy clustering (AGC), rarefaction curves, and PCoA based on the Bray-Curtis distance. (a, b) Principles underlying unstable DGC (a) and AGC (b) at two sampling depths. White circles indicate individual sequences that were included in both the small and the large subsamples, and dark circles indicate sequences that were added only in the large subsample. Yellow dots indicate OTU centroids. Lines indicate pairs of sequences with distances equal to or smaller than the threshold, which could therefore be linked into a single OTU. Large circles in red or blue indicate OTUs in the small and the large subsamples, respectively. (c, d) Rarefaction curves generated with DGC (c) and AGC (d) at five different depths. (e, f) PCoA based on the Bray-Curtis distance, comparing 60% subsamples with the full datasets using AGC (e) and DGC (f). All of the subsamples were rarefied to 30,000 sequences per sample to be included in this analysis.
Figure 4
Proportion of unstable sequences, proportion of unstable OTUs, and MCC value of each method. (a) Proportion of unstable sequences as created by method. Unstable sequences are defined as sequences that are clustered to one centroid in the 60% subsample but clustered to a different centroid in the 100% (full) dataset. (b) Proportion of unstable OTUs as created by method and by frequency of cluster centroids (the values for closed-reference and dereplication are zero and are thus not included in this figure). If an OTU was identical in the 60% and 100% datasets (not including sequences that are not present in the 60% subsample), it is defined as stable. (c) MCC value of each method. Higher values correspond to greater stability.
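The stability measures named in this caption can be sketched in a few lines. The unstable-sequence fraction follows the definition given in the caption. The MCC calculation is an assumption: the abstract does not spell out how the confusion matrix is built, so this sketch counts pairs of shared sequences as co-clustered or separated in the subsample versus the full dataset. Function names and the example labels are hypothetical.

```python
from itertools import combinations
from math import sqrt

def unstable_fraction(sub, full):
    """Share of sequences whose centroid differs between subsample and full dataset.
    `sub` and `full` map sequence id -> centroid id."""
    shared = [s for s in sub if s in full]
    changed = sum(1 for s in shared if sub[s] != full[s])
    return changed / len(shared)

def pairwise_mcc(sub, full):
    """Matthews correlation coefficient over pairs of shared sequences
    (assumed construction; labelling of fp/fn direction is arbitrary here)."""
    shared = sorted(s for s in sub if s in full)
    tp = tn = fp = fn = 0
    for a, b in combinations(shared, 2):
        together_sub = sub[a] == sub[b]
        together_full = full[a] == full[b]
        if together_sub and together_full:
            tp += 1
        elif not together_sub and not together_full:
            tn += 1
        elif together_sub:
            fp += 1                      # together only in the subsample
        else:
            fn += 1                      # together only in the full dataset
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

sub = {"a": "c1", "b": "c1", "c": "c2", "d": "c2"}
full = {"a": "c1", "b": "c1", "c": "c2", "d": "c3", "e": "c3"}
print(unstable_fraction(sub, full))      # only "d" changed centroid
print(pairwise_mcc(sub, full))
```

Sequence "e" is ignored because it is absent from the subsample, matching the caption's note that sequences not present in the 60% subsample are excluded from the comparison.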
Figure 5
Principles underlying stable closed-reference clustering, rarefaction curves, and PCoA based on the Bray-Curtis distance. (a) Principles underlying stable closed-reference clustering at two sampling depths. White circles indicate individual sequences that were included in both the small and the large subsamples, and dark circles indicate sequences that were added only in the large subsample. Diamonds indicate reference sequences. Lines indicate pairs of sequences with distances equal to or smaller than the threshold, which could therefore be linked into a single OTU. Large circles in red or blue indicate OTUs in the small and the large subsamples, respectively. (b) Rarefaction curves generated with closed-reference clustering at five different depths. (c) PCoA based on the Bray-Curtis distance, comparing 60% subsamples with the full datasets using closed reference clustering. All of the subsamples were rarefied to 30,000 sequences per sample to be included in this analysis.
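The figure captions above refer repeatedly to rarefying samples to a fixed depth and to the Bray-Curtis distance. As a minimal sketch of those two operations, assuming each sample is represented as a list of per-read OTU labels and that count tables are non-empty (the function names are hypothetical and this is not the software used in the study):

```python
import random
from collections import Counter

def rarefy(otu_labels, depth, seed=0):
    """Subsample reads without replacement to a fixed depth; return OTU counts."""
    rng = random.Random(seed)
    return Counter(rng.sample(otu_labels, depth))

def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    otus = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(o, 0), counts_b.get(o, 0)) for o in otus)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1 - 2 * shared / total

sample_a = ["otu1"] * 30 + ["otu2"] * 10
sample_b = ["otu1"] * 10 + ["otu2"] * 30
ra = rarefy(sample_a, 20)
rb = rarefy(sample_b, 20)
print(bray_curtis(ra, rb))
```

If the OTU labels themselves change between the 60% and 100% clusterings, the count tables differ even for identical reads, which is how OTU instability can propagate into distance-based ordination.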
