PLoS One. 2012;7(8):e44357.
doi: 10.1371/journal.pone.0044357. Epub 2012 Aug 30.

Implications of pyrosequencing error correction for biological data interpretation

Matthew G Bakker et al. PLoS One. 2012.

Abstract

There has been a rapid proliferation of approaches for processing and manipulating second-generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. It also provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
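
The subsampling argument in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the two communities below are hypothetical, the subsampling depth is arbitrary, the natural logarithm is assumed for the Shannon index, and Pielou's J is assumed as the evenness measure because the abstract does not name one. Both communities hold 50 OTUs in 1000 reads, but the even one should keep far more OTUs after random subsampling without replacement.

```python
import math
import random
from collections import Counter


def subsample(otu_counts, depth, seed=0):
    """Draw `depth` reads at random without replacement; return OTU counts."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(random.Random(seed).sample(reads, depth))


def richness(otu_counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in otu_counts.values() if n > 0)


def shannon(otu_counts):
    """Shannon index H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(otu_counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in otu_counts.values() if n > 0)


def pielou_evenness(otu_counts):
    """Pielou's J = H' / ln(S); an assumed choice of evenness measure."""
    s = richness(otu_counts)
    return shannon(otu_counts) / math.log(s) if s > 1 else 0.0


# Hypothetical communities: 50 OTUs and 1000 reads each.
uneven = {"otu0": 951, **{f"otu{i}": 1 for i in range(1, 50)}}  # many singletons
even = {f"otu{i}": 20 for i in range(50)}                       # flat distribution

for name, community in (("uneven", uneven), ("even", even)):
    sub = subsample(community, depth=200)
    print(name, richness(community), richness(sub), round(shannon(sub), 2))
```

In the uneven community most OTUs are singletons, so each has only a 20% chance of surviving a draw of 200 from 1000 reads, which is the mechanism the abstract proposes for the higher per-sample richness of the de-noised data.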

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Corresponding OTUs between data processing methods. A) Each method generated some OTUs that consist entirely of sequence reads that were culled in the alternate method. These inconsistent OTUs were mostly singletons and were more abundant in the standard pipeline. B) Most OTUs had a clear corresponding OTU in the alternate method. Data shown are OTUs having a membership >50 reads in the AmpliconNoise dataset, and the proportion of the membership of each OTU that was shared with the best corresponding OTU in the standard pipeline dataset.
Figure 2. Impacts of de-noising on the rank-abundance distribution of OTUs. AmpliconNoise processing significantly altered the OTU rank-abundance distribution (two-sample Kolmogorov-Smirnov test; D = 0.20, p < 0.0001) and increased evenness.
Figure 3. Impacts of de-noising on OTU richness and diversity. A) Relationship between OTU richness with and without de-noising, by sample. B) Relationship between OTU diversity (Shannon index) with and without de-noising, by sample. C) Ranking of samples by Shannon diversity index with and without de-noising.
Figure 4. Impacts of de-noising on phylogenetic diversity. A) Relationship between phylogenetic diversity with and without de-noising, by sample. B) Ranking of samples by phylogenetic diversity with and without de-noising.
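
The Figure 2 caption reports a two-sample Kolmogorov-Smirnov statistic D. As a rough sketch of what that statistic measures, the function below computes D as the largest vertical gap between two empirical cumulative distribution functions. How the authors prepared the two rank-abundance distributions for the test is not stated here, so the input lists are hypothetical OTU abundances, and no p-value is computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical OTU abundances from two processing methods.
standard = [120, 40, 9, 3, 1, 1, 1, 1]
denoised = [110, 45, 12, 6, 4, 2]
print(ks_statistic(standard, denoised))
```

A value of 0 means the two empirical distributions coincide at every observed point, and a value of 1 means they do not overlap at all.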
