Appl Environ Microbiol. 2011 Jun;77(11):3846-52.
doi: 10.1128/AEM.02772-10. Epub 2011 Apr 1.

Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads


Andrea K Bartram et al. Appl Environ Microbiol. 2011 Jun.

Erratum in

  • Appl Environ Microbiol. 2011 Aug;77(15):5569

Abstract

Microbial communities host unparalleled taxonomic diversity. Adequate characterization of environmental and host-associated samples remains a challenge for microbiologists, despite the advent of 16S rRNA gene sequencing. In order to increase the depth of sampling for diverse bacterial communities, we developed a method for sequencing and assembling millions of paired-end reads from the 16S rRNA gene (spanning the V3 region; ∼200 nucleotides) by using an Illumina genome analyzer. To confirm reproducibility and to identify a suitable computational pipeline for data analysis, sequence libraries were prepared in duplicate for both a defined mixture of DNAs from known cultured bacterial isolates (>1 million postassembly sequences) and an Arctic tundra soil sample (>6 million postassembly sequences). The Illumina 16S rRNA gene libraries represent a substantial increase in the number of sequences over all extant next-generation sequencing approaches (e.g., 454 pyrosequencing), while the assembly of paired-end 125-base reads offers a methodological advantage by incorporating an initial quality control step for each 16S rRNA gene sequence. This method incorporates indexed primers to enable the characterization of multiple microbial communities in a single flow cell lane, may be modified readily to target other variable regions or genes, and demonstrates unprecedented and economical access to DNAs from organisms that exist at low relative abundances.
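The abstract credits the assembly of paired-end reads with acting as a quality control step: a pair whose reads do not agree where they overlap cannot be assembled and is dropped. The sketch below illustrates that idea only. It is not the authors' assembler; the exact-position overlap search, the `min_overlap` and `max_mismatches` thresholds, and the choice to keep the forward read's bases in the overlap are all assumptions, and it ignores per-base Q scores, which a real assembler would use to resolve disagreements.

```python
from typing import Optional

# Base-pairing table used for reverse complementation (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatches: int = 1) -> Optional[str]:
    """Assemble a forward read and its mate into one sequence (toy version).

    read2 is reverse-complemented onto the forward strand, then overlap
    lengths are tried from longest to shortest. The first overlap with at
    most max_mismatches disagreements is accepted, keeping read1's bases in
    the overlap. Returns None when no overlap qualifies, so the pair is
    discarded (the quality-control effect).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            # Forward read, then the part of the mate beyond the overlap.
            return read1 + rc2[overlap:]
    return None


if __name__ == "__main__":
    read1 = "AAAAACCCCCGGGGG"
    read2 = reverse_complement("CCCCCGGGGGTTTTT")
    print(merge_pair(read1, read2))  # AAAAACCCCCGGGGGTTTTT
```

Here the two toy reads share a 10-base overlap (`CCCCCGGGGG`), so they assemble into a 20-base sequence; a pair with no acceptable overlap returns `None`.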


Figures

Fig. 1.
Overview of the Illumina 16S rRNA gene sequencing method and generated library data. (A) Schematic of the PCR (20 cycles) and gel purification of ∼330-base PCR products, which include the conserved 16S rRNA gene primer-binding region. (B) Informatics pipeline for generating clusters and taxonomic affiliations. (C) Resulting taxonomic affiliations for the replicate control libraries (C1 and C2) and the Sanger sequencing-based library (CL). (D) Taxonomic affiliations for the Alert Arctic tundra duplicate libraries (AT1 and AT2) and the Sanger sequencing-based library (ATS).
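Panels C and D compare taxonomic affiliations across duplicate libraries. A minimal way to put libraries of very different sizes on the same footing is to convert per-sequence taxon labels into relative abundances. The sketch below does that and, as one common way to quantify agreement between replicates, computes a Bray-Curtis dissimilarity. Bray-Curtis is a swapped-in illustration; the caption does not say how, or whether, the replicates were compared numerically, and the labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(assignments: Iterable[str]) -> Dict[str, float]:
    """Turn per-sequence taxon labels (e.g. phylum names) into proportions."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(p) | set(q)
    diff = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    total = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return diff / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical labels for two replicate libraries.
    c1 = relative_abundance(["Firmicutes", "Firmicutes", "Proteobacteria", "Actinobacteria"])
    c2 = relative_abundance(["Firmicutes", "Proteobacteria", "Proteobacteria", "Actinobacteria"])
    print(bray_curtis(c1, c2))  # 0.25
```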
Fig. 2.
Quality (Q) scores for all 125-base sequence reads. The Q score is an integer mapping of P, the probability that the corresponding base call is incorrect; higher Q scores indicate lower error rates. The extent of sequence overlap for each assembled read was characterized, and its mean and standard deviation (±σ) were plotted relative to sequence length. The plotted region of potential read overlap does not explicitly calculate the additive Q score at each position, because the extent of overlap varied with the wide range of V3 region lengths.
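The caption describes Q as an integer mapping of the error probability P without giving the formula. Assuming the standard Phred definition, Q = -10 log10(P) (older Solexa pipelines used a different, odds-based scale), the conversions look like the sketch below. The ASCII offsets in `decode_quality` follow common FASTQ conventions (33 for Sanger and Illumina 1.8+, 64 for Illumina 1.3 to 1.7); which one applies to a given data set is an assumption to check, and `expected_errors` is an illustrative per-read summary, not something the figure reports.

```python
import math
from typing import List, Sequence


def decode_quality(qual_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Q scores.

    offset is 33 for Sanger/Illumina 1.8+ files and 64 for Illumina 1.3-1.7
    files; check which encoding your data use.
    """
    return [ord(c) - offset for c in qual_string]


def q_to_error_prob(q: float) -> float:
    """Phred scale: P = 10^(-Q/10), so Q20 means a 1% chance the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_q(p: float) -> int:
    """Inverse mapping, rounded to the nearest integer: Q = -10 log10(P)."""
    return round(-10 * math.log10(p))


def expected_errors(quals: Sequence[int]) -> float:
    """Expected number of incorrect base calls in a read (sum of per-base P)."""
    return sum(q_to_error_prob(q) for q in quals)
```

For example, a read with bases at Q10, Q20, and Q30 has per-base error probabilities of 0.1, 0.01, and 0.001, so about 0.111 incorrect calls are expected across the three bases.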
Fig. 3.
Rank-abundance curves for the duplicate control libraries (A) and the Alert Arctic tundra libraries (B). Both the raw data and the data clustered with CD-HIT at a 97% cutoff are shown. The Alert Illumina library is shown both as separate replicates (AT1 and AT2) and as a composite library (ATCL) that combines the two replicates.
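A rank-abundance curve orders units from most to least abundant and plots abundance against rank. The sketch below computes the points for either view in the figure: distinct sequences (raw data) or clusters. The `cluster_of` mapping is a hypothetical input standing in for parsed 97% clustering output (for example, from CD-HIT); parsing that output is not shown. A composite library in the spirit of ATCL can be approximated by concatenating the replicates' reads before calling the function.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def rank_abundance(reads: Sequence[str],
                   cluster_of: Optional[Dict[str, Hashable]] = None
                   ) -> List[Tuple[int, int]]:
    """Return (rank, abundance) pairs, most abundant first.

    Without cluster_of, each distinct sequence is its own unit (raw data).
    With cluster_of, a hypothetical mapping from sequence to cluster ID,
    reads are pooled by cluster before counting.
    """
    labels = reads if cluster_of is None else [cluster_of[r] for r in reads]
    abundances = sorted(Counter(labels).values(), reverse=True)
    return list(enumerate(abundances, start=1))
```

Such curves are usually drawn with a logarithmic abundance axis, so the long tail of rare units stays visible.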
Fig. 4.
Effect of library size on phylotype coverage. Randomly subsampled libraries were drawn in triplicate from the combined AT libraries and used to calculate Good's coverage estimates. Means of the triplicates are plotted with standard deviations.
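Good's coverage is C = 1 - F1/N, where F1 is the number of phylotypes observed exactly once and N is the number of sequences. The sketch below computes it and repeats the estimate on random subsamples, reporting the mean and standard deviation as in the figure. Sampling without replacement, the fixed `seed`, and the `depth` values a user would pass are assumptions; the caption does not specify the subsampling procedure beyond triplicate draws.

```python
import random
import statistics
from collections import Counter
from typing import Hashable, Sequence, Tuple


def goods_coverage(labels: Sequence[Hashable]) -> float:
    """Good's coverage C = 1 - F1/N.

    labels holds one phylotype (cluster) ID per sequence; F1 is the number
    of phylotypes seen exactly once and N is the number of sequences.
    """
    counts = Counter(labels)
    singletons = sum(1 for n in counts.values() if n == 1)
    return 1 - singletons / len(labels)


def subsampled_coverage(labels: Sequence[Hashable], depth: int,
                        replicates: int = 3, seed: int = 0
                        ) -> Tuple[float, float]:
    """Mean and standard deviation of Good's coverage over random subsamples.

    Each replicate draws `depth` sequences without replacement; replicates
    must be at least 2 for a standard deviation to be defined.
    """
    rng = random.Random(seed)
    values = [goods_coverage(rng.sample(list(labels), depth))
              for _ in range(replicates)]
    return statistics.mean(values), statistics.stdev(values)
```

Evaluating `subsampled_coverage` over a series of increasing depths gives the points of a coverage-versus-library-size curve.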
Fig. 5.
Taxonomic affiliations at the levels of phylum, class, and order for consecutive abundance ranks of sequence data clustered at 97% with CD-HIT. Predominant taxa are represented in the bottom row, and singletons are at the top for each taxonomic level. Full details of RDP affiliations are summarized in Tables S3, S4, and S5 in the supplemental material.
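The figure groups 97% clusters by consecutive abundance ranks and reports which taxa occupy each group, with singletons shown separately. A minimal sketch of that tabulation is below. The input format (one `(abundance, taxon)` pair per cluster, with taxa already assigned, for example by the RDP classifier) and the `bin_size` parameter are assumptions; the caption does not state how many ranks each group spans.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def taxa_by_rank_bin(clusters: Sequence[Tuple[int, str]], bin_size: int
                     ) -> Tuple[List[Counter], Counter]:
    """Tally taxa for consecutive abundance ranks.

    clusters holds one (abundance, taxon) pair per cluster. Non-singleton
    clusters are sorted by abundance (most abundant first) and cut into
    consecutive bins of bin_size ranks; singletons are tallied separately.
    """
    ranked = sorted(clusters, key=lambda c: c[0], reverse=True)
    non_singletons = [c for c in ranked if c[0] > 1]
    singletons = Counter(taxon for n, taxon in ranked if n == 1)
    bins = [Counter(taxon for _, taxon in non_singletons[i:i + bin_size])
            for i in range(0, len(non_singletons), bin_size)]
    return bins, singletons
```

Running this separately on phylum, class, and order labels gives one row of bins per taxonomic level.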

