PeerJ. 2016 Mar 28:4:e1869.
doi: 10.7717/peerj.1869. eCollection 2016.

Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

Patrick D Schloss et al. PeerJ.

Abstract

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
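As a rough illustration of what an observed error rate means here, the sketch below counts the aligned columns where a read differs from its mock community reference and divides by the number of columns compared. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the sequences are assumed to be pre-aligned with "-" marking gaps, and the example strings are invented.

```python
def error_rate(aligned_read: str, aligned_ref: str) -> float:
    """Fraction of aligned columns where the read differs from the reference.

    Assumes both strings come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. Substitutions, insertions, and
    deletions each count as one error per column.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(r, f) for r, f in zip(aligned_read, aligned_ref)
               if not (r == "-" and f == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    errors = sum(1 for r, f in columns if r != f)
    return errors / len(columns)


def fold_reduction(before_percent: float, after_percent: float) -> float:
    """How many times smaller the error rate is after curation."""
    if after_percent <= 0:
        raise ValueError("after_percent must be positive")
    return before_percent / after_percent
```

With the V1-V9 figures quoted in the abstract, `fold_reduction(0.69, 0.027)` gives roughly a 26-fold reduction in the observed error rate.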

Keywords: 16S rRNA gene sequencing; Bioinformatics; Microbial ecology; Microbiome; Next generation sequencing; PacBio; Sequencing error.

Conflict of interest statement

Sarah K. Highlander is an employee of JCVI.

Figures

Figure 1. Summary of errors in data generated using PacBio sequencing platform to sequence various regions within the 16S rRNA gene.
The predicted error rate using PacBio’s sequence analysis algorithm correlated well with the observed error rate (Pearson’s R: −0.67; A). Because of the large number of sequences, we randomly selected 5% of the data to show in (A). The sequencing error rate of the amplified gene fragments increased with mismatches to the barcodes and primers (B). The sequencing error rate declined with increased sequencing coverage; however, increasing the sequencing depth beyond 10-fold coverage had no meaningful effect on the sequencing error rate (C). The scales of the y-axes in B and C are the same.
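For readers who want to reproduce this kind of comparison on their own data, Pearson's R between paired predicted and observed values can be computed as sketched below. The function name is hypothetical and the code is a generic implementation of the statistic, not the analysis code used for the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value near −1 or 1 indicates a strong linear relationship, and the sign indicates its direction.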
Figure 2. Change in error rate (A) and the percentage of sequences that were retained (B) when using various sequence curation methods.
The condition that was used for downstream analyses is indicated by the star. The plotted numbers represent the region that was sequenced. For example, “19” represents the data for the V1–V9 region.
Figure 3. Percentage of unique sequences that could be classified.
Classifications were performed using taxonomy references curated from the RDP, SILVA, or greengenes databases for the four types of samples that were sequenced across the six regions from the 16S rRNA gene. Only the greengenes taxonomy reference provided species-level information.
Figure 4. Percentage of 1-nt variants that occurred up to ten times.
Sequences that were 1 nt different from the mock community reference sequences were counted to determine the number of times each variant appeared by region within the 16S rRNA gene.
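The counting described in this caption can be sketched as follows. This is an illustrative approximation only: the function names are hypothetical, "1 nt different" is assumed to mean exactly one substitution between sequences of equal length (single-base insertions and deletions are ignored), and the code is not the authors' implementation.

```python
from collections import Counter


def is_one_substitution(seq: str, ref: str) -> bool:
    """True if seq and ref have equal length and differ at exactly one position."""
    return len(seq) == len(ref) and sum(a != b for a, b in zip(seq, ref)) == 1


def count_one_nt_variants(reads, references):
    """Count how many times each distinct 1-nt variant appears among the reads."""
    counts = Counter()
    for read in reads:
        if any(is_one_substitution(read, ref) for ref in references):
            counts[read] += 1
    return counts


def percent_by_frequency(variant_counts, max_count=10):
    """Percentage of distinct variants seen exactly n times, for n = 1..max_count."""
    total = len(variant_counts)
    if total == 0:
        return {}
    frequency = Counter(variant_counts.values())
    return {n: 100.0 * frequency.get(n, 0) / total
            for n in range(1, max_count + 1)}
```

Passing the result of `count_one_nt_variants` to `percent_by_frequency` gives, for each region's reads, the share of variants seen once, twice, and so on up to ten times.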
