mSphere. 2021 Feb 24;6(1):e01202-20.

doi: 10.1128/mSphere.01202-20.

Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing

Isabel Abellan-Schneyder¹, Monica S Matchado², Sandra Reitmeier¹, Alina Sommer¹, Zeno Sewald¹, Jan Baumbach²,³,⁴, Markus List², Klaus Neuhaus⁵

Affiliations

¹ Core Facility Microbiome, ZIEL-Institute for Food & Health, Technische Universität München, Freising, Germany.
² Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technische Universität München, Freising, Germany.
³ Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
⁴ Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany.
⁵ Core Facility Microbiome, ZIEL-Institute for Food & Health, Technische Universität München, Freising, Germany. neuhaus@tum.de

PMID: 33627512
PMCID: PMC8544895
DOI: 10.1128/mSphere.01202-20

Isabel Abellan-Schneyder et al. mSphere. 2021.

Abstract

Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., Enterorhabdus versus Adlercreutzia) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., Bacteroidetes is missed using primers 515F-944R) or due to the database used (e.g., Acetatifactor in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. 
Finally, specific mock communities of sufficient and adequate complexity are highly recommended.

IMPORTANCE: In 16S rRNA gene sequencing, certain bacterial genera were found to be underrepresented or even missing in taxonomic profiles when using unsuitable primer combinations, outdated reference databases, or inadequate pipeline settings. Concerning the last, quality thresholds as well as bioinformatic settings (i.e., clustering approach, analysis pipeline, and specific adjustments such as truncation) are responsible for a number of observed differences between studies. Conclusions drawn by comparing one data set to another (e.g., between publications) appear to be problematic and require independent cross-validation using matching V-regions and uniform data processing. Therefore, we highlight the importance of a thought-out study design including sufficiently complex mock standards and appropriate V-region choice for the sample of interest. The use of processing pipelines and parameters must be tested beforehand.
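The nomenclature issue raised in the abstract (e.g., Enterorhabdus versus Adlercreutzia) can be illustrated with a minimal Python sketch. It is not the authors' method: the synonym table, the function names, and the genus lists are all hypothetical, and a real table would need to be curated for the databases actually being compared.

```python
# Hypothetical synonym table mapping database-specific genus labels to one
# canonical label. The single entry mirrors the abstract's example; it is an
# illustrative assumption, not a curated resource.
GENUS_SYNONYMS = {"Enterorhabdus": "Adlercreutzia"}


def harmonize(genera, synonyms=GENUS_SYNONYMS):
    """Return the set of genus names after replacing known synonyms."""
    return {synonyms.get(name, name) for name in genera}


def compare_profiles(genera_a, genera_b):
    """Compare two genus lists after harmonization.

    Returns (shared, only_in_a, only_in_b) as sorted lists.
    """
    a, b = harmonize(genera_a), harmonize(genera_b)
    return sorted(a & b), sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Made-up genus lists standing in for assignments from two databases.
    profile_db1 = ["Bacteroides", "Enterorhabdus"]
    profile_db2 = ["Bacteroides", "Adlercreutzia"]
    shared, only_1, only_2 = compare_profiles(profile_db1, profile_db2)
    print("shared:", shared)
    print("only in first:", only_1)
    print("only in second:", only_2)
```

Without the harmonization step, the two made-up profiles would appear to differ by one genus each, although they describe the same taxon under two names.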

Keywords: 16S rRNA gene sequencing; amplicon sequencing; bioinformatic settings; clustering; databases; microbiome; mock communities; variable regions.

Figures

**FIG 1**
Overview of the analysis strategies used in this study. DNAs from different sample types with increasing complexity (i.e., 3 mock communities and 33 human stool samples) were extracted. Amplicons were generated using different primer pairs targeting different V-regions and sequenced on an Illumina MiSeq. Afterwards, the impacts of different clustering approaches and reference databases on the microbial profiles were investigated.

**FIG 2**
NMDS plots for the microbiome composition of human samples. Sample similarity is shown at phylum level (A and B) and at genus level (C and D). Different primer pairs are indicated to the right for all panels. Top panels (A and C) include processing the V4-V5 region, while for the bottom panels (B and D) this region has been omitted since results using 515F-944R primers (blue squares in panels A and C) fall separately from all other clusters. Labeling of the samples in the bottom panels (B and D) is based on donor number.

**FIG 3**
Presence-and-absence map of human samples on phylum level for different V-regions. Gray represents present taxa, and white represents absent taxa. Primers and their V-region spanning are given in Table 1.
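A presence-and-absence map of this kind can be derived from per-primer read counts. The following Python sketch shows one possible way; the function name, the `min_reads` threshold, the label `other_pair`, and all counts are assumptions made for illustration and do not come from the study.

```python
def presence_absence(counts, min_reads=1):
    """Build a presence/absence table from read counts.

    counts: {primer_pair: {taxon: read_count}}
    Returns (taxa, table), where taxa is a sorted list of all taxa seen and
    table maps each primer pair to a list of booleans aligned with taxa.
    min_reads is an assumed detection threshold.
    """
    taxa = sorted({taxon for profile in counts.values() for taxon in profile})
    table = {
        primer: [profile.get(taxon, 0) >= min_reads for taxon in taxa]
        for primer, profile in counts.items()
    }
    return taxa, table


if __name__ == "__main__":
    # Made-up counts; "other_pair" is a placeholder label, not a real primer pair.
    counts = {
        "515F-944R": {"Firmicutes": 10, "Bacteroidetes": 0},
        "other_pair": {"Firmicutes": 5, "Bacteroidetes": 7},
    }
    taxa, table = presence_absence(counts)
    print("primer pair".ljust(12), *taxa)
    for primer, row in table.items():
        # "#" marks a present taxon (gray in the figure), "." an absent one.
        print(primer.ljust(12), *("#" if present else "." for present in row))
```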

**FIG 4**
Comparison of the influence of the clustering method on taxonomic designation for the ZIEL-I mock community (A) and an example of a representative human sample T1 (B). The genus-level composition is shown according to ASVs, zOTUs, and OTUs as indicated. “Other” represents taxa not matching the composition of the mock community, while “unassigned” represents reads that could not be assigned to any taxonomic classification (RDP was used as a reference database). Primers and their V-region spanning are given in Table 1.

**FIG 5**
Comparison of mock communities sequenced over different V-regions, processed using different databases as references (GG, GreenGenes; RDP, Ribosomal Database Project; GRD, the genomic-based 16S rRNA database; LTP, The All-Species Living Tree Project) at genus level. Primers and their V-region spanning are given in Table 1.

**FIG 6**
(A and B) The effects of different lengths of forward and reverse reads after truncation on the percentage of sequences retained after denoising (A) and number of features obtained (B) for the ZIEL-I mock community. The numbers of mismatches obtained after local BLAST search against reference sets are shown; these were used in order to test the accuracy of the ASV predictions (C). (D and E) Analysis of human data set on retained reads after denoising and truncation (D) and number of features obtained (E) for each read-length combination.
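Testing several truncated-length combinations, as in this figure, usually starts by listing the forward/reverse pairs that still leave enough overlap for the paired reads to be merged. The Python sketch below enumerates such pairs. The amplicon length, the candidate lengths, and the `min_overlap` threshold are assumed values chosen for illustration, and the function is hypothetical rather than part of any pipeline used in the study.

```python
from itertools import product


def truncation_grid(forward_lengths, reverse_lengths, amplicon_length, min_overlap=12):
    """List truncation-length pairs that keep enough read overlap for merging.

    Overlap is estimated as forward + reverse - amplicon_length. min_overlap is
    an assumed threshold; set it to whatever the merging tool requires.
    Returns a list of (forward, reverse, overlap) tuples.
    """
    combos = []
    for fwd, rev in product(forward_lengths, reverse_lengths):
        overlap = fwd + rev - amplicon_length
        if overlap >= min_overlap:
            combos.append((fwd, rev, overlap))
    return combos


if __name__ == "__main__":
    # Assumed example: a 450-bp amplicon and four candidate length pairs.
    for fwd, rev, overlap in truncation_grid([240, 260], [200, 220], 450):
        print(f"forward={fwd} reverse={rev} overlap={overlap}")
```

Each surviving pair would then be run through denoising and compared on retained reads and number of features, as in panels A, B, D, and E.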

**FIG 7**
Recommended validation strategy before starting new microbiome studies, especially for uncommon environments. Even existing commonly used parameter combinations might be reevaluated. Thus, complex mock communities should be used and sequenced, testing a variety of different primer pairs for best performance within the environment of interest. Despite their being of minor influence, we still recommend using clustering approaches that include denoising steps (e.g., DADA2 generating ASVs) and recommend the seemingly well-curated and up-to-date databases RDP and Silva as references.
