Microbiome. 2017 Jul 6;5(1):68.

doi: 10.1186/s40168-017-0279-1.

A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform

Eric J de Muinck¹, Pål Trosvik¹, Gregor D Gilfillan², Johannes R Hov³, Arvind Y M Sundaram⁴

Affiliations

¹ Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
² Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
³ Norwegian PSC Research Center and Research Institute of Internal Medicine, Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway.
⁴ Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway. arvind.sundaram@medisin.uio.no.

PMID: 28683838
PMCID: PMC5501495
DOI: 10.1186/s40168-017-0279-1

Eric J de Muinck et al. Microbiome. 2017.

Abstract

Background: Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized.

Methods: We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth-index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms.
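
To make the scaling argument concrete, the following is a minimal sketch of combinatorial index assignment, assuming one forward and one reverse index are added in PCR1 and a third index in PCR2. The index names, the set sizes (12, 8, and 4), and the function names are illustrative placeholders, not values or code from the study.

```python
from itertools import product


def assign_index_triplets(sample_names, fwd_indices, rev_indices, pcr2_indices):
    """Give every sample a unique (forward, reverse, PCR2) index triplet.

    Raises ValueError if there are more samples than index combinations.
    """
    capacity = len(fwd_indices) * len(rev_indices) * len(pcr2_indices)
    if len(sample_names) > capacity:
        raise ValueError(
            f"{len(sample_names)} samples exceed the {capacity} available triplets"
        )
    # product() enumerates every combination exactly once.
    triplets = product(fwd_indices, rev_indices, pcr2_indices)
    return dict(zip(sample_names, triplets))


# Hypothetical index sets (labels only, no real barcode sequences).
fwd = [f"F{i}" for i in range(1, 13)]   # 12 forward PCR1 indices
rev = [f"R{i}" for i in range(1, 9)]    # 8 reverse PCR1 indices
pcr2 = [f"P{i}" for i in range(1, 5)]   # 4 PCR2 indices

samples = [f"sample_{i}" for i in range(384)]
layout = assign_index_triplets(samples, fwd, rev, pcr2)

n_primers = len(fwd) + len(rev) + len(pcr2)
print(len(layout), "samples from", n_primers, "indexed primers")
# prints "384 samples from 24 indexed primers"
```

The number of distinguishable samples grows with the product of the index set sizes, while the number of indexed primers to synthesize grows only with their sum.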

Results: The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost.
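
A relationship between GC content and estimated relative abundance of the kind reported here can be examined with an ordinary least-squares fit. The sketch below is illustrative only: the GC percentages and abundances are made-up numbers, the helper names are hypothetical, and it is not the authors' analysis code.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up example: GC % of the sequenced fragment vs. mean relative abundance (%).
gc_values = [40.0, 50.0, 60.0]
abundances = [4.0, 3.0, 2.0]

slope, intercept = ols_slope_intercept(gc_values, abundances)
print(f"abundance changes by {slope:.2f} percentage points per 1% GC")
```

A negative slope means that species with higher GC content tend to receive lower abundance estimates.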

Conclusions: Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
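
The effect of variable-length heterogeneity spacers can be illustrated by counting how many distinct bases are read at each sequencing cycle. This is a simplified sketch under stated assumptions: the primer and spacer sequences are arbitrary placeholders rather than the oligos used in the study, and every read is assumed to begin at the spacer.

```python
def distinct_bases_per_cycle(reads, n_cycles):
    """Number of different bases observed at each of the first n_cycles positions."""
    if any(len(read) < n_cycles for read in reads):
        raise ValueError("every read must be at least n_cycles long")
    return [len({read[i] for read in reads}) for i in range(n_cycles)]


# Arbitrary placeholder sequences (not real 16S primers or spacers).
primer = "ACGTTGCAAGCT"
spacers = ["", "T", "GA", "CTG"]   # spacer lengths 0 to 3

# Without spacers every amplicon starts with the same primer sequence.
without_spacers = [primer for _ in spacers]
# With spacers the primer is shifted by 0 to 3 bases between amplicons.
with_spacers = [spacer + primer for spacer in spacers]

print(distinct_bases_per_cycle(without_spacers, 8))
# prints [1, 1, 1, 1, 1, 1, 1, 1]
print(distinct_bases_per_cycle(with_spacers, 8))
# prints [4, 3, 3, 4, 3, 2, 3, 4]
```

With identical read starts only one base is seen per cycle, which is the low-complexity situation; staggering the reads raises the per-cycle base diversity.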

Keywords: 16S rRNA gene amplicon sequencing; Benchmarking; Chimera formation; Environmental sequencing; Illumina library preparation; Indexed PCR; Mock community; PCR bias.

Figures

**Fig. 1**
Triple indexing design. The triple indexing strategy incorporates two PCR steps. During the first PCR (*PCR1*), the template sequence of interest is targeted and amplified (*green*). The primers for this reaction also contain an indexing sequence and a heterogeneity spacer sequence (*red*), and a partial Illumina adapter (*blue*). A second PCR (PCR2) allows for the introduction of a third indexing sequence (*dark blue*) as well as completion of the Illumina sequencing adapter

**Fig. 2**
Relative abundances of the 33 bacterial species in the mock community sample estimated from both the MiSeq (dataset 1, n = 96) and HiSeq (dataset 4, n = 24) data (Additional file 4: Table S3, Additional file 7: Table S6). Species abundance estimates are shown side-by-side with MiSeq estimates labeled ‘MS’ and HiSeq estimates labeled ‘HS’. For enhanced visualization, each pair of *colored bars* (*blue* or *white*) depicts the estimated relative abundances for one species. The *dotted red line* shows the relative abundance expectation given perfectly equal blending. Each *box* represents the interquartile range while the *whiskers* represent 1.5 times the interquartile range. *Points* outside the whiskers represent outliers

**Fig. 3**
Scores plot based on a principal component analysis model computed from the matrix of species relative abundances from dataset 1 (Additional file 4: Table S3). a. Samples are colored according to the reverse primer used for PCR1. b. Samples are colored according to the forward primer used for PCR1. In both a and b, the first two dimensions, explaining 62% of the total variance, are shown

**Fig. 4**
Relationship between mean relative abundance estimates and GC percentage for datasets 1 and 4. There is a significant negative linear relationship for both the MiSeq (p = 0.002, n = 96, Additional file 4: Table S3) and HiSeq (p = 0.012, n = 24, Additional file 7: Table S6) data. Estimates drop by 0.18 and 0.16% for each 1% increase in GC content for the MiSeq and HiSeq estimates, respectively

**Fig. 5**
Scores plot based on a principal component analysis model computed from the matrix of species relative abundances from dataset 3 (Additional file 6: Table S5). Samples are colored according to PCR1 and PCR2 cycle regime, with the number of cycles indicated in the legend (PCR1 + PCR2). *Filled dots* and *triangles* represent samples prepared with tenfold difference in input DNA template concentration used for PCR1. The first two dimensions, explaining 65% of the total variance, are shown

**Fig. 6**
Statistical significance and direction of relationships between estimated relative abundances of sequence reads and PCR cycle number (Additional file 18: Figure S9) in dataset 3. The *dots* represent p values from linear regression models, with *green* and *red* representing positive and negative relationships, respectively. The species are ordered according to the GC content on the sequenced fragment (*vertical lines*). The *dotted blue line* signifies the significance threshold of p = 0.05 (*left axis*), while the *dotted black line* represents the mean GC percentage (*right axis*)

**Fig. 7**
Relationship between PCR cycle number and chimeric sequence formation in dataset 3. The combined numbers of PCR1 and PCR2 amplification cycles are indicated on the x-axis. *Black* and *red dots* indicate samples amplified using 5 and 10 cycles for PCR2, respectively. A highly significant linear relationship (p << 0.001, linear regression model) was observed. The effects were primarily related to the PCR1 cycle number, e.g., samples undergoing 35 cycles (25 cycles PCR1 and 10 cycles PCR2) had fewer chimeras than samples undergoing 35 cycles (30 cycles PCR1 and 5 cycles PCR2)

**Fig. 8**
a. Pairwise Bray-Curtis distances for the mock community (MC, dataset 1, Additional file 4: Table S3), standardized sample (SS, dataset 5, Additional file 8: Table S7), and healthy adult (HA, dataset 6, Additional file 9: Table S8) group. Each *box* represents the interquartile range while the *whiskers* represent 1.5 times the interquartile range. *Points* outside the whiskers represent outliers. The number of pairwise distances for each group is indicated over the boxes. b. Multidimensional scaling (MDS) plot showing clustering of 25 samples taken from 5 healthy adult volunteers (dataset 6, Additional file 9: Table S8). Sample origin is indicated by *color* (individual 1–5). The stress value of the MDS model was 13.2%, indicating a good fit. c. Pairwise Bray-Curtis distances for the 15 samples sequenced using 2 different library preparation methods (dataset 7, Additional file 10: Table S9). The leftmost box shows distances between identical samples (P = paired), while the box on the right shows the distances for non-identical samples (UP = unpaired). d. Multidimensional scaling plot showing clustering of the 15 samples sequenced using 2 different library preparation methods (dataset 7, Additional file 10: Table S9). Paired samples, i.e., identical samples sequenced using different techniques, are joined by *black lines*
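
For readers unfamiliar with the Bray-Curtis distance used in Fig. 8, the following is a minimal sketch of how pairwise distances between abundance profiles can be computed. The three profiles are made-up numbers and the function names are hypothetical; this is not the authors' pipeline.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of taxa")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def pairwise_bray_curtis(profiles):
    """Distance for every unordered pair of samples: n * (n - 1) / 2 values."""
    return {
        (m, n): bray_curtis(profiles[m], profiles[n])
        for m, n in combinations(sorted(profiles), 2)
    }


# Made-up relative abundances for three taxa in three samples.
profiles = {
    "replicate_1": [0.5, 0.3, 0.2],
    "replicate_2": [0.4, 0.4, 0.2],
    "other_sample": [0.1, 0.1, 0.8],
}

distances = pairwise_bray_curtis(profiles)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A distance of 0 means identical profiles and 1 means no shared taxa, so replicates of one sample are expected to lie closer to each other than to unrelated samples.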

Cited by

Engineering CRISPR/Cas9 to mitigate abundant host contamination for 16S rRNA gene-based amplicon sequencing.
Song L, Xie K. Microbiome. 2020 Jun 3;8(1):80. doi: 10.1186/s40168-020-00859-0. PMID: 32493511. Free PMC article.
Coumarin biosynthesis genes are required after foliar pathogen infection for the creation of a microbial soil-borne legacy that primes plants for SA-dependent defenses.
Vismans G, van Bentum S, Spooren J, Song Y, Goossens P, Valls J, Snoek BL, Thiombiano B, Schilder M, Dong L, Bouwmeester HJ, Pétriacq P, Pieterse CMJ, Bakker PAHM, Berendsen RL. Sci Rep. 2022 Dec 28;12(1):22473. doi: 10.1038/s41598-022-26551-x. PMID: 36577764. Free PMC article.
A collection of rumen bacteriome data from 334 mid-lactation dairy cows.
Sun HZ, Xue M, Guan LL, Liu J. Sci Data. 2019 Jan 22;6:180301. doi: 10.1038/sdata.2018.301. PMID: 30667380. Free PMC article.
Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline.
Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, Kleindienst S. Front Microbiol. 2020 Oct 23;11:550420. doi: 10.3389/fmicb.2020.550420. eCollection 2020. PMID: 33193131. Free PMC article.
Effect of probiotics on diversity and function of gut microbiota in Moschus berezovskii.
Yang C, Huang W, Sun Y, You L, Jin H, Sun Z. Arch Microbiol. 2021 Aug;203(6):3305-3315. doi: 10.1007/s00203-021-02315-5. Epub 2021 Apr 16. PMID: 33860850.
