Comparative Study

DNA Res. 2019 Oct 1;26(5):391-398.

doi: 10.1093/dnares/dsz017.

Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes

Mitsuhiko P Sato¹, Yoshitoshi Ogura¹, Keiji Nakamura¹, Ruriko Nishida¹ ², Yasuhiro Gotoh¹, Masahiro Hayashi³ ⁴, Junzo Hisatsune⁵ ⁶ ⁷, Motoyuki Sugai⁵ ⁶ ⁷, Itoh Takehiko⁸, Tetsuya Hayashi¹

Affiliations

¹ Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan.
² Department of Medicine and Biosystemic Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan.
³ Division of Anaerobe Research, Life Science Research Center, Gifu University, Gifu, Gifu, Japan.
⁴ Center for Conservation of Microbial Genetic Resource, Gifu University, Gifu, Gifu, Japan.
⁵ Project Research Center for Nosocomial Infectious Diseases, Hiroshima University, Hiroshima, Hiroshima, Japan.
⁶ Department of Bacteriology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Hiroshima, Japan.
⁷ Antimicrobial Resistance Research Center, National Institute of Infectious Diseases, Tokyo, Japan.
⁸ Department of Biological Information, Tokyo Institute of Technology, Tokyo, Japan.

PMID: 31364694
PMCID: PMC6796507
DOI: 10.1093/dnares/dsz017

Mitsuhiko P Sato et al. DNA Res. 2019.


Abstract

In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias introduced is dependent on the level of GC content; stronger bias is generated as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more remarkable in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.

Keywords: Illumina sequencing; bacterial genome sequencing; library preparation kits; metagenome sequencing; sequencing bias.


Figures

**Figure 1**
Quality comparison of *E. coli* and *S. aureus* genome assemblies obtained by library preparation kits. (A) Assembly statistics obtained by six library preparation kits were compared in *E. coli* and *S. aureus*. Two *E. coli* and two *S. aureus* genomes were analysed as model bacterial genomes to compare six library preparation kits. Illumina read sequences obtained from each library were assembled using Velvet and SPAdes, and the numbers of contigs and L50 values of each assembly are shown. In each sequence data set, assembly was repeated 10 times using Illumina reads randomly selected at 30× coverage. Error bars indicate standard deviations. The six kits used cover three fragmentation strategies (see the main text). XT, Nextera XT; FL, Nextera DNA Flex; KP, KAPA HyperPlus; NN, NEBNext Ultra II; QS, QIAseq FX; and TS, TruSeq nano. (B) Relative sequence coverage in relation to GC content was calculated in *E. coli* and *S. aureus* genomes obtained by three library preparation kits. Relative sequence coverage in the genome assemblies obtained by the XT, FL, and KP kits and GC content were calculated for every 200-bp window with no overlap. Only the first 120,000 bp regions of each genome are shown. (C) Relationships between GC content and sequence coverage in the *E. coli* and *S. aureus* genome assemblies obtained by six library preparation kits are shown. The relative abundance of 200 bp bins with a given GC content (defined by 0.5% interval) and the mean relative coverage of bins with a given GC content ( $C_{GC}$ ) were calculated and are shown along with GC content by black lines or lines coloured according to the library preparation kits, respectively. Black horizontal lines ( $C_{GC}$ =1) represent unbiased coverage. The data for bins with extreme GC content (those representing <0.5% of all 200 bp bins) are not shown. Color figures are available at *DNARES* online.
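
The windowed calculation described in panels (B) and (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `window_stats` and `mean_coverage_by_gc`, the per-base depth input, and the choice to define relative coverage as window depth divided by the genome-wide mean depth are all hypothetical. The 200-bp non-overlapping windows, the 0.5% GC interval, and the exclusion of GC bins holding fewer than 0.5% of all windows follow the caption.

```python
from collections import defaultdict


def window_stats(seq, depth, window=200):
    """Return a list of (gc_percent, relative_coverage), one per window.

    seq   : genome (or contig) sequence as a string
    depth : per-base read depth, same length as seq
    Windows do not overlap; a trailing partial window is ignored.
    Assumption: relative coverage = window mean depth / overall mean depth.
    """
    if len(seq) != len(depth) or not seq:
        raise ValueError("seq and depth must be non-empty and equal in length")
    mean_depth = sum(depth) / len(depth)
    if mean_depth == 0:
        raise ValueError("mean depth is zero; nothing to normalize against")
    stats = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc = 100.0 * (chunk.count("G") + chunk.count("C")) / window
        cov = sum(depth[start:start + window]) / window
        stats.append((gc, cov / mean_depth))
    return stats


def mean_coverage_by_gc(stats, step=0.5, min_fraction=0.005):
    """Return {gc_bin_start: C_GC}, the mean relative coverage per GC bin.

    GC bins are `step` percent wide. Bins that contain fewer than
    `min_fraction` of all windows are dropped as extreme-GC bins.
    """
    bins = defaultdict(list)
    for gc, rel_cov in stats:
        bins[int(gc // step)].append(rel_cov)
    total = len(stats)
    return {
        idx * step: sum(vals) / len(vals)
        for idx, vals in sorted(bins.items())
        if len(vals) / total >= min_fraction
    }
```

With this definition, a value of 1 for a GC bin corresponds to the unbiased-coverage line ( $C_{GC}$ =1) in panel (C), and values below 1 indicate under-representation of windows with that GC content.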

**Figure 2**
Overall GC content-associated sequencing bias observed in 22 strains of non-*S. aureus* species in the genus *Staphylococcus*. Sequence reads were obtained from 22 strains of non-*S. aureus* species in the genus *Staphylococcus* using the XT and KP kits. The overall sequencing bias associated with GC content observed in the genome assemblies was quantified (see Materials and methods in the main text), and the relationships between the quantified overall sequencing bias and the mean GC content of each genome are shown. Solid lines indicate regression lines, and the 95% confidence intervals are indicated in grey.

**Figure 3**
Overall GC content-associated sequencing bias in the sequence data of 191 species obtained by the XT library preparation kit. Illumina sequencing data for 191 species (one strain from each species) produced using the XT kit from a project of NBRP of Japan were downloaded from the public database (DDBJ). The overall GC content-associated sequencing bias in each data set was quantified, and relationships between the quantified overall sequencing bias and the mean GC content of each genome are shown.

**Figure 4**
Metagenome sequencing of a mock bacterial community using six library preparation kits and the sequencing bias introduced by each kit. (A) Libraries of a mock bacterial community prepared by six library preparation kits were sequenced, and the relative genome abundance estimated in each data set obtained by six library preparation kits is shown. The mock community was composed of nine species with various levels of GC content. The relative abundances of each species were normalized by their genome sizes and the copy numbers of each species in the sample, which were determined by ddPCR. (B) Relationships between the GC content and sequence coverage in each genome in the mock community are shown. The mean relative coverage of each 200-bp bin with a given GC content ( $C_{GC}$ ) in each genome was calculated in each data set and is shown according to GC content by coloured lines. The colours of the lines correspond to the species shown in panel (A). Black horizontal lines in each plot ( $C_{GC}$ =1) represent unbiased coverage. The relative coverage was normalized by the copy numbers in the sample determined by ddPCR. Data for bins with extreme GC content (those representing <0.5% of all 200 bp bins) are not shown. Color figures are available at *DNARES* online.
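
The normalization described for panel (A) can be illustrated with a short sketch. The caption states only that abundances were normalized by genome size and by ddPCR-determined copy number; the exact formula is not given here, so the function below is an assumption. The name `normalized_abundance`, its dictionary inputs, and the final rescaling so that the mean across species equals 1 are hypothetical choices made for illustration.

```python
def normalized_abundance(mapped_bases, genome_size, copies):
    """Return {species: observed/expected abundance ratio}, rescaled to mean 1.

    mapped_bases : total bases of reads assigned to each species
    genome_size  : genome length in bp for each species
    copies       : genome copy number of each species in the sample
                   (e.g. as measured by ddPCR)
    Assumption: in an unbiased library every species would get the same
    depth per genome copy, so every returned value would equal 1.
    """
    ratio = {
        s: mapped_bases[s] / genome_size[s] / copies[s]
        for s in mapped_bases
    }
    mean_ratio = sum(ratio.values()) / len(ratio)
    return {s: r / mean_ratio for s, r in ratio.items()}
```

Under these assumptions, a low-GC species that is under-sequenced relative to its true copy number receives a value below 1, which is the kind of distortion the figure reports for the XT kit.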


Cited by

High-throughput DNA extraction and cost-effective miniaturized metagenome and amplicon library preparation of soil samples for DNA sequencing.
Jensen TBN, Dall SM, Knutsson S, Karst SM, Albertsen M. PLoS One. 2024 Apr 4;19(4):e0301446. doi: 10.1371/journal.pone.0301446. eCollection 2024. PMID: 38573983 Free PMC article.
Emergence of carbapenem resistance in persistent Shewanella algae bacteremia: the role of pdsS G547W mutation in adaptive subpopulation dynamics.
Huang YT, Liu PY. Ann Clin Microbiol Antimicrob. 2024 Nov 20;23(1):102. doi: 10.1186/s12941-024-00759-3. PMID: 39568026 Free PMC article.
Improving rigor and reproducibility in chromatin immunoprecipitation assay data analysis workflows with Rocketchip.
Haghani V, Goyal A, Zhang A, Sharifi O, Mariano N, Yasui D, Korf I, LaSalle J. bioRxiv [Preprint]. 2024 Jul 16:2024.07.10.602975. doi: 10.1101/2024.07.10.602975. PMID: 39071274 Free PMC article. Preprint.
Insights into water insecurity in Indigenous communities in Canada: assessing microbial risks and innovative solutions, a multifaceted review.
Zambrano-Alvarado JI, Uyaguari-Diaz MI. PeerJ. 2024 Oct 18;12:e18277. doi: 10.7717/peerj.18277. eCollection 2024. PMID: 39434791 Free PMC article. Review.
GC Content-Associated Sequencing Bias Caused by Library Preparation Method May Infrequently Affect Salmonella Serotype Prediction Using SeqSero2.
Li S, Zhang S, Deng X. Appl Environ Microbiol. 2020 Sep 1;86(18):e00614-20. doi: 10.1128/AEM.00614-20. Print 2020 Sep 1. PMID: 32680856 Free PMC article. No abstract available.

