Comparative Study
DNA Res. 2019 Oct 1;26(5):391-398.
doi: 10.1093/dnares/dsz017.

Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes

Mitsuhiko P Sato et al. DNA Res.

Abstract

In bacterial genome and metagenome sequencing, Illumina sequencers are the most frequently used platforms owing to their high throughput, and multiple library preparation kits have been developed for them. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that the Nextera XT kit introduces a strong sequencing bias in low-GC regions. The strength of this bias depends on GC content: the lower the GC content, the stronger the bias. None of the other kits analysed introduced this strong bias. The GC content-associated bias introduced by Nextera XT was more pronounced in metagenome sequencing of a mock bacterial community, where it seriously affected estimation of the relative abundance of low-GC species. These results highlight the importance of selecting library preparation kits appropriate to the purposes and targets of sequencing, particularly in metagenome sequencing, where microbial species spanning a wide range of GC content are present. Our data also indicate that particular attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.

Keywords: Illumina sequencing; bacterial genome sequencing; library preparation kits; metagenome sequencing; sequencing bias.


Figures

Figure 1
Quality comparison of E. coli and S. aureus genome assemblies obtained with six library preparation kits. (A) Assembly statistics obtained with the six kits were compared for E. coli and S. aureus. Two E. coli and two S. aureus genomes were analysed as model bacterial genomes. Illumina reads obtained from each library were assembled using Velvet and SPAdes, and the number of contigs and the L50 value of each assembly are shown. For each sequence data set, assembly was repeated 10 times using Illumina reads randomly selected to 30× coverage. Error bars indicate standard deviations. The six kits cover three fragmentation strategies (see the main text). XT, Nextera XT; FL, Nextera DNA Flex; KP, KAPA HyperPlus; NN, NEBNext Ultra II; QS, QIAseq FX; and TS, TruSeq nano. (B) Relative sequence coverage in relation to GC content in the E. coli and S. aureus genome assemblies obtained with three of the kits (XT, FL, and KP). Relative sequence coverage and GC content were calculated for every non-overlapping 200-bp window. Only the first 120,000 bp of each genome are shown. (C) Relationships between GC content and sequence coverage in the E. coli and S. aureus genome assemblies obtained with the six kits. The relative abundance of 200-bp bins with a given GC content (in 0.5% intervals) is shown by black lines, and the mean relative coverage of bins with a given GC content (CGC) is shown by lines coloured according to the library preparation kit. Black horizontal lines (CGC = 1) represent unbiased coverage. Data for bins with extreme GC content (those representing <0.5% of all 200-bp bins) are not shown. Color figures are available at DNARES online.
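The windowed coverage analysis in panels (B) and (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a per-base depth array is already available for the assembly (the `depth` input is hypothetical), and it assumes "relative coverage" means a window's mean depth divided by the genome-wide mean depth. The function names `window_gc_and_coverage`, `cgc_curve` and `subsample_fraction` are illustrative only; the last one shows the arithmetic for randomly selecting reads to an expected 30× coverage.

```python
from collections import defaultdict


def window_gc_and_coverage(seq, depth, window=200):
    """Return (GC %, relative coverage) for each non-overlapping window.

    seq: genome assembly as an uppercase A/C/G/T string.
    depth: per-base read depth, same length as seq (hypothetical input,
        e.g. parsed from a pileup; not a format defined by the paper).
    Relative coverage is ASSUMED to be window mean depth / genome mean depth.
    """
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    genome_mean = sum(depth) / len(depth)
    pairs = []
    # A trailing partial window is skipped so every window is exactly `window` bp.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = 100.0 * (chunk.count("G") + chunk.count("C")) / window
        window_mean = sum(depth[start:start + window]) / window
        pairs.append((gc, window_mean / genome_mean))
    return pairs


def cgc_curve(pairs, interval=0.5, min_fraction=0.005):
    """Mean relative coverage per GC interval (CGC); 1.0 means unbiased.

    GC intervals holding fewer than `min_fraction` of all windows are dropped,
    mirroring the exclusion of bins representing <0.5% of all 200-bp bins.
    """
    groups = defaultdict(list)
    for gc, rel_cov in pairs:
        groups[int(gc // interval)].append(rel_cov)
    total = len(pairs)
    curve = {}
    for key in sorted(groups):
        covs = groups[key]
        if len(covs) / total >= min_fraction:
            curve[key * interval] = sum(covs) / len(covs)
    return curve


def subsample_fraction(total_read_bases, genome_size, target_coverage=30.0):
    """Fraction of reads to keep at random for an expected `target_coverage`."""
    return min(1.0, target_coverage * genome_size / total_read_bases)
```

For example, a 400-bp toy genome made of one all-GC window read at depth 20 and one all-AT window read at depth 10 gives CGC values of about 1.33 (100% GC) and 0.67 (0% GC), i.e. the kind of low-GC under-coverage the figure plots against the CGC = 1 line.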
Figure 2
Overall GC content-associated sequencing bias observed in 22 strains of non-S. aureus species in the genus Staphylococcus. Sequence reads were obtained from the 22 strains using the XT and KP kits. The overall sequencing bias associated with GC content was quantified in each genome assembly (see Materials and methods in the main text) and is plotted against the mean GC content of each genome. Solid lines are regression lines, and grey shading indicates the 95% confidence intervals.
Figure 3
Overall GC content-associated sequencing bias in the sequence data of 191 species obtained with the XT library preparation kit. Illumina sequencing data for 191 species (one strain per species), produced with the XT kit in a project of the NBRP of Japan, were downloaded from the public database DDBJ. The overall GC content-associated sequencing bias was quantified in each data set and is plotted against the mean GC content of each genome.
Figure 4
Metagenome sequencing of a mock bacterial community using six library preparation kits and the sequencing bias introduced by each kit. (A) Libraries of a mock bacterial community were prepared with each of the six kits and sequenced, and the relative genome abundance estimated from each data set is shown. The mock community was composed of nine species with various levels of GC content. The relative abundance of each species was normalized by its genome size and by its copy number in the sample, as determined by ddPCR. (B) Relationships between GC content and sequence coverage for each genome in the mock community. The mean relative coverage of 200-bp bins with a given GC content (CGC) was calculated for each genome in each data set and is plotted against GC content as coloured lines; line colours correspond to the species shown in panel (A). Black horizontal lines in each plot (CGC = 1) represent unbiased coverage. Relative coverage was normalized by the copy numbers in the sample determined by ddPCR. Data for bins with extreme GC content (those representing <0.5% of all 200-bp bins) are not shown. Color figures are available at DNARES online.
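The normalization described in panel (A) can be illustrated with a short sketch. It encodes one plausible reading of the caption, not the authors' exact procedure: sequence coverage per species is taken as mapped read bases divided by genome size, the observed share of each species is compared with its expected share from the ddPCR copy numbers, and a ratio of 1.0 is treated as unbiased. The function `normalized_abundance`, its dictionary inputs, and the species names in the usage lines are all hypothetical.

```python
def normalized_abundance(mapped_bases, genome_size, ddpcr_copies):
    """Observed-to-expected abundance ratio per species (1.0 = unbiased).

    mapped_bases: dict species -> total bases of reads mapped to that genome
    genome_size: dict species -> genome length in bp
    ddpcr_copies: dict species -> genome copies measured by ddPCR (any shared unit)
    ASSUMPTION: this observed/expected ratio is one reading of the caption's
    "normalized by genome size and copy number", not the paper's stated formula.
    """
    # Coverage corrects for larger genomes attracting more reads per copy.
    coverage = {sp: mapped_bases[sp] / genome_size[sp] for sp in mapped_bases}
    total_cov = sum(coverage.values())
    total_copies = sum(ddpcr_copies[sp] for sp in coverage)
    result = {}
    for sp, cov in coverage.items():
        observed = cov / total_cov
        expected = ddpcr_copies[sp] / total_copies
        result[sp] = observed / expected
    return result


# Hypothetical two-species community with equal ddPCR copy numbers.
ratios = normalized_abundance(
    mapped_bases={"high_gc_sp": 4e8, "low_gc_sp": 4e8},
    genome_size={"high_gc_sp": 2e6, "low_gc_sp": 4e6},
    ddpcr_copies={"high_gc_sp": 1000, "low_gc_sp": 1000},
)
print(ratios)
```

In this toy case both species receive the same number of read bases, but the larger genome ends up with half the coverage, so it is under-represented (ratio about 0.67) relative to its equal ddPCR copy number; a GC-dependent kit bias would show up in the same way as ratios drifting away from 1.0.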
