Sci Rep. 2020 Sep 4;10(1):14649.

doi: 10.1038/s41598-020-71207-3.

Impact of DNA extraction on whole genome sequencing analysis for characterization and relatedness of Shiga toxin-producing Escherichia coli isolates

Stéphanie Nouws^#¹,², Bert Bogaerts^#¹,², Bavo Verhaegen³, Sarah Denayer³, Denis Piérard⁴, Kathleen Marchal²,⁵,⁶, Nancy H C Roosens¹, Kevin Vanneste^#¹, Sigrid C J De Keersmaecker^#⁷

Affiliations

¹ Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium.
² Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium.
³ National Reference Laboratory for Shiga Toxin-Producing Escherichia coli (NRL-STEC), Foodborne Pathogens, Sciensano, Brussels, Belgium.
⁴ Department of Microbiology and Infection Control, National Reference Center for Shiga Toxin-Producing Escherichia coli (NRC-STEC), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium.
⁵ Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
⁶ Department of Genetics, University of Pretoria, Pretoria, South Africa.
⁷ Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium. sigrid.dekeersmaecker@sciensano.be.

^# Contributed equally.

PMID: 32887913
PMCID: PMC7474065
DOI: 10.1038/s41598-020-71207-3

Stéphanie Nouws et al. Sci Rep. 2020.

Abstract

Whole genome sequencing (WGS) has proven to be the ultimate tool for bacterial isolate characterization and relatedness determination. However, standardized and harmonized workflows, e.g. for DNA extraction, are required to ensure robust and exchangeable WGS data. Data sharing between (inter)national laboratories is essential to support foodborne pathogen control, including outbreak investigation. This study evaluated eight commercial DNA preparation kits for their potential influence on: (i) DNA quality for Nextera XT library preparation; (ii) MiSeq sequencing (data quality, read mapping against plasmid and chromosome references); and (iii) WGS data analysis, i.e. isolate characterization (serotyping, virulence and antimicrobial resistance genotyping) and phylogenetic relatedness (core genome multilocus sequence typing and single nucleotide polymorphism analysis). Shiga toxin-producing Escherichia coli (STEC) was selected as a case study. Overall, data quality and inferred phylogenetic relationships between isolates were not affected by the DNA extraction kit choice, irrespective of the presence of confounding factors such as EDTA in DNA solution buffers. Nevertheless, completeness of STEC characterization was, although not substantially, influenced by the plasmid extraction performance of the kits, especially when using Nextera XT library preparation. This study contributes to addressing the WGS challenges of standardizing protocols to support data portability and to enable full exploitation of its potential.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Overview of median mapping depths against the Sakai *E. coli* O157:H7 reference genome and the Sakai *E. coli* pO157 plasmid for sequencing run replicates TIAC1151 and TIAC1165 per kit, sequenced in run 2. The median read mapping depth for each sample was calculated using a sliding window of 10,000 bases shifted by 5,000 bases for each data point. Abbreviations: DNeasy Blood & Tissue kit (DNeasy B & T), DNeasy UltraClean Microbial kit (DNeasy UltraClean), Easy-DNA gDNA Purification kit (Easy-DNA), GenElute Bacterial gDNA kit (GenElute), Genomic-tip 20/G kit (gTip 20), MasterPure Complete DNA Purification kit (MasterPure), NucliSENS miniMag (NucliSens), Wizard gDNA Purification kit (Wizard).
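
The windowing described in this caption (10,000-base windows shifted by 5,000 bases) can be illustrated with a minimal sketch. It is not the authors' code: the function name, the plain list of per-base depths as input, and the handling of a shorter trailing window are all assumptions made for illustration.

```python
from statistics import median


def windowed_median_depth(depths, window=10_000, step=5_000):
    """Return (window_start, median_depth) pairs for overlapping windows.

    depths: per-base mapping depths along one reference sequence
            (hypothetical input format).
    Assumption: a trailing window shorter than `window` is still reported.
    """
    points = []
    for start in range(0, len(depths), step):
        chunk = depths[start:start + window]
        points.append((start, median(chunk)))
        if start + window >= len(depths):
            break  # this window already reached the end of the reference
    return points
```

Because each window overlaps the previous one by half, every base (apart from the two ends) contributes to two data points, which smooths the plotted depth profile.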

**Figure 2**
Overview of the virulence genotype obtained for all samples. Presence and absence of virulence genes are indicated in green and red, respectively, as determined using BLAST+ and SRST2. *Virulence genes detected only with SRST2; —Missed virulence genes, referred to as false negatives (neither detected with SRST2 nor BLAST+ while presence of the gene was expected, i.e. detected in the same isolate processed with a different kit, or detected in a sequencing run replicate of the isolate).

**Figure 3**
Average number of reads mapping uniquely to the Sakai *E. coli* pO157 plasmid reference normalized per one million trimmed input reads for the eight kits. Number of reads mapping uniquely against the Sakai *E. coli* pO157 plasmid reference per million input reads when mapping simultaneously against the Sakai *E. coli* pO157 plasmid (NC_002128.1) and Sakai *E. coli* O157:H7 genome (NC_002695.2) reference. Values are averaged over all *E. coli* O157:H7 samples (TIAC1151, TIAC1152, TIAC1153, TIAC1165, TIAC1169 and TIAC1638) that were generated with each kit, without inclusion of the sequencing run replicate results for TIAC1151 and TIAC1165. Bars represent the standard deviation across samples for each kit. Significant differences in average plasmid reads per million trimmed input reads were identified with the Kruskal–Wallis test (n: 48, α: 0.05, p-value: 2.80 × 10^–7) followed by Dunn post-hoc analysis with Holm correction, as depicted in the accompanying table with significant values depicted in bold.
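
Two steps named in this caption can be sketched in a few lines: normalizing plasmid read counts per one million trimmed input reads, and applying a Holm correction to a set of post-hoc p-values. The function names and the example numbers are hypothetical, and the sketch does not reproduce the Kruskal–Wallis or Dunn tests themselves, only the normalization and the multiple-testing adjustment.

```python
def plasmid_reads_per_million(plasmid_reads, trimmed_input_reads):
    """Normalize uniquely mapping plasmid reads per one million trimmed input reads."""
    return plasmid_reads * 1_000_000 / trimmed_input_reads


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

For example, a sample with 2,500 plasmid reads out of 500,000 trimmed reads would be reported as 5,000 plasmid reads per million. An adjusted p-value is then compared directly against the chosen α (0.05 in the caption).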

**Figure 4**
cgMLST-based tree of all samples. A minimum spanning tree was created with GrapeTree using the MSTreeV2 method on all outbreak and non-outbreak samples generated with the eight kits, excluding sequencing run replicates. All outbreak samples (TIAC1151, TIAC1152, TIAC1165 and TIAC1169) consistently cluster together, while non-outbreak samples TIAC1153, TIAC1638 and TIAC1660 are separated from the outbreak cluster and delineated per isolate. The scale bar represents the number of cgMLST allele differences between samples. One cgMLST allele difference with other outbreak samples was observed for only four samples (TIAC1152 generated with the Genomic-tip 20/G, TIAC1152 generated with the DNeasy UltraClean Microbial kit, TIAC1165 generated with the DNeasy UltraClean Microbial kit, and TIAC1153 generated with the Easy-DNA gDNA Purification kit), which is not visible in the figure, because of the large scale.
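
The "number of cgMLST allele differences" on the scale bar is a pairwise count of loci at which two samples carry different allele numbers. A minimal sketch of that count follows. It is not GrapeTree's MSTreeV2 algorithm, and the locus names, the dictionary representation, and the choice to skip loci missing in either sample are assumptions made for illustration.

```python
def allele_differences(profile_a, profile_b, missing=None):
    """Count loci whose allele numbers differ between two cgMLST profiles.

    profile_a, profile_b: dicts mapping locus name -> allele number
                          (hypothetical representation).
    Assumption: loci absent from, or marked `missing` in, either profile
    are skipped rather than counted as differences.
    """
    differences = 0
    for locus in profile_a.keys() & profile_b.keys():
        a, b = profile_a[locus], profile_b[locus]
        if a == missing or b == missing:
            continue
        if a != b:
            differences += 1
    return differences
```

Under this definition, two samples of the same isolate that differ at a single locus have a distance of 1, which matches the scale of the one-allele discrepancies mentioned in the caption.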

**Figure 5**
SNP-based tree of all O157:H7 samples. A maximum likelihood SNP tree was generated using the K2 nucleotide substitution model, containing all O157:H7 samples. Non-O157:H7 samples (TIAC1660) were excluded from SNP calling, due to high divergence from the Sakai *E. coli* O157:H7 reference genome. All outbreak samples (TIAC1151, TIAC1152, TIAC1165 and TIAC1169) consistently clustered together, irrespective of the employed kit. Within the outbreak clade, all TIAC1165 samples showed a limited number of discrepant SNPs relative to the other outbreak samples, largely confined to a hypothetical transposase region (*ydcC* gene). The non-outbreak samples (TIAC1153 and TIAC1638) were separated from the outbreak clade, and clustered together per isolate. Notably, for TIAC1153, a small number of SNPs relative to the reference genome differed between the sample generated with the MasterPure Complete DNA Purification kit and all other TIAC1153 samples. This difference was solely due to masking of a low-quality region (see “Results”). The distance scale bar represents the average number of nucleotide substitutions per site.
