Microb Genom. 2021 Mar;7(3):mgen000531.
doi: 10.1099/mgen.0.000531. Epub 2021 Mar 3.

Validation strategy of a bioinformatics whole genome sequencing workflow for Shiga toxin-producing Escherichia coli using a reference collection extensively characterized with conventional methods

Bert Bogaerts et al. Microb Genom. 2021 Mar.

Abstract

Whole genome sequencing (WGS) enables complete characterization of bacterial pathogenic isolates at single nucleotide resolution, making it the ultimate tool for routine surveillance and outbreak investigation. The lack of standardization, and the variation regarding bioinformatics workflows and parameters, however, complicates interoperability among (inter)national laboratories. We present a validation strategy applied to a bioinformatics workflow for Illumina data that performs complete characterization of Shiga toxin-producing Escherichia coli (STEC) isolates including antimicrobial resistance prediction, virulence gene detection, serotype prediction, plasmid replicon detection and sequence typing. The workflow supports three commonly used bioinformatics approaches for the detection of genes and alleles: alignment with blast+, kmer-based read mapping with KMA, and direct read mapping with SRST2. A collection of 131 STEC isolates collected from food and human sources, extensively characterized with conventional molecular methods, was used as a validation dataset. Using a validation strategy specifically adapted to WGS, we demonstrated high performance with repeatability, reproducibility, accuracy, precision, sensitivity and specificity above 95 % for the majority of all assays. The WGS workflow is publicly available as a 'push-button' pipeline at https://galaxy.sciensano.be. Our validation strategy and accompanying reference dataset consisting of both conventional and WGS data can be used for characterizing the performance of various bioinformatics workflows and assays, facilitating interoperability between laboratories with different WGS and bioinformatics set-ups.
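
The accuracy, precision, sensitivity and specificity named in the abstract can be illustrated with a minimal sketch. It assumes the textbook confusion-matrix definitions, with the conventional molecular method as the reference standard; the exact definitions used in the paper may differ, and repeatability and reproducibility are not covered because they require replicate runs. The class name, function name and example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Observation counts for one assay (hypothetical container)."""
    tp: int  # feature reported by WGS and by the reference method
    fp: int  # feature reported by WGS only
    tn: int  # feature reported by neither
    fn: int  # feature reported by the reference method only


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def performance_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard confusion-matrix metrics (assumed definitions)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Made-up counts for illustration only.
example = ConfusionCounts(tp=95, fp=2, tn=100, fn=3)
metrics = performance_metrics(example)
passes_threshold = {name: value > 0.95 for name, value in metrics.items()}
```

Comparing each metric against 0.95 mirrors the 95 % threshold quoted above; an assay with no positive or no negative reference observations yields NaN for the affected metric, which compares as not above the threshold.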

Keywords: Escherichia coli; STEC; foodborne pathogens; public health; validation; whole genome sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Overview of the bioinformatics workflow. Each box represents a component corresponding to a series of tasks that provide a certain well-defined functionality (indicated in bold). Major bioinformatics software packages employed in each module are also mentioned (indicated in italics). Data processing steps are indicated in yellow, and bioinformatics assays are indicated in red. Data flows specific to blast+ are indicated with blue dashed lines, and data flows for KMA and SRST2 with orange dashed lines. PE, paired-end.
Fig. 2.
Overview of the characterization of the validation samples. Boxes with blue headers represent different steps in the validation. The number of samples, isolates or observations is indicated at the bottom of each box. The top part of the figure represents the collection of the validation samples from the Belgian NRC and NRL for STEC. The grey boxes group the different steps of the validation: characterization with molecular methods (‘Molecular’), whole genome sequencing (‘WGS’) and in silico characterization for assays without reference information from molecular methods (‘In silico’). All detected AMR genes with WGS were confirmed to be present with PCR. National Reference Centre (NRC), National Reference Laboratory (NRL), whole genome sequencing (WGS), ampicillin (AMP), cefotaxime (CTF). *Does not include observations from 10 negative control samples from species other than E. coli.
Fig. 3.
Minimum spanning tree containing an overview of the diversity contained within the validation dataset. The scale bar is expressed as the number of cgMLST allele differences between isolates. The annotations are (from inner to outer rings): sample name, sample origin (human or food according to the colour legend), sequence type determined with the MLST scheme of the University of Warwick using blast+-based detection, O-type and H-type as determined with PCR-based methods (absence indicates that the serotyping determining genes were not tested with PCR), presence of stx1 and stx2 as determined with PCR-based methods (a blue circle denotes presence), the number of virulence genes from the set of 20 virulence genes other than stx1 and stx2 that were detected with PCR-based methods, the number of AMR genes that were detected with blast+ and confirmed with PCR, and the number of detected plasmid replicons by the reference standard (PlasmidFinder). The number of AMR genes, virulence genes and plasmid replicons are indicated according to the colour legend. Antimicrobial resistance (AMR). Full detailed information on the metadata for the characteristics of the validation dataset is available in the Supplementary Material. *O-types for samples EH1873 and EH1389 were abbreviated to ‘O*’ from ‘O17/43/44/77/106’ and ‘O90/127’, respectively; H-type for sample TIAC1419 was H21/H7.
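
How a minimum spanning tree can be derived from cgMLST allele differences, as in the Fig. 3 caption, is sketched below. The software used to build the published tree is not named in this excerpt, so this is a plain Prim's algorithm and not the authors' method. Ignoring loci with a missing call in either profile is an assumption, and the sample names and allele profiles are invented.

```python
from typing import Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per locus, None = no call


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a call and the calls differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def minimum_spanning_tree(profiles: dict[str, Profile]) -> list[tuple[int, str, str]]:
    """Prim's algorithm; returns edges as (distance, from_sample, to_sample)."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges: list[tuple[int, str, str]] = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to any sample outside it;
        # ties are broken by sample name because tuples compare in order.
        best = min(
            (allele_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Invented four-locus profiles for illustration only.
profiles = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 1, 2],
    "C": [1, 2, 2, 2],
    "D": [3, 2, 2, 2],
}
tree = minimum_spanning_tree(profiles)
total_allele_differences = sum(distance for distance, _, _ in tree)
```

Each edge weight corresponds to what the scale bar in the caption expresses: the number of differing alleles between two connected isolates.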
