Front Microbiol. 2023 Sep 21:14:1254777.
doi: 10.3389/fmicb.2023.1254777. eCollection 2023.

Evaluation of whole and core genome multilocus sequence typing allele schemes for Salmonella enterica outbreak detection in a national surveillance network, PulseNet USA

Molly M Leeper et al. Front Microbiol. 2023.

Abstract

Salmonella enterica is a leading cause of bacterial foodborne and zoonotic illnesses in the United States. For this study, we applied four different whole genome sequencing (WGS)-based subtyping methods to a dataset of isolate sequences from nine well-characterized Salmonella outbreaks: high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multilocus sequence typing using either all loci [wgMLST (all loci)] or only chromosome-associated loci [wgMLST (chrom)], and core genome multilocus sequence typing (cgMLST). For each outbreak, we evaluated the genomic and epidemiologic concordance between hqSNP and allele-based methods. We first compared pairwise genomic differences using all four methods. We observed discrepancies in allele difference ranges when using wgMLST (all loci), likely caused by inflated genetic variation due to loci found on plasmids and/or other mobile genetic elements in the accessory genome. Therefore, we excluded wgMLST (all loci) results from any further comparisons in the study. Then, we created linear regression models and phylogenetic tanglegrams using the remaining three methods. K-means analysis using the silhouette method was applied to compare the ability of the three methods to partition outbreak and sporadic isolate sequences. Our results showed that pairwise hqSNP differences had high concordance with cgMLST and wgMLST (chrom) allele differences. The slopes of the regressions for hqSNP vs. allele pairwise differences were 0.58 (cgMLST) and 0.74 [wgMLST (chrom)], and the slope of the regression was 0.77 for cgMLST vs. wgMLST (chrom) pairwise differences. Tanglegrams showed high clustering concordance between methods using two statistical measures, the Baker's gamma index (BGI) and the cophenetic correlation coefficient (CCC): 9/9 (100%) of outbreaks yielded BGI values ≥ 0.60, and CCCs were ≥ 0.97 across all nine outbreaks and all three methods. K-means analysis showed separation of outbreak and sporadic isolate groups, with average silhouette widths ≥ 0.87 for outbreak groups and ≥ 0.16 for sporadic groups. This study demonstrates that Salmonella isolates clustered in concordance with epidemiologic data using three WGS-based subtyping methods and supports using cgMLST as the primary method for national surveillance of Salmonella outbreak clusters.
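The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the PulseNet workflow: the allele profiles are made-up toy data, the function names are hypothetical, and skipping loci with a missing call (`None`) in either isolate is an assumption about how missing data might be handled. The slope helper is ordinary least squares; which method goes on which axis is also an assumption, since the abstract does not say.

```python
from itertools import combinations


def allele_difference(profile_a, profile_b):
    """Count loci whose allele calls differ, skipping loci missing in either profile."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def pairwise_differences(profiles):
    """Return {(name_i, name_j): allele difference} for every unordered isolate pair."""
    return {
        (i, j): allele_difference(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy allele profiles (one allele number per locus; None = no call).
profiles = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 2, 1, None],
    "iso3": [1, 3, 2, 4, 1],
}
diffs = pairwise_differences(profiles)
```

Given two such dictionaries for the same isolate pairs (for example, allele differences and SNP differences), the paired values can be passed to `least_squares_slope` to obtain a regression slope comparable in kind to those reported above.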

Keywords: Salmonella; cgMLST; epidemiology; hqSNP; silhouette method; surveillance; wgMLST.

Conflict of interest statement

HP was employed by BioMérieux. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Figures

FIGURE 1
PulseNet Salmonella schema development. Number of loci included within schemes are shown for core genome, core plus accessory genome [wgMLST(chrom)], plasmid (mobile), and 7-gene MLST. Loci names are provided in Supplementary Table 1.
FIGURE 2
PulseNet Salmonella allele calling workflow. Blue text indicates conditional cutoffs.
FIGURE 3
Scatterplot of Lyve-SET hqSNP differences against cgMLST (A) and wgMLST (chrom) (B) pairwise allele differences and wgMLST (chrom) against cgMLST pairwise allele differences (C). Regression equations, R² values, and Pearson’s correlation coefficients are displayed on plots.
FIGURE 4
Baker’s gamma indices (A) and cophenetic correlation coefficients (B) for outbreak tanglegrams.
FIGURE 5
K-means analysis results for one representative outbreak (outbreak 03) using cgMLST, wgMLST (chrom), and hqSNP pairwise genomic differences. Top row: hierarchical clustering results of the dataset show partitioning of outbreak (blue) isolate sequences and sporadic (orange) isolate sequences. Bottom row: elliptical cluster plots show outbreak (blue) and sporadic (orange) isolates plotted separately on a two-dimensional plane, with ellipses fit to the points in the two clusters. On the elliptical plots, the sum of the values on the x- and y-axis scales indicates that the principal component analysis (Ding and He, 2004) accounts for 93.0% (cgMLST), 89.8% [wgMLST(chrom)], and 92.1% (hqSNP) of the variation.
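The average silhouette widths reported for outbreak and sporadic groups can be illustrated with a small sketch. It computes the standard silhouette width from a pairwise distance matrix for groups that are already labeled; it does not run k-means, and it is not the authors' code. The distance matrix, labels, and function names are hypothetical.

```python
def silhouette_widths(dist, labels):
    """Silhouette width per item from a square distance matrix and group labels.

    Requires at least two distinct labels. Items alone in their group get 0.0.
    """
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in groups[label] if j != i]
        if not own:
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the item's own group.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in groups.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def average_width_by_group(dist, labels):
    """Mean silhouette width for each label."""
    widths = silhouette_widths(dist, labels)
    totals = {}
    for width, label in zip(widths, labels):
        totals.setdefault(label, []).append(width)
    return {label: sum(vals) / len(vals) for label, vals in totals.items()}


# Hypothetical pairwise differences: two tightly related outbreak isolates
# and two loosely related sporadic isolates.
labels = ["outbreak", "outbreak", "sporadic", "sporadic"]
dist = [
    [0, 1, 30, 30],
    [1, 0, 30, 30],
    [30, 30, 0, 20],
    [30, 30, 20, 0],
]
averages = average_width_by_group(dist, labels)
```

In this toy case the outbreak group has a much higher average width than the sporadic group, because its members are close to each other relative to their distance from the other group. That is the same qualitative pattern as the higher widths reported for outbreak groups.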
