Microb Genom. 2023 May;9(5):mgen001012.
doi: 10.1099/mgen.0.001012.

Evaluation of core genome and whole genome multilocus sequence typing schemes for Campylobacter jejuni and Campylobacter coli outbreak detection in the USA

Lavin A Joseph et al. Microb Genom. 2023 May.

Abstract

Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have historically been used to differentiate sporadic from outbreak Campylobacter isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data when compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated epidemiological concordance for high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST to cluster or differentiate outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model R² and Pearson correlation coefficients were >0.90. The correlation was sometimes lower when comparing hqSNP analysis to the MLST-based methods; the linear regression model R² and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86 for some outbreak isolates. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele-based and SNP-based approaches may reflect differences in how genomic variation (SNPs and indels) is captured by the two methods.
Because cgMLST examines allele differences in genes that are common to most of the isolates being compared, it is well suited to surveillance: large genomic databases can be searched for similar isolates easily and efficiently using allelic profiles. By contrast, an hqSNP approach is much more computationally intensive and does not scale to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used.
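As a rough illustration of why allelic profiles are cheap to compare, the sketch below counts allele differences between profiles stored as lists of allele numbers, one entry per locus. The isolate names, the profiles, the use of `None` for a locus without an allele call, and the choice to skip such loci are all assumptions made for this example; they are not the PulseNet or BioNumerics implementation.

```python
from itertools import combinations

MISSING = None  # hypothetical marker for a locus with no allele call


def allele_distance(profile_a, profile_b):
    """Count loci where both isolates have a call and the allele numbers differ.
    Loci missing in either profile are skipped (an assumption of this sketch)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not MISSING and b is not MISSING and a != b
    )


def pairwise_distances(profiles):
    """Return {(name_1, name_2): allele distance} for every pair of isolates."""
    return {
        (n1, n2): allele_distance(profiles[n1], profiles[n2])
        for n1, n2 in combinations(sorted(profiles), 2)
    }


# Toy profiles over six loci (illustrative values only).
toy_profiles = {
    "isolate_A": [1, 4, 2, 7, 3, 1],
    "isolate_B": [1, 4, 2, 7, 3, 2],
    "isolate_C": [5, 4, 9, None, 8, 6],
}
toy_distances = pairwise_distances(toy_profiles)
for pair, diff in toy_distances.items():
    print(pair, diff)
```

Each comparison is a single pass over integer lists, so a query profile can be screened against many stored profiles without any alignment or SNP calling step.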

Keywords: Campylobacter; cgMLST; hqSNP; outbreak; wgMLST.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
PulseNet Campylobacter allele calling workflow. Sequences must pass the quality thresholds in red, otherwise they are rejected. The metrics in blue are evaluated for quality assessment of sequences but are not used to reject sequences. The parameters in black are settings used in BioNumerics. CDS, coding sequence.
Fig. 2.
PulseNet Campylobacter schema development. Numbers of loci included within schemes are shown for the overall scheme, core genome, accessory genome (excluding core) and 7-gene MLST.
Fig. 3.
Sequence quality metrics and corresponding cgMLST allele calls (percent core called, a–f) or wgMLST allele calls (present alleles, g–l) for all 336 sequenced Campylobacter isolates. An additional 15 sporadic isolates with sequence length <1.4 Mb and an additional six sporadic isolates with an average de novo coverage <20× were added to the respective datasets. C. jejuni and C. coli isolate sequences are coloured according to the key. PulseNet quality metric thresholds are indicated by the red lines.
Fig. 4.
UPGMA phylogenetic dendrograms with outbreak (237 C. jejuni and five C. coli) and sporadic (69 C. jejuni and four C. coli) sequences based on cgMLST (a) and wgMLST (b) schemes in BioNumerics v7.6.3. Dendrograms were circularized and annotated using iTOL. The inner ring represents the species of each isolate, the second ring represents the source of each isolate, and the outer ring represents the outbreak investigation each isolate belongs to, or marks it as sporadic. Colour codes for species, source, and outbreak or sporadic isolates are shown in the key.
Fig. 5.
Bar graphs of Baker’s gamma index (a) and cophenetic correlation coefficient (b) for all outbreak+sporadic isolate datasets in this study. Comparisons of wgMLST and cgMLST dendrograms are in orange, comparisons of wgMLST and hqSNP dendrograms are in blue, and comparisons of cgMLST and hqSNP dendrograms are in silver. Values close to 1 indicate that the two analysis methods compared are more similar.
Fig. 6.
Scatterplots of all outbreak-related pairwise differences of cgMLST and wgMLST against hqSNP (a, c, e) and of cgMLST against wgMLST (b, d, f). Scatterplots for all within-clade outbreak isolates are shown in (a) and (b), those excluding puppy outbreak isolates (1708FLDBR-1 and 1906NVDBR-1) are shown in (c) and (d), and those with only puppy outbreak isolates are shown in (e) and (f). A simple linear regression analysis was performed for each comparison to produce a best-fit line; the formulas (y=mx+b) and coefficients of determination (r²) are shown on the figure. Pearson correlation coefficients were also calculated and are displayed on the figure.
