Front Microbiol. 2017 Jul 31:8:1345.

doi: 10.3389/fmicb.2017.01345. eCollection 2017.

Pan-genome Analyses of the Species Salmonella enterica, and Identification of Genomic Markers Predictive for Species, Subspecies, and Serovar

Chad R Laing¹, Matthew D Whiteside¹, Victor P J Gannon¹

PMID: 28824552
PMCID: PMC5534482
DOI: 10.3389/fmicb.2017.01345

Chad R Laing et al. Front Microbiol. 2017.

Affiliation

¹ National Microbiology Laboratory, Public Health Agency of Canada, Lethbridge, AB, Canada.

Abstract

Food safety is a global concern, with upward of 2.2 million deaths due to enteric disease every year. Current whole-genome sequencing platforms allow routine sequencing of enteric pathogens for surveillance and during outbreaks; however, a remaining challenge is the identification of genomic markers that are predictive of strain groups that pose the most significant health threats to humans, or that can persist in specific environments. We have previously developed the software program Panseq, which identifies the pan-genome among a group of sequences, and the SuperPhy platform, which utilizes this pan-genome information to identify biomarkers that are predictive of groups of bacterial strains. In this study, we examined the pan-genome of 4893 genomes of Salmonella enterica, an enteric pathogen responsible for the loss of more disability-adjusted life years than any other enteric pathogen. We identified a pan-genome of 25.3 Mbp, a strict core of 1.5 Mbp present in all genomes, and a conserved core of 3.2 Mbp found in at least 96% of these genomes. We also identified 404 genomic regions of 1000 bp that were specific to the species S. enterica. These species-specific regions were found to encode mostly hypothetical proteins, effectors, and other proteins related to virulence. Markers unique to each of the six S. enterica subspecies were identified. No serovar had pan-genome regions that were present in all of its genomes and absent in all other serovars; however, each serovar did have genomic regions that were universally present among all constituent members, and statistically predictive of the serovar. The phylogeny based on SNPs within the conserved core genome was found to be highly concordant with that based on the presence/absence of 1000 bp regions of the entire pan-genome.
Future studies could use these predictive regions as components of a vaccine to prevent salmonellosis, as well as in simple and rapid diagnostic tests for both in silico and wet-lab applications, with uses ranging from food safety to public health. Lastly, the tools and methods described in this study could be applied as a pan-genomics framework to other population genomic studies seeking to identify markers for other bacterial species and their sub-groups.
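As an illustrative sketch only (not Panseq's implementation), the strict-core and conserved-core definitions in the abstract can be expressed over a presence/absence table of pan-genome fragments. The function name `classify_fragments`, the dictionary layout, and the toy data are assumptions introduced here for illustration. The 96% default mirrors the threshold stated in the abstract, and the strict core is treated as a subset of the conserved core.

```python
from typing import Dict, List, Set


def classify_fragments(
    presence: Dict[str, List[int]],
    conserved_threshold: float = 0.96,
) -> Dict[str, Set[str]]:
    """Group fragments by how widely they are carried.

    `presence` maps a (hypothetical) fragment ID to a list of 0/1 calls,
    one per genome. A fragment is in the strict core when every genome
    carries it, in the conserved core when at least `conserved_threshold`
    of genomes carry it, and accessory otherwise.
    """
    groups: Dict[str, Set[str]] = {
        "strict_core": set(),
        "conserved_core": set(),
        "accessory": set(),
    }
    for fragment_id, calls in presence.items():
        if not calls:
            raise ValueError(f"no genome calls for fragment {fragment_id!r}")
        carried = sum(calls)
        if carried == len(calls):
            groups["strict_core"].add(fragment_id)
        if carried / len(calls) >= conserved_threshold:
            groups["conserved_core"].add(fragment_id)
        else:
            groups["accessory"].add(fragment_id)
    return groups


# Toy data: three fragments scored across four genomes (assumed values).
toy_presence = {
    "frag_a": [1, 1, 1, 1],
    "frag_b": [1, 1, 1, 0],
    "frag_c": [1, 0, 0, 0],
}
toy_groups = classify_fragments(toy_presence, conserved_threshold=0.75)
```

With the toy threshold of 0.75, `frag_a` falls in both cores, `frag_b` in the conserved core only, and `frag_c` is accessory. Summing fragment lengths per group would give sizes comparable in kind to the Mbp figures reported above.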

Keywords: Salmonella; food safety; genomics; pan-genome; predictive markers.

Figures

**FIGURE 1**
The distribution of the *Salmonella enterica* pan-genome, as 1000 bp fragments, among 4939 whole-genome sequences (WGSs).

**FIGURE 2**
The carriage of the 404 *S. enterica* species-specific regions among each of the 4939 genomes of this study. Each dot represents a single *S. enterica* genome, which are arranged in order from those that contain the fewest species-specific regions to those that contain the most.

**FIGURE 3**
The carriage of the 404 *S. enterica* species-specific regions, versus the number of contigs for each of the 4936 genomes. Colors indicate the subspecies within *S. enterica* as follows: red: arizonae, lime: diarizonae, teal: enterica, blue: houtenae, lavender: indica, magenta: salamae and yellow: sample with *Citrobacter* contamination.

**FIGURE 4**
The phylogeny of the 4893 *S. enterica* genomes post quality-filtering, and limiting the number of genomes from each serovar to five. The name of each serovar is presented as text, and the six subspecies are shown as colored circles as follows: teal: arizonae, blue: diarizonae, dark orange: enterica, peach: houtenae, dark green: indica, light orange: salamae.

**FIGURE 5**
The phylogeny of the 4893 *S. enterica* genomes post quality-filtering based on SNPs found within the conserved core genome. The 10 most abundant serovars of subspecies enterica in the current study (Agona, Bareilly, Enteritidis, Heidelberg, Kentucky, Newport, Paratyphi, Typhi, Typhimurium, Weltevreden) are labeled on the tree. The matrix to the right of the phylogeny represents the 404 species-specific regions, with blue being the absence of a region, and green being the presence of a region, for each of the genomes of the study.

**FIGURE 6**
The phylogeny of the 4893 *S. enterica* genomes post quality-filtering based on the presence/absence of the entire pan-genome as 1000 bp fragments. The 10 most abundant serovars of subspecies enterica in the current study (Agona, Bareilly, Enteritidis, Heidelberg, Kentucky, Newport, Paratyphi, Typhi, Typhimurium, Weltevreden) are labeled on the tree. The matrix to the right of the phylogeny represents the 404 species-specific regions, with blue being the absence of a region, and green being the presence of a region, for each of the genomes of the study.

**FIGURE 7**
The number of predictive markers from the GenBank dataset found within the EnteroBase dataset for nine serovars of *S. enterica*, which encompassed a test set of 3948 genomes. The number of genomes for each serovar was the same between the GenBank and EnteroBase datasets, as shown in **Table 6**. The size of the circles is proportional to the number of predictive markers from the GenBank dataset found in the EnteroBase dataset. The number of genomes for each serovar is given in the horizontal axis label. Using serovar Agona as an example, there were 136 genomes in both the GenBank and EnteroBase datasets, and 129 of the 161 predictive markers from the GenBank dataset were found in all of the genomes from the EnteroBase dataset, whereas 21 of the GenBank predictive markers were found in all but one (135) of the EnteroBase genomes examined.
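The tally summarized in Figure 7 can be sketched as follows: for each predictive marker, count how many genomes of the test set carry it, then count how many markers share each carriage value. This is a minimal sketch under assumptions, not the authors' code; `tally_marker_carriage`, the marker IDs, and the genome contents below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def tally_marker_carriage(
    markers: List[str],
    genome_contents: Dict[str, Set[str]],
) -> Counter:
    """Return a Counter mapping 'genomes carrying a marker' to
    'number of markers with that carriage count'."""
    tally: Counter = Counter()
    for marker in markers:
        # Number of test-set genomes in which this marker was found.
        carriers = sum(
            1 for contents in genome_contents.values() if marker in contents
        )
        tally[carriers] += 1
    return tally


# Toy test set: three genomes and three candidate markers (assumed values).
toy_genomes = {
    "genome_1": {"marker_1", "marker_2"},
    "genome_2": {"marker_1", "marker_2"},
    "genome_3": {"marker_1"},
}
toy_tally = tally_marker_carriage(
    ["marker_1", "marker_2", "marker_3"], toy_genomes
)
```

In the Agona example from the caption, the entry for 136 genomes would hold 129 markers and the entry for 135 genomes would hold 21.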

Cited by

Performance and Accuracy of Four Open-Source Tools for In Silico Serotyping of Salmonella spp. Based on Whole-Genome Short-Read Sequencing Data.
Uelze L, Borowiak M, Deneke C, Szabó I, Fischer J, Tausch SH, Malorny B. Appl Environ Microbiol. 2020 Feb 18;86(5):e02265-19. doi: 10.1128/AEM.02265-19. PMID: 31862714.
Large-Scale Genomics Reveals the Genetic Characteristics of Seven Species and Importance of Phylogenetic Distance for Estimating Pan-Genome Size.
Park SC, Lee K, Kim YO, Won S, Chun J. Front Microbiol. 2019 Apr 24;10:834. doi: 10.3389/fmicb.2019.00834. eCollection 2019. PMID: 31068915.
Comparative Genomic Analyses and CRISPR-Cas Characterization of Cutibacterium acnes Provide Insights Into Genetic Diversity and Typing Applications.
Cobian N, Garlet A, Hidalgo-Cantabrana C, Barrangou R. Front Microbiol. 2021 Nov 3;12:758749. doi: 10.3389/fmicb.2021.758749. eCollection 2021. PMID: 34803983.
Mobile genetic elements define the non-random structure of the Salmonella enterica serovar Typhi pangenome.
Peñil-Celis A, Tagg KA, Webb HE, Redondo-Salvo S, Francois Watkins L, Vielva L, Griffin C, Kim JY, Folster JP, Garcillan-Barcia MP, de la Cruz F. mSystems. 2024 Aug 20;9(8):e0036524. doi: 10.1128/msystems.00365-24. Epub 2024 Jul 26. PMID: 39058093.
Recent emergence of cephalosporin-resistant Salmonella Typhi in India due to the endemic clone acquiring IncFIB(K) plasmid encoding bla_CTX-M-15 gene.
Thirumoorthy TP, Jacob JJ, Velmurugan A, Teekaraman MP, Shah B, Iyer V, Maheshwari G, Trivedi U, Shah A, Patel P, Gaigawale A, M Y, Sathya Narayanan P, Mutreja A, Carey M, John J, Kang G, Veeraraghavan B. Microbiol Spectr. 2025 Apr 10;13(5):e0087524. doi: 10.1128/spectrum.00875-24. Online ahead of print. PMID: 40208005.
