Review
PLoS Genet. 2018 Apr 5;14(4):e1007261.
doi: 10.1371/journal.pgen.1007261. eCollection 2018 Apr.

A genomic overview of the population structure of Salmonella

Nabil-Fareed Alikhan et al. PLoS Genet.

Abstract

For many decades, Salmonella enterica has been subdivided by serological properties into serovars, or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) correspond to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing, and the existence of natural populations had not been validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads that are in the public domain or are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST), core genome MLST (cgMLST), and whole genome MLST (wgMLST) and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs). eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Legacy MLST STs and numbers of entries (records) in EnteroBase (http://enterobase.warwick.ac.uk).
EnteroBase has performed genomic assemblies from sequence reads that were originally submitted to ENA short-read archives or directly uploaded to EnteroBase by users. EnteroBase also contains all entries with legacy MLST genotypes based on ABI sequencing that were originally submitted to the now defunct legacy MLST website. (A) Historical release dates in EnteroBase for assembled Salmonella genomes and strains subjected to legacy ABI-based MLST. The curves indicate an exponential increase in the numbers of publicly released short reads over time versus only a linear increase in legacy entries. Genomes with a release date after 1 November 2017 were excluded, as were 3,012 sets of short reads that showed evidence of contamination or did not pass quality control after assembly. (B) A total of 3,929 STs were defined from 120,471 strains of Salmonella by ABI sequencing (left), by WGS (right), or by both approaches (overlap). A full list of the strains summarised in this figure is publicly available to registered EnteroBase users in the EnteroBase workspace at https://enterobase.warwick.ac.uk/species/senterica/search_strains?query=workspace:9113. Note that the number of genomes summarised here exceeds those in part (A) because we evaluated all genomes that had been assembled by 1 November 2017, including assemblies that had not yet been released, with the exception of 795 genomic assemblies in which the alleles for all seven legacy MLST loci had not been called. ABI, Applied Biosystems; ENA, European Nucleotide Archive; WGS, whole genome sequencing. For other acronyms see Box 1.
Fig 2. Correspondence between eBGs from legacy MLST and reBGs from rMLST.
The figure shows a GrapeTree clustering (MSTreeV2) [31] of 3,929 MLST STs from 118,391 Salmonella strains in EnteroBase. Each node corresponds to a single legacy ST, with diameter scaled to the number of strains. Strains whose ST assignments were based on legacy ABI sequencing are coloured black and strains whose ST assignments were based on genomic assemblies are indicated by unique colours for the 50 most prevalent reBGs. The colours, reBG designations, dominant serovar, and numbers of genomic assemblies are indicated in the key (top left). Lines connect nodes that are single-locus variants. The GrapeTree shows that most reBGs correspond to a single eBG cluster of single-locus variants of legacy STs (e.g., eBG1 corresponds to reBG1), but others correspond to subclusters (e.g., eBG14 includes reBG14.1, reBG14.2, and reBG14.3). This correspondence between eBG and reBG assignments was 1:1 for 243 eBGs/reBGs. In other consistent cases, 24 eBGs split into multiple related reBGs and 13 reBGs each encompass multiple related eBGs. An interactive version of the figure is available to registered EnteroBase users at http://enterobase.warwick.ac.uk/ms_tree?tree_id=9123. A full list of strains included is available on EnteroBase (https://enterobase.warwick.ac.uk/species/senterica/search_strains?query=workspace:9113).
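The caption's notion of clusters linked by single-locus variants can be illustrated with a minimal sketch. It assumes each ST is a tuple of seven allele numbers and groups STs that are connected through chains of single-locus variants. The ST names and allele numbers are hypothetical, and this simplified grouping is neither the published eBG definition nor the MSTreeV2 algorithm used by GrapeTree.

```python
from itertools import combinations


def allele_differences(profile_a, profile_b):
    """Count the loci at which two allele profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def single_locus_variant_clusters(profiles):
    """Group STs connected by chains of single-locus variants.

    `profiles` maps an ST name to a tuple of allele numbers
    (seven loci for legacy MLST). Names and values here are illustrative.
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    # Link every pair of STs that differ at exactly one locus.
    for st_a, st_b in combinations(profiles, 2):
        if allele_differences(profiles[st_a], profiles[st_b]) == 1:
            parent[find(st_a)] = find(st_b)

    clusters = {}
    for st in profiles:
        clusters.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical seven-locus profiles, not real Salmonella STs.
example_profiles = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 2, 1, 1, 1, 1),   # single-locus variant of ST_A
    "ST_C": (1, 1, 2, 1, 1, 3, 1),   # single-locus variant of ST_B
    "ST_D": (5, 6, 7, 8, 9, 10, 11),  # unrelated to the others
}
example_clusters = single_locus_variant_clusters(example_profiles)
```

In this toy input, ST_A and ST_C differ at two loci but still fall into one cluster because both are single-locus variants of ST_B.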
Fig 3. GrapeTree [31] representation of a maximum-likelihood phylogeny of core SNPs from 926 representative genomes of S. enterica plus S. bongori.
The dataset includes one representative genome from each reBG in S. enterica subspecies enterica and one representative genome from each rST in the other subspecies and S. bongori. Branches shorter than 0.001 substitutions per site were collapsed for clarity, whereas the branch to S. bongori (dotted line) was arbitrarily shortened to 0.4. Nodes at the tips of branches are coloured by subspecies/species, as indicated in the key. The tree indicates the likely existence of at least three genetically distinct, novel subspecies, labelled novel subsp. A through novel subsp. C. The scale bar (at the 2 o'clock position) is in substitutions per site. A full list of strains containing the inferred species and subspecies according to this figure is available at https://enterobase.warwick.ac.uk/species/senterica/search_strains?query=workspace:9641.
Fig 4. Genetic relationships according to core SNPs, cgMLST and wgMLST.
(A) Maximum parsimony genealogy of 73 genomes of serovar Agona based on 846 nonhomoplastic, nonrecombinant, nonmobile, nonrepetitive core SNPs. Modified from Fig 1 of Zhou et al. [20]. Groups A and D (key) refer to the subgroups labelled A2 and A1, and D1 and D2, respectively, in that figure. An interactive version of this phylogram is available to EnteroBase users (http://enterobase.warwick.ac.uk/phylo_tree?tree_id=10290). (B) GrapeTree [31] of cgMLST (3,002 loci) from 1,082 Agona genomes in EnteroBase, consisting of all Agona genomes assembled by EnteroBase from the ENA short-read archives, including the genomes in part A plus additional genomes from the Irish 2008 outbreak. An interactive version of this tree is available at http://enterobase.warwick.ac.uk/ms_tree?tree_id=9946. The entire set of genomes and all its metadata and genotyping results are available to registered EnteroBase users at http://enterobase.warwick.ac.uk/species/senterica/search_strains?query=workspace:12810. Scale bar for 5 loci at right. (C) GrapeTree of wgMLST (21,065 loci) of the same genomes as in part B. Scale bar for 30 loci at the right. An interactive version of this tree is available at http://enterobase.warwick.ac.uk/ms_tree?tree_id=9947. Parts B and C are colour coded by country of isolation. Additional metadata such as year of isolation or source of isolation can be investigated in the interactive versions.
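The scale bars in parts B and C are expressed in numbers of loci, that is, in allelic differences between profiles. A minimal sketch of such a pairwise distance follows. It assumes that each genome is a tuple of allele numbers and that an uncalled locus is stored as `None`. Both are illustrative assumptions rather than EnteroBase's data format, and skipping missing loci is a simplification, not the handling of missing data in MSTreeV2.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, skipping loci uncalled in either profile."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def distance_matrix(profiles):
    """Return a dict of pairwise allelic distances keyed by (name, name)."""
    names = sorted(profiles)
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x in names
        for y in names
    }


# Hypothetical five-locus profiles; a real cgMLST profile here has 3,002 loci.
example_genomes = {
    "g1": (1, 2, 3, 4, 5),
    "g2": (1, 2, 3, 4, 6),
    "g3": (1, None, 7, 4, 6),  # second locus was not called
}
example_distances = distance_matrix(example_genomes)
```

Under these assumptions, the same function applies unchanged to wgMLST profiles; only the number of loci per tuple differs.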

References

    1. Gwyn LB (1898) On infection with a Para-Colin bacillus in a case with all the clinical features of typhoid fever. Johns Hopkins Hospital Bulletin 84: 54–56.
    2. Grimont PA, Weill F-X (2007) Antigenic formulae of the Salmonella serovars. Paris, France: WHO Collaborating Centre for Reference and Research on Salmonella. 166 p.
    3. Achtman M, Wain J, Weill F-X, Nair S, Zhou Z, Sangal V, et al. (2012) Multilocus sequence typing as a replacement for serotyping in Salmonella enterica. PLoS Pathog 8: e1002776. doi: 10.1371/journal.ppat.1002776
    4. Maiden MCJ, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, et al. (1998) Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA 95: 3140–3145.
    5. Kidgell C, Reichard U, Wain J, Linz B, Torpdahl M, Dougan G, et al. (2002) Salmonella typhi, the causative agent of typhoid fever, is approximately 50,000 years old. Infect Genet Evol 2: 39–45.

Publication types