mBio. 2013 Mar 5;4(2):e00579-12.
doi: 10.1128/mBio.00579-12.

Evolutionary Genomics of Salmonella enterica Subspecies

Prerak T Desai et al. mBio. 2013.

Erratum in

  • MBio. 2013;4(2):e00198-13. Bhonagiri-Palsikar, Veena [added]; Hallsworth-Pepin, Kymberlie [added]

Abstract

ABSTRACT Six subspecies are currently recognized in Salmonella enterica. Subspecies I (subspecies enterica) is responsible for nearly all infections in humans and warm-blooded animals, while five other subspecies are isolated principally from cold-blooded animals. We sequenced 21 phylogenetically diverse strains, including two representatives from each of the previously unsequenced five subspecies and 11 diverse new strains from S. enterica subspecies enterica, to put this species into an evolutionary perspective. The phylogeny of the subspecies was partly obscured by abundant recombination events between lineages and a relatively short period of time within which subspeciation took place. Nevertheless, a variety of different tree-building methods gave congruent evolutionary tree topologies for subspeciation. A total of 285 gene families were identified that were recruited into subspecies enterica, and most of these are of unknown function. At least 2,807 gene families were identified in one or more of the other subspecies that are not found in subspecies I or Salmonella bongori. Among these gene families were 13 new candidate effectors and 7 new candidate fimbrial clusters. A third complete type III secretion system not present in subspecies enterica (I) isolates was found in both strains of subspecies salamae (II). Some gene families had complex taxonomies, such as the type VI secretion systems, which were recruited from four different lineages in five of six subspecies. Analysis of nonsynonymous-to-synonymous substitution rates indicated that the more-recently acquired regions in S. enterica are undergoing faster fixation rates than the rest of the genome. Recently acquired AT-rich regions, which often encode virulence functions, are under ongoing selection to maintain their high AT content. 
IMPORTANCE We have sequenced 21 new genomes which encompass the phylogenetic diversity of Salmonella, including strains of the previously unsequenced subspecies arizonae, diarizonae, houtenae, salamae, and indica as well as new diverse strains of subspecies enterica. We have deduced possible evolutionary paths traversed by this very important zoonotic pathogen and identified novel putative virulence factors that are not found in subspecies I. Gene families gained at the time of the evolution of subspecies enterica are of particular interest because they include mechanisms by which this subspecies adapted to warm-blooded hosts.

Figures

FIG 1
Evolution of Salmonella subspecies, as revealed by different phylogenetic tree-building algorithms. (A) Maximum likelihood cladogram. An alignment of an ~2.6-Mb core sequence conserved across all genomes (737,062 SNPs) was used, and the presented cladogram was constructed using RAxML version 2.7.6 (21). Internal nodes show bootstrap support values from 1,000 replicates. (B) Condensed and linearized maximum likelihood phylogram. Exactly 348,642 synonymous substitutions in 2,025 core genes present across all genomes in single copies were used. Distances were estimated using RAxML version 2.7.6 with 1,000 bootstraps (21), and the temporal calibration is based on the 140-million-year divergence time between E. coli and Salmonella reported previously (23). (C) Majority-rule consensus cladogram of 2,025 core phylogenetic trees constructed using Phylip version 3.69 (25). Internal nodes indicate the fractions of gene trees which support the partition of the multiple taxa at that node. (D) Condensed phylogram obtained from a Clonal Frame (27) analysis of concatenated 104,029-bp alignment. The tree represents a consensus of two replicate analyses of 200,000 Markov chain Monte Carlo iterations.
FIG 2
Gene accumulation enumerations and rarefaction across 29 Salmonella strains. (A) Rarefaction curves were estimated by bootstrapping 100 permutations of randomized sample order. Error bars indicate the bootstrap standard deviation (SD) based on variation in sample order among randomizations. Rarefaction curves were calculated using EstimateS (35). See the text for details. (B) Gene accumulation curve calculated based on the presence/absence gene profile. Strains were ranked in ascending distance from the Typhimurium LT2 reference genome. Core and pangenomes were estimated after addition of one strain at a time.
FIG 3
Gene sets gained and lost at major ancestral nodes. Only statistically overrepresented gene sets (FDR < 10%) are shown. Subspecies enterica (I) is to the left of the vertical red line. Each cell is colored based on the ratio of genes gained/lost to the total number of genes in the gene set. See Table S7 in the supplemental material for identification of genes belonging to each gene set.
FIG 4
Phyletic distribution of major groups of genes. Gene classes that include novel genes are shown in red. The number of genes in a gene set and the proportion of homoplastic genes within each gene set are shown on the right. The FDR (<10%) for homoplasy enrichment is in green. Serovars of subspecies enterica (I) are shown to the left of the vertical red line. Each cell is shaded based on the ratio of genes present, compared to the total number of genes in the gene set. See Table S7 in the supplemental material for identification of genes belonging to each gene set.
FIG 5
Mean log10(dS/dN) values for genes recruited into the genomes at different time points. The time at which the genes were recruited into their respective genomes was estimated based on the tree in Fig. 1B.
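
The gene accumulation and rarefaction ideas described for FIG 2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce the estimators used by EstimateS. It only shows, on a toy presence/absence profile, how pangenome and core genome sizes change as strains are added one at a time, and how averaging over randomized strain orders gives a mean curve with a standard deviation. The strain names, gene names, and the functions `accumulation` and `rarefaction` are hypothetical and chosen for illustration.

```python
import random
from statistics import mean, stdev


def accumulation(profiles, order):
    """Return (pan_sizes, core_sizes) as strains are added in the given order.

    profiles maps a strain name to the set of gene families it carries.
    """
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for strain in order:
        genes = profiles[strain]
        pan |= genes  # pangenome: union of all genes seen so far
        core = set(genes) if core is None else core & genes  # core: intersection
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


def rarefaction(profiles, permutations=100, seed=0):
    """Mean and SD of pangenome size at each step over random strain orders.

    A simplified stand-in for a rarefaction curve; the seed is fixed so the
    toy run is repeatable.
    """
    rng = random.Random(seed)
    strains = sorted(profiles)
    runs = []
    for _ in range(permutations):
        order = strains[:]
        rng.shuffle(order)
        runs.append(accumulation(profiles, order)[0])
    columns = list(zip(*runs))  # one column per number of strains sampled
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Toy presence/absence profile (hypothetical strains and gene families).
profiles = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}

# Fixed order, analogous to ranking strains by distance from a reference.
pan_sizes, core_sizes = accumulation(profiles, ["strain_A", "strain_B", "strain_C"])
print(pan_sizes, core_sizes)

# Randomized orders, analogous to the permutation-based curve.
means, sds = rarefaction(profiles, permutations=100)
print(means, sds)
```

In the fixed-order run the pangenome can only grow and the core genome can only shrink as strains are added. In the randomized run the final point has zero spread, because every ordering ends with the same full set of strains.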
