PLoS Genet. 2021 Sep 27;17(9):e1009820.
doi: 10.1371/journal.pgen.1009820. eCollection 2021 Sep.

Genomic population structure associated with repeated escape of Salmonella enterica ATCC14028s from the laboratory into nature

Mark Achtman et al. PLoS Genet. 2021.

Abstract

Salmonella enterica serovar Typhimurium strain ATCC14028s is commercially available from multiple national type culture collections, and has been widely used since 1960 for quality control of growth media and experiments on fitness ("laboratory evolution"). ATCC14028s has been implicated in multiple cross-contaminations in the laboratory, and has also caused multiple laboratory infections and one known attempt at bioterrorism. According to hierarchical clustering of 3002 core gene sequences, ATCC14028s belongs to HierCC cluster HC20_373, in which most internal branch lengths are only one to three SNPs long. Many natural Typhimurium isolates from humans, domesticated animals and the environment also belong to HC20_373, and their core genomes are almost indistinguishable from those of laboratory strains. These natural isolates have infected humans in Ireland and Taiwan for decades, and are common in the British Isles as well as the Americas. The isolation history of some of the natural isolates confirms the conclusion that they do not represent recent contamination by the laboratory strain, and 10% carry plasmids or bacteriophages which have been acquired in nature by horizontal gene transfer (HGT) from unrelated bacteria. We propose that ATCC14028s has repeatedly escaped from the laboratory environment into nature via laboratory accidents or infections, but the escaped micro-lineages have only a limited life span. As a result, there is a genetic gap separating HC20_373 from its closest natural relatives due to a divergence between them in the late 19th century followed by repeated extinction events of escaped HC20_373.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Core and accessory genetic diversity within 496 HC20_373 genomes from strains stored in laboratory collections or isolated from nature.
A) Core SNP distances. Numbers of non-repetitive SNPs in an all-against-all comparison of pairs of genomes within each category. Maximum numbers of core SNP differences: Natural isolates, 30; ATCC 14028s, 16; resequenced genomes, 16. B) Numbers of additional accessory genes according to a pan-genome of all isolates in an all-against-all comparison of pairs of genomes within each category. C) Numbers of deleted accessory genes according to a pan-genome of all isolates in an all-against-all comparison of pairs of genomes within each category. ATCC 14028s (black): 172 strains from laboratory sources or from laboratory infections. HC20_373 natural isolates (blue): 324 strains from all other sources. Resequenced salmonellae (red; only in part A): 285 pairs of genomes that were sequenced twice for the UCCUoW 10K genomes project [14]. Additional details on the accessory genes in parts B and C can be found in S2 Fig.
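The all-against-all comparison in part A can be illustrated with a minimal sketch. It assumes that each genome is available as an equal-length aligned core sequence; the function names and the toy sequences are hypothetical, and this is not the pipeline used by the authors.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both sequences carry an unambiguous,
    different base. Gaps and ambiguity codes such as N are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    bases = set("ACGT")
    return sum(1 for x, y in zip(a, b) if x != y and x in bases and y in bases)


def pairwise_distances(seqs: dict[str, str]) -> list[int]:
    """Return SNP distances for every unordered pair of genomes,
    with pairs ordered by sorted genome name."""
    names = sorted(seqs)
    return [snp_distance(seqs[p], seqs[q]) for p, q in combinations(names, 2)]


# Toy alignment (hypothetical data, not taken from the paper).
toy = {
    "lab_1": "ACGTACGT",
    "lab_2": "ACGTACGA",
    "nat_1": "ACTTACGA",
}
distances = pairwise_distances(toy)
print(distances, max(distances))
```

The maximum of such a list corresponds to the "maximum numbers of core SNP differences" reported per category in the caption.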
Fig 2. Neighbor Joining tree of allelic differences in genomes in HC50_147.
Ninja NJ [64] visualization of allelic differences in the 3002 core genes of the cgMLST Salmonella scheme with GrapeTree [57]. At least 20 alleles differ between HC20_373 and all other HC20 clusters in the tree. The tree encompasses all 2098 HC50_147 genomes in EnteroBase on 1 December 2020, at which time EnteroBase contained >270,000 Salmonella genomes. An interactive plot of these data can be found together with additional metadata at http://enterobase.warwick.ac.uk/ms_tree/50936.
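The allelic gap between HC20 clusters can be pictured with a small sketch of threshold-based single-linkage clustering of cgMLST profiles. This is an illustration under stated assumptions, not EnteroBase's HierCC implementation: profiles are assumed to be lists of integer allele IDs with 0 for a missing locus, the inclusive `<=` cutoff is an assumption, and all names and data are hypothetical.

```python
def allelic_distance(p: list[int], q: list[int]) -> int:
    """Count loci where both profiles have a called allele (non-zero) that differs."""
    return sum(1 for a, b in zip(p, q) if a and b and a != b)


def single_linkage_clusters(profiles: dict[str, list[int]], threshold: int) -> list[set[str]]:
    """Group genomes so that any two genomes within `threshold` allelic
    differences share a cluster, with chaining allowed (single linkage)."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Assumption: the cutoff is inclusive.
            if allelic_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy profiles over six loci (hypothetical data).
toy_profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 1, 2, 2],
    "g4": [3, 3, 3, 3, 2, 2],
}
clusters = single_linkage_clusters(toy_profiles, threshold=1)
print(clusters)
```

In the toy data, g1 and g3 differ at two loci but still share a cluster because each is within one allele of g2, whereas g4 is separated from every other genome by at least four alleles and forms its own cluster.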
Fig 3. Pie charts of the environmental sources of 341 natural Salmonella strains in HC20_373 according to geographic source.
The diameters of the pie charts are scaled to the numbers of isolates from each geographical source. Seventeen of the isolates were from laboratory infections and the others were from the sources indicated in the Key Legend (left). Map and pie charts were generated using the open-source D3.js [65] library GeographicLib [66] with the MIT/X11 license at https://github.com/d3/d3-geo/blob/main/LICENSE.
Fig 4. Maximum likelihood (RAxML-NG [67]) phylogenetic tree of 462 non-repetitive SNPs in 496 genomes of strains in HC20_373 plus one outgroup genome (strain SAP17-7699; NCBI Accession GCF_005885875; EnteroBase barcode SAL_AB1180AA) from the related HC20 cluster, HC20_147.
The tree is presented with GrapeTree [57]. Indistinguishable genomes were collapsed into pie-chart nodes, the areas of which are proportional to the number of genomes, and whose color-coded sectors indicate their General Source (Key Legend). The phylogenetic root of HC20_373 is indicated by the branch connecting node A1 with an outgroup genome from HC20_147 (EnteroBase strain barcode SAL_AB1180AA). Other branches were rotated manually without distorting their topologies in order to cluster tips by General Source. A visual examination of the tree indicates a temporal progression from the root (A1) to the current versions of individual laboratory strains (S2 Table): 1. ATCC14028s University of Arizona (U.S.A.; node F2; 1960), 2. CIP104115 Institut Pasteur (Paris; A2; 1994), 3. NCTC 12023 Colindale (London; A1; 1987), 4. NCTC 12023 Holden (London; D2; 1995), 5. NCTC 12023 NalR Hensel (Erlangen; D3; 1996) and 6. NCTC 12023 Gerlach (Wernigerode; D2; 2003). The percentages of each of three heterozygous variant SNPs in the completed genome of NCTC 12023 (S2 Table) are indicated on lines from nodes A1, C1 and D1 to 3. NCTC 12023 Colindale. This temporal progression is also in accord with an arbitrary partitioning of the tree into six sectors, A-F, consisting of apparent radial expansions of variants from central nodes (A1, C1, D1, F1, etc.). Partition D contains ATCC14028s variants from various global sources. E includes laboratory-derived mutants of ATCC14028s [68] plus their descendant nodes. F includes an early isolate of ATCC14028s (node F2) sequenced by Jarvik et al. [1]. D, E and F are separated by internal partitions A and C. Partition A was the parent of partition B, which consists of strains from natural sources. Interactive plots of the data including additional metadata can be found at http://enterobase.warwick.ac.uk/a/54094.
Fig 5. Properties of HC20 clusters among >300,000 assembled genomes within the Salmonella database within EnteroBase (June, 2021).
A) Numbers of HC20 clusters vs numbers of genomes per cluster. B) Average pairwise allelic differences between genomes in HC20 clusters containing at least 400 genomes. Red indicates selected HC20 clusters whose properties are summarized in Tables 2 and S4. NJ trees of allelic distances can be found in S4–S9 Figs. HC20_373 is highlighted in green.
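The statistic in part B, an average pairwise allelic difference within a cluster, can be sketched as follows. The sketch assumes that each genome is represented by a cgMLST profile of integer allele IDs with 0 marking a missing locus, and that missing loci are ignored; the function names and toy profiles are hypothetical and do not reproduce the EnteroBase calculation.

```python
from itertools import combinations
from statistics import mean


def allelic_distance(p: list[int], q: list[int]) -> int:
    """Count loci where both profiles have a called allele (non-zero) that differs."""
    return sum(1 for a, b in zip(p, q) if a and b and a != b)


def mean_pairwise_allelic_distance(profiles: list[list[int]]) -> float:
    """Average allelic distance over all unordered pairs of profiles.
    Returns 0.0 when fewer than two profiles are supplied."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return float(mean(allelic_distance(p, q) for p, q in pairs))


# Toy cluster of three profiles over four loci (hypothetical data).
toy_cluster = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 2, 3, 0],  # last locus missing
]
average = mean_pairwise_allelic_distance(toy_cluster)
print(round(average, 3))
```

Because the number of pairs grows quadratically with cluster size, a calculation of this kind on clusters of 400 or more genomes would typically be vectorized or subsampled in practice.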
Fig 6. Bayesian BEAST [40] temporal dating of ATCC14028s and its natural derivatives.
The figure depicts a tip-dated tree of 419 genomes from HC20_373 whose metadata included a tip date and which were not outliers in preliminary analyses (Table 4), rooted with the same outgroup genome from HC20_147 described in Fig 4. The topology of partition clustering is consistent with that of an ML tree (Fig 4), and was emphasized visually by rearranging branches manually such that they grouped by partition without changing their length or phylogenetic relationships. Partitions A-E from Fig 4 are indicated at the right and marked by distinctive blocks of colors within the tree. Partition F is lacking because no dates of isolation were available for any genomes from that partition. Tips are color-coded by Genome Source, as indicated in the Key legend. The scale at the bottom indicates the most probable calculated dates for tips and internal branch-points, but is broken between 1890 and 1980 to save space.
Fig 7. Cartoon of the microevolution inferred for HC20_373 since the original isolation of ATCC 14028s in 1960.
Extinct branches from which genomes were not sequenced are indicated by dashed lines whereas extinct branches with genomic sequences are terminated by stars. Colors indicate sources of bacterial isolates.

References

    1. Jarvik T, Smillie C, Groisman EA, Ochman H (2010) Short-term signatures of evolutionary change in the Salmonella enterica serovar Typhimurium 14028 genome. J Bacteriol 192: 560–567. doi: 10.1128/JB.01233-09
    2. Campoy S, Perez de Rozas AM, Barbe J, Badiola I (2000) Virulence and mutation rates of Salmonella typhimurium strains with increased mutagenic strength in a mouse model. FEMS Microbiol Lett 187: 145–150. doi: 10.1111/j.1574-6968.2000.tb09151.x
    3. Prieto AI, Ramos-Morales F, Casadesus J (2004) Bile-induced DNA damage in Salmonella enterica. Genetics 168: 1787–1794. doi: 10.1534/genetics.104.031062
    4. Kim SI, Yoon H (2019) Roles of YcfR in biofilm formation in Salmonella Typhimurium ATCC 14028. Mol Plant Microbe Interact 32: 708–716. doi: 10.1094/MPMI-06-18-0166-R
    5. Carus WS (2011) Rajneeshees. In: Encyclopedia of Bioterrorism Defence. John Wiley & Sons, Inc. pp. 1–3.
