Sci Rep. 2019 Nov 22;9(1):17394.
doi: 10.1038/s41598-019-54004-5.

An Escherichia coli ST131 pangenome atlas reveals population structure and evolution across 4,071 isolates

Arun Gonzales Decano et al. Sci Rep. 2019.

Abstract

Escherichia coli ST131 is a major cause of infection with extensive antimicrobial resistance (AMR) facilitated by widespread beta-lactam antibiotic use. This drug pressure has driven extended-spectrum beta-lactamase (ESBL) gene acquisition and evolution in pathogens, so a clearer resolution of ST131's origin, adaptation and spread is essential. E. coli ST131's ESBL genes are typically embedded in mobile genetic elements (MGEs) that aid transfer to new plasmid or chromosomal locations and are themselves mobilised further by plasmid conjugation and recombination, resulting in a flexible ESBL, MGE and plasmid composition alongside a conserved core genome. We used population genomics to trace the evolution of AMR in ST131 more precisely by extracting all available high-quality Illumina HiSeq read libraries to investigate 4,071 globally sourced genomes, the largest ST131 collection examined so far. We applied rigorous quality control, de novo genome assembly and ESBL gene screening to resolve ST131's population structure across three genetically distinct Clades (A, B, C) and abundant subclades from the dominant Clade C. We reconstructed their evolutionary relationships across the core and accessory genomes using published reference genomes, long-read assemblies and k-mer-based methods to contextualise pangenome diversity. The three main C subclades have co-circulated globally at relatively stable frequencies over time, suggesting that they reached an equilibrium after their origin and initial rapid spread. This contrasted with their ESBL genes, which showed stronger patterns across time, geography and subclade, and were found at distinct chromosomal and plasmid locations in different isolates. Within the three C subclades, core and accessory genome diversity levels were not correlated, owing to plasmid and MGE activity, unlike the pattern across the three main clades A, B and C.
This population genomic study highlights the dynamic nature of the accessory genomes in ST131, suggesting that surveillance should anticipate genetically variable outbreaks with broader antibiotic resistance levels. Our findings emphasise the potential of evolutionary pangenomics to improve our understanding of AMR gene transfer, adaptation and transmission to discover accessory genome changes linked to novel subtypes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Methods summary: 4,870 read libraries were downloaded from EnteroBase, and 718 uninformative ones were excluded. Of those assessed, four were long-read libraries (PacBio) and the rest were short paired-end read libraries (Illumina HiSeq). Adapters were trimmed from the four PacBio and 4,147 Illumina libraries using Cutadapt and fastp, respectively. The resulting adapter-free reads were assembled using Unicycler. An iterative genome quality check eliminated three PacBio and 77 Illumina libraries, yielding 4,071 genomes as the final collection. Assemblies passing QUAST filtering were annotated with Prokka and examined with Roary to evaluate pangenomic diversity. A phylogeny was constructed from the core genome with RAxML. The assembled genomes were annotated and screened for AMR genes (including blaCTX-M-14/15/27) and their genetic context. Genetically distinct clusters in the phylogeny were determined using Fastbaps. Distances between the core and accessory genomes of isolate pairs were estimated with PopPUNK based on k-mer differences.
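Roary summarises a pangenome by how many isolates carry each gene cluster. The sketch below reproduces the frequency bands Roary reports by default in its summary statistics (core ≥99%, soft core 95–99%, shell 15–95%, cloud <15% of isolates). The function names and the dictionary input (gene cluster mapped to the set of isolates carrying it) are hypothetical, and the study's own core-genome threshold is not stated in this caption.

```python
from collections import Counter
from typing import Dict, List, Set


def roary_category(n_present: int, n_isolates: int) -> str:
    """Assign a gene cluster to a pangenome band by its carriage frequency.

    Bands follow Roary's default summary: core >= 99%, soft core 95-99%,
    shell 15-95%, cloud < 15% of isolates.
    """
    if not 0 < n_present <= n_isolates:
        raise ValueError("n_present must be between 1 and n_isolates")
    frac = n_present / n_isolates
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft_core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


def summarise(presence: Dict[str, Set[str]], isolates: List[str]) -> Counter:
    """Count gene clusters per band from a (hypothetical) presence map."""
    n = len(isolates)
    return Counter(roary_category(len(carriers), n) for carriers in presence.values())


if __name__ == "__main__":
    # Toy presence/absence data for 100 hypothetical isolates.
    isolates = [f"iso{k}" for k in range(100)]
    presence = {
        "geneA": set(isolates),
        "geneB": set(isolates[:96]),
        "geneC": set(isolates[:50]),
        "geneD": set(isolates[:3]),
    }
    print(summarise(presence, isolates))
```

With 4,071 genomes, the 99% core cut-off means a gene must be present in at least 4,031 of them under this banding.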
Figure 2
(a) A maximum likelihood phylogeny of the 4,071-genome ST131 collection and (b) the distribution of these isolates across countries and over time. The phylogeny shows Clades A (n = 414 genomes, pale green), B (n = 420 genomes, dark green), B0 (n = 13 genomes, orange), C0 (n = 52 genomes, blue), C1 (n = 1,121 genomes, bright green) and C2 (n = 2,051 genomes, purple). The phylogeny was constructed with RAxML from 30,029 chromosome-wide SNPs arising by mutation and visualized with iTOL. The inner colored strip surrounding the tree represents the subgroups formed by Fastbaps clustering and the cluster (1–11) associated with each isolate. The outer colored strip surrounding the tree is the fimH allele: H41 for Clade A (pink), H22 for Clade B (blue), H30 for Clade C (black) and other alleles (white). The histograms in (b) show the distribution of sampling across countries; of the 4,071 genomes, isolated from 1999 to 2018, 2,051 belong to C2 (pink), and most isolates were collected in 2002–2017.
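As a quick consistency check on the Figure 2 counts, a short Python sketch (the dictionary and function names are illustrative, not from the study) confirms that the six clade sizes sum to the 4,071-genome collection and converts them to percentages.

```python
from typing import Dict

# Clade sizes as stated in the Figure 2 caption.
CLADE_SIZES = {"A": 414, "B": 420, "B0": 13, "C0": 52, "C1": 1121, "C2": 2051}
TOTAL_GENOMES = 4071


def clade_percentages(sizes: Dict[str, int], total: int) -> Dict[str, float]:
    """Return each clade's share of the collection as a percentage (1 d.p.)."""
    if sum(sizes.values()) != total:
        raise ValueError("clade sizes do not sum to the collection size")
    return {clade: round(100 * n / total, 1) for clade, n in sizes.items()}


if __name__ == "__main__":
    shares = clade_percentages(CLADE_SIZES, TOTAL_GENOMES)
    clade_c = sum(CLADE_SIZES[c] for c in ("C0", "C1", "C2"))
    print(shares)
    print("Clade C:", clade_c, "genomes")
```

Clade C (C0, C1 and C2 together) accounts for 3,224 genomes, about 79% of the collection, matching the Clade C total in Figure 3.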
Figure 3
(a) A maximum likelihood phylogeny of 3,224 ST131 Clade C isolates and (b) the distribution of these isolates across countries and over time. The phylogeny shows C0 (n = 52 genomes, blue), C1 (n = 1,121 genomes, bright green) and C2 (n = 2,051 genomes, purple). As in Fig. 2, the phylogeny was constructed with RAxML from 30,029 chromosome-wide SNPs arising by mutation and visualized with iTOL. The inner colored strip surrounding the tree represents the subgroups formed by Fastbaps clustering and the cluster (1–11) associated with each isolate: most C0 isolates were in Fastbaps cluster 11 (n = 51 genomes), with a single isolate in cluster 10 (grey). Of the 1,121 C1 isolates, 1,113 formed Fastbaps cluster 6 (green) and eight were assigned to cluster 10 (black). The C2 subclades corresponded to Fastbaps clusters 9 (C2_9, n = 1,651 isolates, pink) and 4 (C2_4, n = 386, dark purple). The middle colored strip indicates isolates with an interrupted mppA gene (black) relative to the intact wild-type version (grey). The outer colored strip is the blaCTX-M allele: 2,416 genomes had blaCTX-M-14, blaCTX-M-15 or blaCTX-M-27 genes; 1,790 genomes had blaCTX-M-15 (mainly C2), 177 had blaCTX-M-14 (mainly C1) and 424 had blaCTX-M-27 (mainly C1). Gain of the M27PP1 locus, denoting the C1-M27 lineage (red box), was found in 468 C1_6 genomes (though independent gains occurred too). The histograms in (b) show the distribution of sampling across countries, and that since 2002 both C1_6 (green) and C2_9 (pink) were common until the emergence of C2_4 (blue) and, to a lesser extent, C0 (brown).
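The subclade names used from here on (C1_6, C2_4, C2_9) combine a Clade C subclade with its Fastbaps cluster number. A minimal sketch of that labelling, assuming hypothetical per-isolate records of (isolate, subclade, cluster), shows how the dominant cluster and its share within a subclade can be derived; only the C1 counts in the usage example come from the caption.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (isolate_id, subclade, fastbaps_cluster).
Record = Tuple[str, str, int]


def subclade_counts(records: Iterable[Record]) -> Counter:
    """Count isolates per '<subclade>_<cluster>' label, e.g. 'C2_9'."""
    return Counter(f"{sub}_{cluster}" for _iso, sub, cluster in records)


def dominant_label(counts: Counter, subclade: str) -> Tuple[str, float]:
    """Return the most common cluster label within one subclade and its share."""
    within = {lab: n for lab, n in counts.items() if lab.startswith(subclade + "_")}
    if not within:
        raise KeyError(f"no isolates for subclade {subclade}")
    label = max(within, key=within.get)
    return label, within[label] / sum(within.values())


if __name__ == "__main__":
    # C1 counts from the Figure 3 caption: 1,113 in cluster 6 and 8 in cluster 10.
    records = [(f"c1_{i}", "C1", 6) for i in range(1113)]
    records += [(f"c1_x{i}", "C1", 10) for i in range(8)]
    label, share = dominant_label(subclade_counts(records), "C1")
    print(label, round(share, 3))
```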
Figure 4
Frequencies of blaCTX-M alleles in C subclades C1_6 (red), C2_4 (green) and C2_9 (blue).
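Figure 4 reports per-subclade blaCTX-M allele frequencies, but the caption does not state the denominator. The sketch below assumes it is all isolates in the subclade, so isolates without a blaCTX-M gene lower every frequency; the records are hypothetical toy data, not values from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def allele_frequencies(
    records: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, Dict[str, float]]:
    """Map subclade -> {blaCTX-M allele: fraction of that subclade's isolates}.

    Each record is (subclade, allele), with allele None when no blaCTX-M gene
    was detected; such isolates count toward the denominator only.
    """
    totals: Counter = Counter()
    alleles: Dict[str, Counter] = defaultdict(Counter)
    for subclade, allele in records:
        totals[subclade] += 1
        if allele is not None:
            alleles[subclade][allele] += 1
    return {
        sub: {a: n / totals[sub] for a, n in alleles[sub].items()}
        for sub in totals
    }


if __name__ == "__main__":
    # Hypothetical toy records, not values from the study.
    toy = (
        [("C2_9", "blaCTX-M-15")] * 3
        + [("C2_9", None)]
        + [("C1_6", "blaCTX-M-27")] * 2
        + [("C1_6", "blaCTX-M-14")] * 2
    )
    for sub, freqs in sorted(allele_frequencies(toy).items()):
        print(sub, freqs)
```

This sketch assumes one record per isolate; an isolate carrying two blaCTX-M alleles would need separate handling so it is counted only once in the denominator.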
Figure 5
The distribution of core (π, x-axis) and accessory (a, y-axis) pairwise genome distances, with blue dots indicating isolate pairs and contours indicating dot density (higher in yellow). Top left: pairwise differences across all 4,071 assemblies, with contours indicating the three main clades: Clade A at π = 0.0038, a = 0.15; Clade B at π = 0.0014, a = 0.18; Clade C at both π = 0.0005, a = 0.08 and π = 0.0001, a = 0.09. Top right: 1,113 subclade C1_6 assemblies had peaks mainly at π ≤ 0.001, a = 0.06. Bottom left: 386 subclade C2_4 assemblies had peaks at π = 0.0006, a = 0.045 and π = 0.0001, a = 0.040. Bottom right: 1,651 subclade C2_9 assemblies had peaks at π = 0.0007, a = 0.065 and π = 0.0, a = 0.090. Results for 2,416 blaCTX-M-positive Clade C assemblies and 52 subclade C0 assemblies were similar. Within subclades C1_6, C2_4 and C2_9, isolates had more diverse accessory genomes than core genomes.
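PopPUNK separates core (π) and accessory (a) distances by comparing k-mers at several lengths k. In the model described by its authors, the proportion of shared k-mers between two genomes decays as (1 - a)(1 - π)^k, so log similarity is linear in k: the intercept gives a and the slope gives π. The sketch below fits that line by ordinary least squares. It uses exact k-mer sets rather than the sketched distances the real tool computes, omits the tool's other refinements, and its function names are illustrative, not PopPUNK's API.

```python
import math
from typing import Sequence, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq1: str, seq2: str, k: int) -> float:
    """Jaccard index of the two sequences' k-mer sets."""
    a, b = kmer_set(seq1, k), kmer_set(seq2, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    return len(a & b) / len(union)


def fit_core_accessory(ks: Sequence[int], sims: Sequence[float]) -> Tuple[float, float]:
    """Fit log(sim) = log(1 - a) + k * log(1 - pi); return (pi, a).

    Requires at least two distinct k values and strictly positive similarities.
    """
    ys = [math.log(s) for s in sims]
    mean_k = sum(ks) / len(ks)
    mean_y = sum(ys) / len(ys)
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    return 1 - math.exp(slope), 1 - math.exp(intercept)


if __name__ == "__main__":
    # Synthetic similarities generated from the model itself with
    # pi = 0.0005 and a = 0.08 (values chosen to echo a Clade C peak).
    ks = list(range(13, 30, 4))
    sims = [(1 - 0.08) * (1 - 0.0005) ** k for k in ks]
    pi, a = fit_core_accessory(ks, sims)
    print(f"pi = {pi:.4f}, a = {a:.3f}")
```

Because the synthetic similarities follow the model exactly, the fit recovers the input π and a; real genome pairs scatter around the line, which is one reason the contours in this figure are spread out.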
Figure 6
Top: The average number of genes in the ST131 pangenome (y-axis) increased as the 4,071 genomes were added (x-axis), indicating an open pangenome for the whole collection (black) as well as for its clades and subclades: Clade A (blue), Clade B (green), Clade C (grey), subclade C1_6 (red), subclade C2_4 (orange) and subclade C2_9 (brown). Bottom: Alpha varied with the number of genomes sampled (shown here for >30 genomes) and became largely independent of sample size once more than about 250 genomes were examined. Note the log10 scale of the x-axis.
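Assuming the "alpha" in this figure is the exponent of the Heaps' law model commonly fitted to pangenomes, n = κN^(-α), where n is the number of new genes contributed by the N-th genome and α ≤ 1 is read as an open pangenome, the sketch below estimates α by averaging new-gene counts over random genome orderings and fitting a least-squares line on log-log axes. The caption does not say which software or fitting procedure the study used, and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Set


def new_genes_per_genome(
    genomes: Sequence[Set[str]], n_perm: int = 100, seed: int = 0
) -> List[float]:
    """Mean count of previously unseen genes added by the N-th genome,
    for N = 2..len(genomes), averaged over n_perm random orderings."""
    if len(genomes) < 2:
        raise ValueError("need at least two genomes")
    rng = random.Random(seed)
    totals = [0.0] * (len(genomes) - 1)
    for _ in range(n_perm):
        order = list(genomes)
        rng.shuffle(order)
        seen = set(order[0])
        for i, genes in enumerate(order[1:]):
            totals[i] += len(genes - seen)
            seen |= genes
    return [t / n_perm for t in totals]


def heaps_alpha(new_genes: Sequence[float]) -> float:
    """Least-squares fit of log(n) = log(kappa) - alpha * log(N) for N = 2, 3, ..."""
    if any(n <= 0 for n in new_genes):
        raise ValueError("new-gene counts must be positive to take logs")
    xs = [math.log(N) for N in range(2, len(new_genes) + 2)]
    ys = [math.log(n) for n in new_genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic new-gene counts following n = 500 * N^-0.4 exactly.
    curve = [500 * N ** -0.4 for N in range(2, 200)]
    alpha = heaps_alpha(curve)
    print(f"alpha = {alpha:.2f}", "(open)" if alpha <= 1 else "(closed)")
```

The sensitivity of α to the number of genomes noted in the caption is consistent with this kind of fit: with few genomes, the averaged new-gene curve is short and noisy, so the slope estimate is unstable.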
