Nat Commun. 2020 Nov 24;11(1):5968.
doi: 10.1038/s41467-020-19714-9.

Horizontally acquired papGII-containing pathogenicity islands underlie the emergence of invasive uropathogenic Escherichia coli lineages

Michael Biggel et al. Nat Commun. 2020.

Abstract

Escherichia coli is the leading cause of urinary tract infection, one of the most common bacterial infections in humans. Despite this, a genomic perspective is lacking regarding the phylogenetic distribution of isolates associated with different clinical syndromes. Here, we present a large-scale phylogenomic analysis of a spatiotemporally and clinically diverse set of 907 E. coli isolates, including 722 uropathogenic E. coli (UPEC) isolates. A genome-wide association approach identifies the (P-fimbriae-encoding) papGII locus as the key feature distinguishing invasive UPEC, defined as isolates associated with severe UTI, i.e., kidney infection (pyelonephritis) or urinary-source bacteremia, from non-invasive UPEC, defined as isolates associated with asymptomatic bacteriuria or bladder infection (cystitis). Within the E. coli population, distinct invasive UPEC lineages emerged through repeated horizontal acquisition of diverse papGII-containing pathogenicity islands. Our findings elucidate the molecular determinants of severe UTI and have implications for the early detection of this pathogen.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Phylogeny of 907 Escherichia coli isolates associated with different clinical phenotypes.
Midpoint-rooted maximum-likelihood phylogenetic tree based on 109,023 variable sites identified in a core genome alignment (1.136 Mbp). Rings 1, 2, and 3 denote predominant sequence types (ST), corresponding clonal complexes (CC), and phylogroup assignment. Clinical phenotypes are labeled according to the key (ring 4). Phylogenetic clusters identified using BAPS (Bayesian analysis of population structure) that are significantly enriched with invasive UPEC isolates are highlighted in ring 5. The presence of afaVIII and papGII (blue and red dots) across the phylogeny is annotated in ring 6. The scale bar indicates the number of substitutions per site in the core genome alignment. A tree with bootstrap support values is provided in Supplementary Fig. 20. The tree was visualized using iTOL. An interactive visualization of this phylogenetic tree can be found at https://microreact.org/project/O4QAYAJWw.
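As a rough illustration of what "variable sites in a core genome alignment" means, the sketch below counts alignment columns that contain more than one distinct unambiguous base. It is a toy example, not the authors' pipeline: the function name, the toy sequences, and the choice to ignore gaps and ambiguous bases are all assumptions made for illustration.

```python
def count_variable_sites(aligned_seqs):
    """Count alignment columns with more than one distinct A/C/G/T base.

    Assumption: gaps ("-") and ambiguous bases (e.g. "N") are ignored
    when deciding whether a column is variable.
    """
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")

    variable = 0
    for column in zip(*aligned_seqs):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            variable += 1
    return variable


# Hypothetical toy alignment (four isolates, eight columns).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGT-CGN"]
n_variable = count_variable_sites(toy_alignment)
print(n_variable, "variable sites out of", len(toy_alignment[0]))
```

In the toy alignment only the third and last columns differ between isolates; the gap and the N do not count as variation under the stated assumption.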
Fig. 2. Manhattan plot for pan-genome-wide associations for invasive vs. non-invasive UPEC.
Data are based on 30,705 clusters of orthologous genes (COGs) identified in 722 UPEC isolates. Fecal isolates were not considered in this analysis because they present no urinary phenotype. The plot shows genes assigned to 4764 unique COGs identified in the genome of reference strain UMN026 (CC69). Each dot represents one COG. The vertical axis denotes raw P values from Fisher’s exact test. To account for the effects of sample size and population structure, the genome-wide significance threshold (dotted line, P = 1.42 × 10^−18) was inferred from a simulated dataset using treeWAS. The horizontal axis gives the nucleotide position in the chromosome. COGs that are part of pathogenicity islands (PAIs), prophage regions, or the two plasmids of UMN026 are color-labeled, including the High Pathogenicity Island (HPI) and PAIUMN026-pheV, a papGII-containing PAI. The remaining 25,941 COGs of the pan-genome that did not map to UMN026 were not pan-genome-wide significant (Supplementary Fig. 21).
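The per-gene test behind each dot can be sketched as a two-sided Fisher's exact test on a 2×2 table of gene presence versus clinical phenotype. The code below is a minimal standard-library sketch, not the authors' implementation: the function name and the table counts are hypothetical, and only the threshold value (1.42 × 10^−18) is taken from the caption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / gene absent; columns: invasive / non-invasive.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Threshold quoted in the figure caption (inferred there with treeWAS).
GENOME_WIDE_THRESHOLD = 1.42e-18

# Hypothetical counts for one COG: made up for illustration only.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
is_significant = p_value < GENOME_WIDE_THRESHOLD
print(f"P = {p_value:.4g}, genome-wide significant: {is_significant}")
```

With so few isolates the toy table cannot approach the genome-wide threshold, which illustrates why the threshold is so much stricter than a conventional 0.05 cutoff when tens of thousands of COGs are tested.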
Fig. 3. Genetic characterization of papGII+ E. coli lineages and papGII+ PAIs.
a Maximum-likelihood phylogenetic tree of 333 papGII+ isolates based on 192,889 variable sites identified in a core genome alignment (2.573 Mbp). Phylogenetic lineages, defined by patristic distances, are collapsed on single nodes (indicated with triangles). Fourteen papGII+ lineages with >5 isolates (red triangles) were identified and named after their clonal complex (+L). Isolates in CC73 were investigated on an additional level of hierarchy and assigned to four papGII+ lineages (CC73-L1 to -L4) to account for the subclonal population structure with distinct characteristics. Each papGII+ lineage is labeled with the proportion of papGII+ pathogenicity island (PAI) types and insertion sites when their identification was possible. The presence of papGII+ PAI types was identified in complete or near-complete assemblies or predicted using a read-mapping-based approach. Fragmented assemblies, lack of resolved reference PAIs, or sequence deletions/insertions sometimes prevented the determination of the specific papGII+ PAI family type and insertion site (shown in gray). The proportion of isolates carrying PAI- or plasmid-associated iuc loci, the frequency of clinical phenotypes, and the total number of isolates are shown. The branch length of the outgroup (papGII-negative isolate 495_PUTI_Fec, clade I) was reduced (dashed line). The scale bar indicates the number of substitutions per site in the core genome alignment. A tree with expanded nodes and bootstrap values is shown in Supplementary Fig. 16. b Genetic organization of representative PAIs of the six identified papGII+ PAI types. The papGII operon, integrase gene, and virulence-associated genes are highlighted. The gradient scale shows the level of nucleotide identity. PAI sequences were compared and visualized using EasyFig. The genetic organization of all 42 resolved papGII+ PAIs is shown in Supplementary Fig. 22.
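A patristic distance is the sum of branch lengths along the path connecting two tips of a tree. The sketch below computes it on a small hypothetical tree stored as a child-to-parent map; the tree, the branch lengths, and the function name are assumptions for illustration, and the distance cutoff and clustering procedure used to define the lineages are not stated in this caption.

```python
def patristic_distance(parent, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    # Record the cumulative distance from tip_a to each of its ancestors.
    dist_from_a = {tip_a: 0.0}
    node, total = tip_a, 0.0
    while node in parent:
        node, length = parent[node][0], parent[node][1]
        total += length
        dist_from_a[node] = total

    # Walk up from tip_b until reaching an ancestor shared with tip_a.
    node, total = tip_b, 0.0
    while node not in dist_from_a:
        node, length = parent[node][0], parent[node][1]
        total += length
    return dist_from_a[node] + total


# Hypothetical tree: root -> X (0.1), root -> C (0.5), X -> A (0.2), X -> B (0.3)
toy_tree = {
    "X": ("root", 0.1),
    "C": ("root", 0.5),
    "A": ("X", 0.2),
    "B": ("X", 0.3),
}
d_ab = patristic_distance(toy_tree, "A", "B")
d_ac = patristic_distance(toy_tree, "A", "C")
print(f"A-B: {d_ab:.2f}, A-C: {d_ac:.2f}")
```

Tips A and B share a recent ancestor and are therefore closer than A and C, whose path runs through the root.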
Fig. 4. Number of virulence-associated genes (VAGs) and distribution of iron uptake systems.
a Boxplots showing the number of VAGs per isolate by clinical phenotype (fecal isolates, non-invasive UPEC isolates (asymptomatic bacteriuria (ABU), cystitis), and invasive UPEC isolates). papGII+ isolates are indicated as red dots. Asterisks indicate significant differences (***P < 0.001, two-sided Mann–Whitney U test, Bonferroni-corrected). Exact P values are reported in Supplementary Fig. 13. Boxplot center lines: median; box limits: upper and lower quartiles; whiskers extend from the hinges to the highest and lowest values that are within 1.5×IQR of the hinges. Source data are provided in Supplementary Data 8. b Number of iron uptake systems (ring 2) per isolate and presence of iuc (ring 3) visualized on the phylogenetic tree. Twenty-two different systems involved in iron uptake were identified in our dataset, with 8–19 systems found per isolate. Phylogroups (ring 1) and isolates part of papGII+ lineages (ring 4) are labeled. The tree was visualized using iTOL.
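The whisker rule described for panel a can be sketched as follows. This is an illustrative example under stated assumptions: the values are hypothetical, the function name is invented, and quartiles are computed by linear interpolation (Python's "inclusive" method), which may differ slightly from the hinge definition used by the plotting software.

```python
from statistics import quantiles


def boxplot_whiskers(values, k=1.5):
    """Return whisker ends and outliers using the k x IQR rule."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation between data points.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


# Hypothetical per-isolate counts, made up for illustration.
toy_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = boxplot_whiskers(toy_counts)
print(summary)
```

The whiskers stop at observed data points rather than at the fences themselves, so the value 30 is drawn as an individual outlier instead of stretching the upper whisker.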
Fig. 5. Phylogenetic trees of isolates belonging to pandemic UPEC lineages CC69, CC95, and CC73.
Midpoint rooted maximum-likelihood phylogenies based on core genome alignments (CC69: 76 isolates, 25,753 variable sites in 4.006 Mbp core genome; CC95: 107 isolates, 17,674 variable sites in 4.054 Mbp core genome; CC73: 164 isolates, 25,939 variable sites in 3.857 Mbp core genome). Clinical phenotypes are labeled at the branch tips (ring 3). The presence of papGII, papGIII, and iuc is shown (ring 2). When identification was possible, papGII+ PAI types are labeled (ring 1). Fragmented assemblies, lack of resolved reference PAIs, and sequence deletions or insertions within PAIs sometimes prevented the determination of the specific papGII+ PAI type. Isolates part of papGII+ lineages are shaded in red; isolates part of the same lineages but lacking papGII in bright red. Isolates with complete or near-complete genomes used to investigate the genetic context of papGII are annotated. Red branch lines indicate nodes with bootstrap values <70. Branch lengths of distantly related isolates (outgroup) are reduced and indicated as dashed lines. The papGII-negative subclade in CC95 corresponds to the previously defined subgroup B (serotype O18:H7). The trees were visualized using iTOL.

Comment in

  • Uro-Science.
    Atala A. J Urol. 2021 Jul;206(1):162-163. doi: 10.1097/JU.0000000000001795. Epub 2021 Apr 16. PMID: 33858176. No abstract available.
