Case Reports
J Bacteriol. 2008 Oct;190(20):6881-93.
doi: 10.1128/JB.00619-08. Epub 2008 Aug 1.

The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates

David A Rasko et al. J Bacteriol. 2008 Oct.

Abstract

Whole-genome sequencing has been skewed toward bacterial pathogens as a consequence of the prioritization of medical and veterinary diseases. However, it is becoming clear that in order to accurately measure genetic variation within and between pathogenic groups, multiple isolates, as well as commensal species, must be sequenced. This study examined the pangenomic content of Escherichia coli. Six distinct E. coli pathovars can be distinguished using molecular or phenotypic markers, but only two of the six pathovars have been subjected to any genome sequencing previously. Thus, this report provides a seminal description of the genomic contents and unique features of three unsequenced pathovars, enterotoxigenic E. coli, enteropathogenic E. coli, and enteroaggregative E. coli. We also determined the first genome sequence of a human commensal E. coli isolate, E. coli HS, which will undoubtedly provide a new baseline from which workers can examine the evolution of pathogenic E. coli. Comparison of 17 E. coli genomes, 8 of which are new, resulted in identification of approximately 2,200 genes conserved in all isolates. We were also able to identify genes that were isolate and pathovar specific. Fewer pathovar-specific genes were identified than anticipated, suggesting that each isolate may have independently developed virulence capabilities. Pangenome calculations indicate that E. coli genomic diversity represents an open pangenome model containing a reservoir of more than 13,000 genes, many of which may be uncharacterized but important virulence factors. This comparative study of the species E. coli, while descriptive, should provide the basis for future functional work on this important group of pathogens.

Figures

FIG. 1.
Gene content and synteny of the commensal isolate E. coli HS. (A) Gene conservation using E. coli HS as the reference strain. Starting from the outside, the first circle shows the genes in the forward orientation. The second circle shows the genes in the reverse orientation relative to the origin. The third circle shows the chi-square values, representing differences in the local G+C content. The fourth circle shows all of the genes that are unique to E. coli HS. Circles 5 to 20 show the gene conservation in all of the other E. coli genomes compared in the following order: MG1655, W3110, E24377A, B7A, EDL933, Sakai, CFT073, F11, UTI89, 536, E22, E110019, B171, Ec042, 101-1, and APEC01 (Table 1). The color indicates that a gene is present (red), divergent (green), or absent (blue). Three additional regions which are unique to E. coli HS are indicated: one phage region (∼0.3 Mb) that is not shared with any of the other sequenced strains, the serogroup O9-specific region (∼2.1 Mb), and one additional cluster that is shared only with Shigella species (∼2.6 Mb). (B) Gene synteny (conserved gene order) for E. coli HS and E. coli K-12. The color indicates the level of similarity between regions, as shown by the scale on the lower right. The arrows indicate three regions of diversity for these genomes. The upper and lower arrows indicate the unique phage and the O9 serogroup cluster. Overall, there is a great deal of synteny between the two genomes. A similar pattern was observed for most other complete genomes; the exception was the EHEC strain EDL933 genome, which contains a single large inversion.
FIG. 2.
Commensal features are often shared with one or more pathogens. Using E. coli HS as the reference, we identified regions, based on the annotation or similarity to known features, that could be associated with colonization of the human gastrointestinal tract. These regions were grouped into four general categories: pili or pilus-associated genes, fimbriae, general secretion, and type III secretion. Isolates are arranged in vertical lines, and each horizontal group is based on a single gene or peptide. The color indicates the level of similarity; red indicates the most similar (∼100% identical), green indicates little or no similarity, and black indicates ∼50% identity over the length of the sequence queried. It is clear that some pathogenic groups have features more or less similar to features of E. coli HS. Notably, the general secretion system genes for two separate systems are absent in both ETEC strains; however, they are present in three of four UPEC strains. The presence of one of the secretion systems is variable in EHEC, EAEC, and EPEC isolates, and the opposite phenotype is present in the laboratory-adapted strains, suggesting that this system may play a role in colonization.
FIG. 3.
Secreted effector molecules described by Tobe et al. (62) that were identified in other E. coli genomes. The BLAST identity is shown as a heat map constructed using the functionally and bioinformatically identified secreted effector molecules from the EHEC isolate E. coli Sakai. Red indicates a higher level of similarity, and green indicates a lower level of similarity.
FIG. 4.
Conserved core, unique, and pangenome calculations for E. coli. (A) Each point indicates the number of genes that are conserved in genomes. The red line shows the exponential decay model based on the median value for conserved genes when increasing numbers of genomes were compared. (B) Decreasing number of unique genes in a genome with increasing number of genomes compared. The red line shows the exponential decay model based on the median value for unique genes when increasing numbers of genomes were compared. (C) Pangenome of the species E. coli. The extrapolated curve continues to increase, and thus E. coli has an open pangenome.
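
The counts plotted in FIG. 4 can be illustrated with a minimal sketch. It assumes each genome is already reduced to a set of gene-family identifiers and that genomes are added in many random orders, with the median taken at each step. The toy data, the function names (`accumulate`, `median_curves`), and the definition of "new" genes as genes not seen in any previously added genome are all assumptions made for illustration, not the authors' pipeline. The exponential decay fit mentioned in the caption is not shown.

```python
import random
from statistics import median

# Hypothetical toy data: genome name -> set of gene-family identifiers.
TOY_GENOMES = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},
    "C": {"g1", "g2", "g6", "g7"},
    "D": {"g1", "g2", "g3", "g8"},
}


def accumulate(order, genomes):
    """Return (core, new, pan) gene counts after each genome in `order` is added."""
    core = None   # genes present in every genome added so far
    pan = set()   # genes present in at least one genome added so far
    rows = []
    for name in order:
        genes = genomes[name]
        new = genes - pan  # genes first seen in this genome (assumed definition)
        core = set(genes) if core is None else core & genes
        pan |= genes
        rows.append((len(core), len(new), len(pan)))
    return rows


def median_curves(genomes, n_orders=200, seed=0):
    """Median (core, new, pan) counts per step over random genome orderings."""
    rng = random.Random(seed)
    names = sorted(genomes)
    runs = []
    for _ in range(n_orders):
        order = names[:]
        rng.shuffle(order)
        runs.append(accumulate(order, genomes))
    curves = []
    for step in range(len(names)):
        curves.append((
            median(run[step][0] for run in runs),
            median(run[step][1] for run in runs),
            median(run[step][2] for run in runs),
        ))
    return curves


if __name__ == "__main__":
    for step, (core, new, pan) in enumerate(median_curves(TOY_GENOMES), start=1):
        print(f"genomes={step} core={core} new={new} pan={pan}")
```

In this sketch a pangenome would be called open if the median count of new genes does not approach zero as more genomes are added, which matches the caption's reading of a curve that continues to increase.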

References

    1. Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
    2. Bobik, T. A., G. D. Havemann, R. J. Busch, D. S. Williams, and H. C. Aldrich. 1999. The propanediol utilization (pdu) operon of Salmonella enterica serovar Typhimurium LT2 includes genes necessary for formation of polyhedral organelles involved in coenzyme B12-dependent 1,2-propanediol degradation. J. Bacteriol. 181:5967-5975.
    3. Caron, E., V. F. Crepin, N. Simpson, S. Knutton, J. Garmendia, and G. Frankel. 2006. Subversion of actin dynamics by EPEC and EHEC. Curr. Opin. Microbiol. 9:40-45.
    4. Carpenter, C. M., E. R. Hall, R. Randall, R. McKenzie, F. Cassels, N. Diaz, N. Thomas, P. Bedford, M. Darsley, C. Gewert, C. Howard, R. B. Sack, D. A. Sack, H. S. Chang, G. Gomes, and A. L. Bourgeois. 2006. Comparison of the antibody in lymphocyte supernatant (ALS) and ELISPOT assays for detection of mucosal immune responses to antigens of enterotoxigenic Escherichia coli in challenged and vaccinated volunteers. Vaccine 24:3709-3718.
    5. Chen, H. D., and G. Frankel. 2005. Enteropathogenic Escherichia coli: unravelling pathogenesis. FEMS Microbiol. Rev. 29:83-98.