Review
Microb Ecol. 2010 Nov;60(4):708-20.
doi: 10.1007/s00248-010-9717-3. Epub 2010 Jul 11.

Comparison of 61 sequenced Escherichia coli genomes

Oksana Lukjancenko et al. Microb Ecol. 2010 Nov.

Abstract

Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the strains appeared jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or 'accessory' genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggest a continuum rather than sharp species borders in this group of Enterobacteriaceae.
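
The pan- and core genome definitions used in the abstract can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of gene family identifiers; the strain names, family names, and the `pan_and_core` helper are hypothetical and are not taken from the paper, whose gene family clustering method is not described in this abstract. Only the two counts (15,741 and 993) come from the text.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) for a mapping of strain -> gene families.

    Pan-genome: every family seen in at least one genome (set union).
    Core genome: families present in every genome (set intersection).
    Assumes at least one genome is supplied.
    """
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical toy input, not real E. coli data.
toy = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3", "fam6"},
}

pan, core = pan_and_core(toy)
accessory = pan - core  # variable ('accessory') families

# Arithmetic check on the counts reported in the abstract.
reported_pan, reported_core = 15741, 993
core_percent = round(100 * reported_core / reported_pan, 1)
accessory_percent = round(100 - 100 * reported_core / reported_pan, 1)
```

With the reported counts, the core genome is about 6.3% of the pan-genome and the accessory share about 93.7%, consistent with the "6%" and "more than 90%" stated above.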


Figures

Figure 1
Phylogenetic tree based on extracted 16S rRNA sequences. (a) Comparison of 20 different Enterobacteriaceae, based on extracted 16S rRNA sequences from the GenBank sequence files. E. coli and Shigella are shown in green. (b) Tree of 61 sequenced E. coli (black) and related species (colored), based on the alignment of the 16S rRNA gene sequence. Apart from Shigella spp., the genes from E. albertii and E. fergusonii are also included (arrows). The 16S rRNA gene of S. enterica Typhimurium LT2 was used as the root. Bootstrap values, indicated in red, show that most nodes are predicted with uncertainty; nevertheless, the genera Escherichia spp. and Shigella spp. are not separated in this tree, and the three Escherichia species are also mixed
Figure 2
Phylogenetic tree of concatenated MLST gene alleles (adk, fumC, icd, gyrB, mdh, purA, recA), extracted from the genome sequences. Color use is the same as in Fig. 1
Figure 3
Pan-genome clustering of E. coli (black) and related species (colored), based on the alignment of their variable gene content. The genomes now cluster according to species and a relatedness between E. coli K12 derivatives (green block) and group B isolates (orange block) is visible
Figure 4
Pan- and core genome plot of the analyzed genomes. The blue pan-genome curve connects the cumulative number of gene families present in the analyzed genomes. The red core genome curve connects the conserved number of gene families. The gray bars show the numbers of novel gene families identified in each genome
Figure 5
BLAST atlas. In the middle, a genome atlas of E. coli O157:H7 strain EC4115 is shown, around which BLAST lanes are shown. Every lane corresponds to a genome, with the following colors (going outwards): green E. coli O157:H7 (15 lanes); light blue E. coli LANL strains (two lanes); dark blue Shigella spp. (eight lanes); red E. coli K12 and derivatives (six lanes); orange E. coli strain B phylogroup (four lanes); followed by all other E. coli genomes in different colors. The outermost three lanes represent E. fergusonii, E. albertii, and S. enterica Typhimurium LT2. Lack of color indicates that the genes at that position in strain EC4115 were not found in the genome of that lane. The position of replication origin and terminus is indicated
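
The three series described in the Figure 4 caption (cumulative pan-genome, shrinking core genome, and novel gene families per genome) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: genomes are assumed to be given as sets of gene family identifiers, the `pan_core_curves` helper and the toy data are hypothetical, and the order in which genomes are added (which changes the shape of the curves) is chosen arbitrarily here.

```python
from typing import List, Sequence, Set, Tuple


def pan_core_curves(
    ordered_genomes: Sequence[Set[str]],
) -> Tuple[List[int], List[int], List[int]]:
    """Add genomes one at a time and track three per-step counts.

    pan_sizes:    cumulative number of distinct families seen so far
    core_sizes:   number of families shared by all genomes added so far
    novel_counts: families in this genome not seen in any earlier genome
    """
    pan: Set[str] = set()
    core: Set[str] = set()
    pan_sizes: List[int] = []
    core_sizes: List[int] = []
    novel_counts: List[int] = []

    for index, families in enumerate(ordered_genomes):
        novel_counts.append(len(families - pan))
        pan |= families
        # The first genome defines the initial core; later ones can only shrink it.
        core = set(families) if index == 0 else core & families
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))

    return pan_sizes, core_sizes, novel_counts


# Hypothetical toy genomes, not real data.
example_genomes = [
    {"fam1", "fam2", "fam3", "fam4"},
    {"fam1", "fam2", "fam5"},
    {"fam1", "fam2", "fam3", "fam6"},
]

pan_sizes, core_sizes, novel_counts = pan_core_curves(example_genomes)
```

For the toy input, the pan-genome sizes are 4, 5, 6, the core sizes are 4, 2, 2, and the novel family counts are 4, 1, 1: the pan-genome curve never decreases and the core curve never increases, which is the qualitative behavior the caption describes.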

