J Bacteriol. 2013 Jun;195(12):2786-92.
doi: 10.1128/JB.02285-12. Epub 2013 Apr 12.

Evolution of pan-genomes of Escherichia coli, Shigella spp., and Salmonella enterica

Evgeny N Gordienko et al. J Bacteriol. 2013 Jun.

Abstract

Multiple sequencing of genomes belonging to a bacterial species allows one to analyze and compare statistics and dynamics of the gene complements of species, their pan-genomes. Here, we analyzed multiple genomes of Escherichia coli, Shigella spp., and Salmonella enterica. We demonstrate that the distribution of the number of genomes harboring a gene is well approximated by a sum of two power functions, describing frequent genes (present in many strains) and rare genes (present in few strains). The virtual absence of Shigella-specific genes not present in E. coli genomes confirms previous observations that Shigella is not an independent genus. While the pan-genome size is increasing with each new strain, the number of genes present in a fixed fraction of strains stabilizes quickly. For instance, slightly fewer than 4,000 genes are present in at least half of any group of E. coli genomes. Comparison of S. enterica and E. coli pan-genomes revealed the existence of a common periphery, that is, genes present in some but not all strains of both species. Analysis of phylogenetic trees demonstrates that rare genes from the periphery likely evolve under horizontal transfer, whereas frequent periphery genes may have been inherited from the periphery genome of the common ancestor.
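The quantities named in the abstract (the pan-genome, the genes present in a fixed fraction of strains, and, as the limiting case, the genes present in every strain) can be illustrated with a minimal Python sketch. It assumes each strain has already been reduced to a set of gene-family identifiers; the function names and the toy data are hypothetical and are not the authors' pipeline.

```python
from typing import Dict, Set


def gene_frequency(strains: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each gene family, how many strains carry it."""
    counts: Dict[str, int] = {}
    for genes in strains.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return counts


def genes_in_fraction(strains: Dict[str, Set[str]], fraction: float) -> Set[str]:
    """Return the gene families present in at least `fraction` of the strains.

    fraction = 0.0 yields the pan-genome (every observed gene family),
    fraction = 1.0 yields the core genome (present in all strains).
    """
    n = len(strains)
    counts = gene_frequency(strains)
    return {gene for gene, c in counts.items() if c >= fraction * n}


# Hypothetical toy data: four strains, six gene families.
toy_strains = {
    "s1": {"a", "b", "c", "u1"},
    "s2": {"a", "b", "c"},
    "s3": {"a", "b", "d"},
    "s4": {"a", "d", "u4"},
}

pan = genes_in_fraction(toy_strains, 0.0)
half = genes_in_fraction(toy_strains, 0.5)
core = genes_in_fraction(toy_strains, 1.0)
print(len(pan), len(half), len(core))
```

On the toy data the three sets have 6, 4, and 1 members. Adding a strain with a new unique gene would enlarge the pan-genome but leave the at-least-half set unchanged, which is the kind of stabilization the abstract describes.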

Figures

Fig 1
Distribution of OGs by the number of strains in which they are present. (A) 32 E. coli sensu lato strains; (B) 32 E. coli sensu lato strains, approximated by the sum of three exponential functions. Dashed exponential lines (for unique genes, the periphery, and the universal genome, respectively) provide the decomposition of the general trend line for the distribution [solid line; $y(x) = e^{-0.53x+8.3} + e^{-0.02x+4.41} + e^{0.6x-12.27}$]. The total squared error is 1.48. (C) 48 strains, with notation as in panel B [$y(x) = e^{-0.86x+9.28} + e^{-0.07x+5.88} + e^{0.55x-19.81}$]. Peaks marked with arrows contain OGs comprising genes specific for S. enterica and E. coli sensu lato. (D) 25 E. coli and 7 Shigella strains, showing decomposition to the sum of two power law functions [solid line; $y(x) = 4{,}400\,x^{-1.7} + 1{,}746 \times (33 - x)^{-1.62}$]. The total squared error is 0.59.
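The trend lines quoted in the Fig 1 caption can be evaluated directly. The sketch below is only an illustration: it reads the printed coefficients of panels B and D as exponents, takes x to be the number of strains carrying an OG, and uses hypothetical function names. It does not reproduce the fitting procedure, which the caption does not describe.

```python
import math


def three_exponentials(x: float) -> float:
    """Panel B trend line: unique genes + periphery + universal genome."""
    unique = math.exp(-0.53 * x + 8.3)
    periphery = math.exp(-0.02 * x + 4.41)
    universal = math.exp(0.6 * x - 12.27)
    return unique + periphery + universal


def two_power_laws(x: float) -> float:
    """Panel D trend line: rare-gene term plus frequent-gene term.

    Defined for 1 <= x <= 32, since the second term diverges at x = 33.
    """
    rare = 4400 * x ** (-1.7)
    frequent = 1746 * (33 - x) ** (-1.62)
    return rare + frequent


# Both curves are U-shaped: high at the ends, low in the middle.
for x in (1, 16, 32):
    print(x, round(three_exponentials(x), 1), round(two_power_laws(x), 1))
```

In both forms the first term dominates at small x (genes found in few strains) and the last term dominates as x approaches the total number of strains (genes found in nearly all of them).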
Fig 2
(A to C) Sizes of the core genome (A), pan-genome (B), and new genes observed upon adding a new genome (C) for E. coli sensu lato strains. Gray lines, initial OGs; black lines, modified OGs; error bars, standard deviations.
Fig 3
Number of genes present in a given fraction of E. coli sensu lato genomes as a function of the number of genomes considered. The topmost boundary shows the pan-genome size, the lowest boundary shows the core genome size, and the remaining boundaries show the percentile pan-genome sizes.
Fig 4
(A and B) Numbers of OGs present in a given number of strains in two groups: S. enterica versus E. coli sensu lato (A) or Shigella clones versus other E. coli strains (B). Horizontal axes show the numbers of strains in the specified group. Vertical axes show the numbers of OGs present in the given number of strains. The maximal value on the vertical axis is restricted to 950 to show small peaks in more detail, and the height of the universal-genome peak is indicated.
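The two-group comparison described for Fig 4 amounts to counting, for every OG, how many strains of each group carry it. A minimal sketch follows, assuming each strain is given as a set of OG identifiers; the function name and the toy groups are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Set


def two_group_histogram(
    group_a: Dict[str, Set[str]], group_b: Dict[str, Set[str]]
) -> Counter:
    """Tally OGs by (strains in group A carrying it, strains in group B carrying it)."""
    all_ogs = set().union(*group_a.values(), *group_b.values())
    hist: Counter = Counter()
    for og in all_ogs:
        in_a = sum(og in genes for genes in group_a.values())
        in_b = sum(og in genes for genes in group_b.values())
        hist[(in_a, in_b)] += 1
    return hist


# Hypothetical toy groups with two strains each.
toy_a = {"a1": {"x", "y", "p"}, "a2": {"x", "p"}}
toy_b = {"b1": {"x", "z", "p"}, "b2": {"x", "z"}}

toy_hist = two_group_histogram(toy_a, toy_b)
# OGs found in every strain of one group and in none of the other.
specific_to_b = toy_hist[(0, len(toy_b))]
print(dict(toy_hist), specific_to_b)
```

A count near zero in the cell for "all strains of one group, none of the other" is the kind of signal the abstract refers to when it notes the virtual absence of Shigella-specific genes.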
