Genome Biol Evol. 2012;4(4):443-56.
doi: 10.1093/gbe/evs016. Epub 2012 Feb 21.

The infinitely many genes model for the distributed genome of bacteria

Franz Baumdicker et al. Genome Biol Evol. 2012.

Abstract

The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage of such genomic diversity, why it is maintained, whether the variation is largely adaptive or neutral, and why distinct individuals can coexist remain poorly understood. Here, we present the infinitely many genes (IMG) model, a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and on the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss (e.g., by pseudogenization and deletion of genes) during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 10^11 and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas that of Synechococcus and of the combined sample shows a clear deviation from neutrality.
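
The mechanism described in the abstract (a genealogy of genomes with gene gain and gene loss superimposed) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes a Kingman coalescent in which each pair of lineages merges at rate 1, gene gain at rate θ/2 along every lineage, and loss of each gene at rate ρ/2; this scaling, the stationary number of genes at the root (Poisson with mean θ/ρ), and the function name `simulate_img` are assumptions made for illustration.

```python
import random

def simulate_img(n, theta, rho, seed=None):
    """Simulate gene presence/absence for n genomes (theta, rho > 0).

    Returns one set of leaf indices (0..n-1) per gene that is present
    in at least one sampled genome. Rates are an assumed scaling.
    """
    rng = random.Random(seed)

    # 1. Build a Kingman coalescent backward in time.
    height = {i: 0.0 for i in range(n)}    # node -> time before present
    children = {i: () for i in range(n)}   # node -> child nodes
    active = list(range(n))
    t, next_id = 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)   # time to next merger
        a, b = rng.sample(active, 2)            # lineages that merge
        active = [x for x in active if x not in (a, b)] + [next_id]
        height[next_id], children[next_id] = t, (a, b)
        next_id += 1
    root = active[0]

    def arrivals(rate, length):
        """Event times of a Poisson process with this rate on [0, length)."""
        times, s = [], rng.expovariate(rate)
        while s < length:
            times.append(s)
            s += rng.expovariate(rate)
        return times

    def carriers(node, h):
        """Leaves inheriting a gene present at height h on the branch above node."""
        if rng.expovariate(rho / 2) < h - height[node]:
            return set()                        # lost before reaching node
        if not children[node]:
            return {node}                       # reached a sampled genome
        return set().union(*(carriers(c, height[node]) for c in children[node]))

    genes = []
    # 2. Genes already present at the root (assumed stationary Poisson number).
    for _ in arrivals(theta / rho, 1.0):
        genes.append(carriers(root, height[root]))
    # 3. Genes gained along each branch of the genealogy.
    for node, kids in children.items():
        for c in kids:
            for s in arrivals(theta / 2, height[node] - height[c]):
                genes.append(carriers(c, height[c] + s))
    return [g for g in genes if g]
```

Calling `simulate_img(5, 50.0, 2.0, seed=1)` returns a list of sets, each naming the genomes that carry one gene. Under the assumed scaling, the mean number of genes per genome is θ/ρ.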

Figures

FIG. 1. Two realizations of the IMG model. The underlying genealogy is given by the coalescent, and gene gain (triangle up) and loss events (triangle down) are superimposed on the coalescent. Gene gain and loss events of the same genes are marked in the same color.
FIG. 2. The false-positive rates of the neutrality test are shown for different values of θ. The gene loss rate was set to ρ = 0.5, 1, 2, 10. For each parameter combination, we simulated 1,000 independent data sets, each of size n = 7.
FIG. 3. The phylogeny of Prochlorococcus based on 913 core genes. Sequences were aligned using MUSCLE, and the tree was inferred by the software ClonalFrame using the parameters -x 17500 -y 2500 -z 50 -G -H. Numbers indicate the probability that the respective branch appears in a random draw from the posterior distribution as given by ClonalFrame.
FIG. 4. The gene content tree for Prochlorococcus. The bootstrap values have been computed using random samples of the generated gene clusters.
FIG. 5. The gene frequency spectra for our data sets of 11 individuals each of Prochlorococcus and Synechococcus. The x axis gives the number of individuals a gene can be present in, and the y axis gives how many genes are present at that frequency. Predictions are obtained using estimates from table 1 either on a fixed tree or on the average over a random tree.
FIG. 6. Data (i.e., a sample of 11 complete genomes) are generated according to the supragenome model and the IMG model, respectively. This means that the data consist of information about presence and absence of genes. Then, the gene content tree, inferred from pairwise distances of individuals, is drawn.
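
The quantities behind the gene frequency spectrum and the gene content tree can be computed directly from a presence/absence matrix. The sketch below is illustrative only. The closed form in `expected_spectrum` is the expectation over a random coalescent tree as recalled from Baumdicker, Hess, and Pfaffelhuber (2010), E[G_k] = (θ/k) · n(n−1)…(n−k+1) / ((n−1+ρ)…(n−k+ρ)), and should be checked against that paper. The pairwise distance (number of genes present in exactly one of two genomes) is an assumed choice, since the distance used for the gene content tree is not stated here. All function names are hypothetical.

```python
from math import prod

def observed_spectrum(presence):
    """Gene frequency spectrum of a 0/1 matrix.

    presence: one row per genome, one column per gene cluster.
    Returns a list whose entry k-1 is the number of genes found in
    exactly k genomes (genes absent from all genomes are ignored).
    """
    n = len(presence)
    counts = [0] * (n + 1)
    for column in zip(*presence):
        counts[sum(column)] += 1
    return counts[1:]

def expected_spectrum(n, theta, rho):
    """Expected spectrum under neutrality, averaged over a random tree.

    Assumed formula: E[G_k] = theta/k * prod_{i=0}^{k-1} (n-i)/(n-1-i+rho).
    """
    return [
        theta / k * prod((n - i) / (n - 1 - i + rho) for i in range(k))
        for k in range(1, n + 1)
    ]

def gene_content_distances(presence):
    """Pairwise number of genes present in exactly one of two genomes."""
    return [
        [sum(a != b for a, b in zip(row_i, row_j)) for row_j in presence]
        for row_i in presence
    ]
```

Comparing `observed_spectrum` on real data with `expected_spectrum` at estimated values of θ and ρ mirrors the comparison of data and predictions in the gene frequency spectrum figure. The distance matrix could be passed to any standard distance-based tree method.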
