BMC Genomics. 2012 May 21;13:196.
doi: 10.1186/1471-2164-13-196.

A neutral theory of genome evolution and the frequency distribution of genes

Bart Haegeman et al. BMC Genomics. 2012.

Abstract

Background: The gene composition of bacteria of the same species can differ significantly between isolates. Variability in gene composition can be summarized in terms of gene frequency distributions, in which individual genes are ranked according to the frequency of genomes in which they appear. Empirical gene frequency distributions possess a U-shape, such that there are many rare genes, some genes of intermediate occurrence, and many common genes. It would seem that U-shaped gene frequency distributions can be used to infer the essentiality and/or importance of a gene to a species. Here, we ask: can U-shaped gene frequency distributions, instead, arise generically via neutral processes of genome evolution?
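To make the notion of a gene frequency distribution concrete, the following is a minimal sketch of how such a distribution could be tabulated from a sample of genomes. It assumes each genome is represented as a set of gene identifiers; the function name `gene_frequency_distribution` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def gene_frequency_distribution(genomes):
    """Return a list whose entry k-1 is the number of distinct genes
    found in exactly k of the sampled genomes (k = 1..G)."""
    sample_size = len(genomes)
    occurrences = Counter()
    for genome in genomes:
        # Count each gene at most once per genome.
        for gene in set(genome):
            occurrences[gene] += 1
    histogram = [0] * (sample_size + 1)
    for k in occurrences.values():
        histogram[k] += 1
    return histogram[1:]


# Hypothetical toy sample of G = 3 genomes (gene names are placeholders).
toy_genomes = [
    {"a", "b", "c", "x"},
    {"a", "b", "c", "y"},
    {"a", "b", "d", "z"},
]
print(gene_frequency_distribution(toy_genomes))  # [4, 1, 2]
```

In this toy sample, four genes occur in one genome, one gene occurs in two, and two genes occur in all three, which gives a small-scale version of the U-shape described above.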

Results: We introduce a neutral model of genome evolution which combines birth-death processes at the organismal level with gene uptake and loss at the genomic level. This model predicts that gene frequency distributions possess a characteristic U-shape even in the absence of selective forces driving genome and population structure. We compare the model predictions to empirical gene frequency distributions from six multiply sequenced species of bacterial pathogens. We fit the model with constant population size to the data; it reproduces the U-shaped distributions, albeit without matching all of their quantitative features. We find stronger model fits when we consider exponentially growing populations. We also show that two alternative models, in which genomes contain a "rigid" or a "flexible" core component, respectively, provide strong fits to gene frequency distributions.

Conclusions: The analysis of neutral models of genome evolution suggests that U-shaped gene frequency distributions provide less information than previously suggested regarding gene essentiality. We discuss the need for additional theory and genomic level information to disentangle the roles of evolutionary mechanisms operating within and amongst individuals in driving the dynamics of gene distributions.

Figures

Figure 1
Graphical illustration of a neutral model of genome evolution. A population of N = 4 organisms is shown, with genomes consisting of M = 3 genes. Colors denote different gene identities. We chose small values for N and M for illustrative purposes; realistic values are, e.g., N ~ 10^8 and M ~ 2000. (A) In a birth-death event, an organism dies and is replaced by the offspring of another organism. The offspring genome is identical to the parent genome. (B) In a gene transfer event, a gene of one of the organisms is replaced by a gene from the environment. We assume that this new gene has not been present in the population before.
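The two event types in this caption can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' implementation: the function name `simulate_neutral_model`, the per-step probability `p_transfer` of choosing a gene transfer over a birth-death event, and the choice to start from identical genomes are all hypothetical, and the caption does not say how the gene transfer parameter θ maps onto such a probability.

```python
import random
from itertools import count


def simulate_neutral_model(N=4, M=3, steps=1000, p_transfer=0.1, seed=0):
    """Simulate N genomes of M genes each under birth-death and gene
    transfer events. p_transfer is an assumed mixing probability."""
    rng = random.Random(seed)
    fresh_gene_ids = count()  # every call to next() yields an unseen gene
    founder = [next(fresh_gene_ids) for _ in range(M)]
    # Assumption: the population starts with identical genomes.
    population = [list(founder) for _ in range(N)]
    for _ in range(steps):
        if rng.random() < p_transfer:
            # (B) Gene transfer: one gene of one organism is replaced
            # by a gene that has never been present in the population.
            organism = rng.randrange(N)
            position = rng.randrange(M)
            population[organism][position] = next(fresh_gene_ids)
        else:
            # (A) Birth-death: one organism dies and is replaced by an
            # identical copy of another organism's genome.
            dead, parent = rng.sample(range(N), 2)
            population[dead] = list(population[parent])
    return population


final_population = simulate_neutral_model()
for genome in final_population:
    print(sorted(genome))
```

Because every transferred gene receives a fresh identifier, genome size stays fixed at M and no genome ever contains the same gene twice, which matches the replacement mechanism described in the caption.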
Figure 2
Gene frequency distributions for the neutral model of genome evolution (model A). Genome size M = 2000 and sample size G = 20. Gene transfer parameter θ: left panel, θ = 0.03; middle panel, θ = 0.3; right panel, θ = 3.
Figure 3
Data comparison for the neutral model of genome evolution (model A). Comparison of gene frequency distributions with predictions of the simplest model: the population size is assumed to be constant and all genes are governed by the same gene transfer process. The model has one parameter, the gene transfer parameter θ. Black circles: data; red line with squares: model.
Figure 4
Data comparison for the model with exponentially growing populations (model B). Comparison of gene frequency distributions with predictions of the model in which population size is assumed to grow exponentially. The model has two parameters, the gene transfer parameter θ0 and the population growth parameter β. Black circles: data; blue line with squares: model.
Figure 5
Data comparison for models with rigid and flexible core genomes (models C and D). Comparison of gene frequency distributions with predictions of two models which assume that a part of a genome is more susceptible to gene transfer. The genomes in model C have a rigid core, i.e., some genes cannot be removed from the genomes. The genomes in model D have a flexible core, i.e., these core genes can be moved around between genomes, but to a lesser extent than the other genes. Model C has two parameters, whereas model D has three parameters. Black circles: data; yellow line with squares: model C; green line with squares: model D.
Figure 6
Predictions for observed core and pan genome size for model D. We used the parameters λ1, θ1 and θ2 obtained from fitting the gene frequency distribution (see Figure 5) to evaluate the predicted core and pan genome size (see Additional file 1: Appendix S6). Black circles: data; green line: mean prediction; green shaded region: standard deviation of prediction. The increasing curves are for the pan genome; the decreasing curves are for the core genome.
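As a rough illustration of the observed quantities plotted in this figure, the sketch below tabulates core and pan genome sizes as genomes are added to a sample one at a time. It computes the empirical curves only; it does not reproduce the model D predictions from Appendix S6. The function name `core_and_pan_sizes`, the set-based genome representation, and the toy data are assumptions made for this example.

```python
def core_and_pan_sizes(genomes):
    """For g = 1..G, return (core size, pan size) over the first g genomes.

    Core genome: genes shared by all g genomes (intersection).
    Pan genome: genes present in at least one of them (union).
    """
    core = set(genomes[0])
    pan = set(genomes[0])
    sizes = []
    for genome in genomes:
        core &= set(genome)
        pan |= set(genome)
        sizes.append((len(core), len(pan)))
    return sizes


# Hypothetical toy sample; gene names are placeholders.
sample = [
    {"a", "b", "c", "x"},
    {"a", "b", "c", "y"},
    {"a", "b", "d", "z"},
]
print(core_and_pan_sizes(sample))  # [(4, 4), (3, 5), (2, 7)]
```

The core size can only shrink and the pan size can only grow as genomes are added, which is the decreasing and increasing pattern the caption describes. The result depends on the order of the genomes, so in practice one would likely average over many random orderings.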
