Mol Biol Evol. 2017 Oct 1;34(10):2627-2636.
doi: 10.1093/molbev/msx189.

Impact of Recombination on the Base Composition of Bacteria and Archaea

Louis-Marie Bobay et al. Mol Biol Evol.

Abstract

The mutational process in bacteria is biased toward A and T, and most species are GC-rich relative to the mutational input to their genome. It has been proposed that the shift in base composition is an adaptive process (that is, natural selection operates to increase GC-contents), and there is experimental evidence that bacterial strains with GC-rich versions of genes have higher growth rates than strains with AT-rich versions expressing identical proteins. Alternatively, a nonadaptive process, GC-biased gene conversion (gBGC), could also increase the GC-content of DNA due to the mechanistic bias of gene conversion events during recombination. To determine what role recombination plays in the base composition of bacterial genomes, we compared the spectrum of nucleotide polymorphisms introduced by recombination in all microbial species represented by large numbers of sequenced strains. We found that recombinant alleles are consistently biased toward A and T, and that the magnitude of AT-bias introduced by recombination is similar to that of mutations. These results indicate that recombination alone, without the intervention of selection, is unlikely to counteract the AT-enrichment of bacterial genomes.

Keywords: G+C contents; bacterial genomes; biased gene conversion; recombination; sequence evolution.

Figures

Fig. 1.
Identification of recombinant alleles. (A) Homoplasies were identified as those single nucleotide polymorphisms (SNPs) whose distributions are incongruent with the strain phylogeny as determined by both distance-based and topology-based methods. To distinguish homoplasies attributable to recombination from those generated by convergent mutations, analyses were confined to cases where two or more homoplasies within the same gene contained identical SNPs at the identical locations exhibiting the identical distribution among strains, a circumstance unlikely to occur by independent mutations. (B) The ancestral state of each homoplasy was determined from the number of nodes separating the recombinant SNP in two or more strains. We considered a polymorphic allele to represent the acquired state (i.e., as having been introduced by recombination) if present in strains separated by at least four nodes in the unrooted tree of each species.
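The filtering step in panel (A) can be illustrated with a minimal sketch. It assumes that homoplasies have already been called against the strain phylogeny and are available as hypothetical `(gene, position, strains)` tuples, where `strains` is the set of strains carrying the variant allele. The function name and data layout are illustrative assumptions, not the authors' pipeline, and the sketch covers only the "two or more homoplasies in the same gene with the identical strain distribution" criterion.

```python
from collections import defaultdict


def candidate_recombinant_snps(homoplasies, min_sites=2):
    """Group homoplasies by gene and strain distribution.

    `homoplasies` is a hypothetical iterable of (gene, position, strains)
    tuples. Groups with fewer than `min_sites` positions are discarded,
    since a single homoplasy could arise from a convergent mutation.
    """
    groups = defaultdict(list)
    for gene, position, strains in homoplasies:
        # frozenset makes the strain distribution hashable and order-free
        groups[(gene, frozenset(strains))].append(position)
    return {
        key: sorted(positions)
        for key, positions in groups.items()
        if len(positions) >= min_sites
    }


# Illustrative (made-up) input: only geneA has two homoplasies
# sharing the same distribution among strains.
example = [
    ("geneA", 10, {"s1", "s4"}),
    ("geneA", 55, {"s4", "s1"}),
    ("geneA", 90, {"s2", "s3"}),
    ("geneB", 10, {"s1", "s4"}),
]
kept = candidate_recombinant_snps(example)
```

Requiring shared distributions within a gene is what makes independent convergent mutation an unlikely explanation; the node-distance criterion of panel (B) would be applied afterwards and is not shown here.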
Fig. 2.
Comparison of nucleotide changes introduced by recombination and mutation. Cumulative proportions of SNPs at 4-fold degenerate sites (GC4) for each of the six types of nucleotide changes, as calculated for all alleles introduced by recombination; new alleles introduced by recombination; all alleles introduced by mutations; new alleles introduced by mutations. Values were normalized by the GC-contents at 4-fold degenerate sites for each species prior to calculating overall proportions, and species with fewer than 50 polymorphic sites for a given category of alleles were excluded.
Fig. 3.
Equilibrium GC content inferred from new polymorphisms relative to actual GC content. GC4eq is the expected GC-content at 4-fold degenerate sites for a given species when based on new alleles introduced by recombination (left panel) and new alleles introduced by mutations (right panel). GC4eq values were normalized by the GC-contents at 4-fold degenerate sites for each species, and species with fewer than 50 polymorphic sites for any given category of allele were excluded. Points in shaded area below the diagonal denote species that are GC-rich relative to the input of polymorphisms by recombination or mutation.
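The caption does not give the formula for GC4eq. A minimal sketch, assuming the standard two-state equilibrium in which per-site rates are obtained by dividing each count of changes by the number of sites that could undergo that change, might look as follows. The function name, argument names, and the normalization itself are assumptions for illustration.

```python
def equilibrium_gc(n_at_to_gc, n_gc_to_at, gc4):
    """Expected equilibrium GC-content at 4-fold degenerate sites.

    Assumes a two-state model: GCeq = u / (u + v), where
    u = AT->GC changes per AT site and v = GC->AT changes per GC site.
    `gc4` is the observed GC fraction at 4-fold degenerate sites.
    """
    if not 0.0 < gc4 < 1.0:
        raise ValueError("gc4 must lie strictly between 0 and 1")
    rate_at_to_gc = n_at_to_gc / (1.0 - gc4)  # normalize by AT sites
    rate_gc_to_at = n_gc_to_at / gc4          # normalize by GC sites
    total = rate_at_to_gc + rate_gc_to_at
    if total == 0:
        raise ValueError("no changes observed")
    return rate_at_to_gc / total


# Made-up counts: 30 AT->GC and 90 GC->AT changes in a species
# whose GC4 is 0.6 give an equilibrium of one third.
gc4eq = equilibrium_gc(30, 90, 0.6)
```

Under these assumptions, a species whose GC4eq falls below its observed GC4 (0.33 versus 0.6 in the made-up example) would plot below the diagonal, in the region the caption describes as GC-rich relative to the input of polymorphisms.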
Fig. 4.
Impact of recombination and mutations on genomic nucleotide composition and codon usage. (A) The metric B represents the number of changes from G or C to A or T relative to the number of changes from A or T to G or C at 4-fold degenerate sites. B > 1 indicates an enrichment toward A and T, and B < 1 indicates an enrichment toward G and C. (B) The metric δ denotes the shift in codon usage caused by synonymous changes in the coding sequences of genes constituting the core genome of each species. δ > 0 indicates that nucleotide changes led toward more commonly used codons, and δ < 0 indicates that nucleotide changes led toward less frequently used codons. Values shown are for all alleles introduced by recombination; new alleles introduced by recombination; all alleles introduced by mutations; new alleles introduced by mutations. Values were normalized by the GC-contents at 4-fold degenerate sites for each species prior to calculating overall proportions, and species with fewer than 50 polymorphic sites for a given category of alleles were excluded.
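As an illustration of the metric B in panel (A), the sketch below computes the ratio of GC→AT to AT→GC changes after normalizing each count by the fraction of sites that could undergo it. The exact normalization used in the paper is not spelled out in the caption, so the per-site scaling, the function name, and the 50-site cutoff argument are assumptions that mirror the caption's wording.

```python
def nucleotide_bias(n_gc_to_at, n_at_to_gc, gc4, min_sites=50):
    """Ratio of GC->AT to AT->GC changes at 4-fold degenerate sites.

    Each count is divided by the fraction of sites able to undergo that
    change (gc4 for GC->AT, 1 - gc4 for AT->GC). Returns None when the
    species has too few polymorphic sites or when the ratio is undefined.
    """
    if not 0.0 < gc4 < 1.0:
        raise ValueError("gc4 must lie strictly between 0 and 1")
    if n_gc_to_at + n_at_to_gc < min_sites or n_at_to_gc == 0:
        return None
    return (n_gc_to_at / gc4) / (n_at_to_gc / (1.0 - gc4))


# Made-up counts: 90 GC->AT and 30 AT->GC changes with GC4 = 0.6.
b = nucleotide_bias(90, 30, 0.6)
direction = "toward A and T" if b > 1 else "toward G and C"
```

With these made-up counts B is approximately 2, which under the caption's reading (B > 1) corresponds to enrichment toward A and T. Without the normalization the raw ratio would be 3, overstating the bias for a GC-rich species that simply has more GC sites available to change.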
Fig. 5.
Testing for selection on nucleotide changes introduced by recombination and mutation. dN/dS ratios were calculated from the concatenate of the core genome of each species. Values shown are for all alleles introduced by recombination; new alleles introduced by recombination; all alleles introduced by mutations; new alleles introduced by mutations.
Fig. 6.
Nucleotide bias of recent recombinant alleles and recent mutations. The metric B represents the number of changes from G or C to A or T relative to the number of changes from A or T to G or C at 4-fold degenerate sites. On each axis, a value of B < 1 indicates an enrichment toward G and C (lower left corner), and a value of B > 1 indicates an enrichment toward A and T. The solid diagonal line indicates identical nucleotide bias for recent recombinant alleles and recent mutations; the dotted lines represent half of the standard deviation. Values were normalized by the GC-contents at 4-fold degenerate sites for each species prior to calculating overall proportions, and species with fewer than 50 polymorphic sites for a given category of alleles were excluded. As indicated in the key, the color shading of dots denotes the dN/dS ratio of recombinant alleles for a species. All phylogenetic trees of species included in this analysis display average bootstrap values >70, and the list of included species and the corresponding average bootstrap values are indicated in bold in supplementary table S1, Supplementary Material online.
