NAR Genom Bioinform. 2024 Jul 27;6(3):lqae088.
doi: 10.1093/nargab/lqae088. eCollection 2024 Sep.

Depletion of CpG dinucleotides in bacterial genomes may represent an adaptation to high temperatures

Diego Forni et al. NAR Genom Bioinform.

Abstract

Dinucleotide biases have been widely investigated in the genomes of eukaryotes and viruses, but not in bacteria. We assembled a dataset of bacterial genomes (>15 000), which are representative of the genetic diversity in the kingdom Eubacteria, and we analyzed dinucleotide biases in relation to different traits. We found that TpA dinucleotides are the most depleted and that CpG dinucleotides show the widest dispersion. The abundances of both dinucleotides vary with genomic G + C content and show a very strong phylogenetic signal. After accounting for G + C content and phylogenetic inertia, we analyzed different bacterial lifestyle traits. We found that temperature preferences associate with the abundance of CpG dinucleotides, with thermophiles/hyperthermophiles being particularly depleted. Conversely, the TpA dinucleotide displays a bias that only depends on genomic G + C composition. Using predictions of intrinsic cyclizability, we also show that CpG depletion may associate with higher DNA bendability in both thermophiles/hyperthermophiles and mesophiles, and that the former are predicted to have significantly more flexible genomes than the latter. We suggest that higher bendability is advantageous at high temperatures because it facilitates DNA positive supercoiling and that, through modulation of DNA mechanical properties, local or global CpG depletion controls genome organization, most likely not only in bacteria.
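Dinucleotide bias is commonly quantified as an observed/expected (O/E) ratio, the measure used in the figure legends of this record. The sketch below assumes the standard single-strand definition, O/E(XY) = f(XY) / (f(X) f(Y)). The authors' exact procedure (for example, whether both strands are counted) is not stated here, and the function names `dinucleotide_oe` and `classify` are hypothetical. The 0.78 and 1.23 cut-offs are the thresholds quoted in the Figure 1 legend.

```python
from collections import Counter

BASES = "ACGT"


def dinucleotide_oe(seq):
    """Return the O/E ratio for each of the 16 dinucleotides (single strand).

    Assumed definition: O/E(XY) = f(XY) / (f(X) * f(Y)), with frequencies
    taken over unambiguous A/C/G/T symbols only.
    """
    seq = seq.upper()
    # Mononucleotide counts, ignoring ambiguous symbols such as N.
    mono = Counter(b for b in seq if b in BASES)
    n_mono = sum(mono.values())
    # Overlapping dinucleotide counts, skipping pairs with ambiguous symbols.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_di = sum(di.values())

    ratios = {}
    for x in BASES:
        for y in BASES:
            if n_mono == 0 or n_di == 0:
                ratios[x + y] = float("nan")
                continue
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            observed = di[x + y] / n_di
            # Undefined when one of the two bases never occurs.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


def classify(ratio, low=0.78, high=1.23):
    """Label a ratio using the thresholds quoted in the Figure 1 legend."""
    if ratio < low:
        return "depleted"
    if ratio > high:
        return "enriched"
    return "neutral"
```

On a real genome one would pass the concatenated replicon sequences and read `ratios["CG"]` and `ratios["TA"]` for the CpG and TpA values.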

Figures

Figure 1.
Dinucleotide representation in bacterial genomes. (A) Violin plots with boxplots of the observed/expected ratio (O/E) for all dinucleotides. The gray horizontal hatched lines correspond to ratios of 0.78 and 1.23, which are generally considered significant thresholds for dinucleotide depletion and enrichment, respectively (13,48). (B) Observed/expected ratio for CpG (upper panel) and TpA (lower panel) as a function of G + C content. Cubic smoothing splines are shown as solid red lines, and Pearson's correlation coefficients (ρ) are reported along with their corresponding P-values. (C) Distribution of residuals for the O/E CpG and TpA models.
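Panels (B) and (C) describe modeling O/E against G + C content and then working with the residuals. A minimal sketch of that idea follows, with a loud caveat: it swaps in an ordinary least-squares straight line for the cubic smoothing splines named in the legend, only to stay dependency-free, so it is not the authors' model. The function names `gc_content` and `linear_residuals` are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous A/C/G/T symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def linear_residuals(x, y):
    """Residuals of y after an ordinary least-squares straight-line fit on x.

    ASSUMPTION: a straight line stands in for the cubic smoothing spline
    mentioned in the legend; real data would need the spline (or another
    flexible fit) to capture curvature.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from G + C alone.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Here `x` would hold one G + C fraction per genome and `y` the matching O/E CpG (or TpA) value. A negative residual marks a genome with less of the dinucleotide than its base composition alone would predict under the fitted model.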
Figure 2.
Dinucleotide ancestral state in bacteria. (A) Bacterial phylogenetic tree retrieved from the Genome Taxonomy database (see Materials and methods for details) with branches colored by ancestral state reconstruction of resCpG (left) and resTpA (right) values. Red points at tips indicate thermophilic/hyperthermophilic bacteria. (B) Boxplot of resCpG values grouped by growth temperature. PhylANOVA pairwise post-hoc tests after Holm's correction are reported above each comparison. Given that only a subset of bacteria from our dataset is present in the Genome Taxonomy database tree, phylogenetic ANOVA analysis was performed on 7228 mesophilic, 451 psychrophilic, and 406 thermophilic/hyperthermophilic genomes.
Figure 3.
CpG representation and DNA bendability of thermophilic/hyperthermophilic genomes. (A) Box plot representation of normalized C-score grouped by growth temperature. Both temperature classes are composed of 406 genomes. The statistical difference between the two groups was assessed using phylogenetic ANOVA analysis. (B) Linear models of normalized C-score as a function of O/E CpG residuals. Dots are colored based on the temperature classes and Pearson's correlation coefficient ρ is also reported. Regression lines are shown with confidence intervals. (C) Box plot representation of O/E CpG residuals grouped by thermophilic/hyperthermophilic bacteria that encode or do not encode the reverse gyrase (RG) enzyme. The statistical difference between the two groups was assessed using phylogenetic ANOVA analysis. (D) Box plot representation of normalized C-score grouped by thermophilic/hyperthermophilic bacteria that encode or do not encode the RG enzyme. The statistical difference between the two groups was assessed using phylogenetic ANOVA analysis.

References

    1. Bird A.P., Taggart M.H. Variable patterns of total DNA and rDNA methylation in animals. Nucleic Acids Res. 1980; 8:1485–1497.
    2. Burge C., Campbell A.M., Karlin S. Over- and under-representation of short oligonucleotides in DNA sequences. Proc. Natl. Acad. Sci. USA. 1992; 89:1358–1362.
    3. Gentles A.J., Karlin S. Genome-scale compositional comparisons in eukaryotes. Genome Res. 2001; 11:540–546.
    4. Gonçalves-Carneiro D., Takata M.A., Ong H., Shilton A., Bieniasz P.D. Origin and evolution of the zinc finger antiviral protein. PLoS Pathog. 2021; 17:e1009545.
    5. Karlin S., Burge C. Dinucleotide relative abundance extremes: A genomic signature. Trends Genet. 1995; 11:283–290.
