Nucleic Acids Res. 2024 Apr 24;52(7):3493-3509.
doi: 10.1093/nar/gkae155.

Origin, evolution, and maintenance of gene-strand bias in bacteria

Malhar Atre et al. Nucleic Acids Res.

Abstract

Gene-strand bias is a characteristic feature of bacterial genome organization wherein genes are preferentially encoded on the leading strand of replication, promoting co-orientation of replication and transcription. This co-orientation bias has evolved to protect gene essentiality, expression, and genomic stability from the harmful effects of head-on replication-transcription collisions. However, the origin, variation, and maintenance of gene-strand bias remain elusive. Here, we reveal that the frequency of inversions that alter gene orientation exhibits large variation across bacterial populations and negatively correlates with gene-strand bias. The density, distance, and distribution of inverted repeats show a similar negative relationship with gene-strand bias, explaining the heterogeneity in inversions. Importantly, these observations are broadly evident across the entire bacterial kingdom, uncovering inversions and inverted repeats as primary factors underlying the variation in gene-strand bias and its maintenance. The distinct catalytic subunits of replicative DNA polymerase have co-evolved with gene-strand bias, suggesting a close link between replication and the origin of gene-strand bias. Congruently, inversion frequencies and inverted repeats vary among bacteria with different DNA polymerases. In summary, we propose that the nature of replication determines the fitness cost of replication-transcription collisions, establishing a selection gradient on gene-strand bias by fine-tuning DNA sequence repeats and, thereby, gene inversions.

Figures

Graphical Abstract
Figure 1.
Gene-strand bias in bacteria. Distribution of gene-strand bias (calculated as the percentage of genes encoded on the leading strand) across bacterial clades. The X-axis denotes the clades, and the Y-axis represents the gene-strand bias (GSB) expressed as a percentage. The red horizontal line indicates a neutral scenario with no strand bias (50%). The median % GSB for each clade is shown on top and the number of species per clade is indicated at the bottom.
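The gene-strand bias defined in this caption can be computed directly from a gene annotation. The sketch below is a minimal illustration, not the authors' pipeline. It assumes a single circular chromosome with known origin (`ori`) and terminus (`ter`) coordinates, treats the arc of increasing coordinates from `ori` to `ter` as one replichore, and assigns each gene to a replichore by its midpoint. The function names and the toy gene list are hypothetical.

```python
def on_forward_replichore(pos, ori, ter, genome_len):
    """True if pos lies on the arc of increasing coordinates from ori to ter."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def gene_strand_bias(genes, ori, ter, genome_len):
    """Percentage of genes encoded on the leading strand.

    genes: iterable of (start, end, strand) with strand '+' or '-'.
    A '+' gene on the forward replichore, or a '-' gene on the other
    replichore, is transcribed in the same direction as the fork moves.
    """
    genes = list(genes)
    leading = 0
    for start, end, strand in genes:
        midpoint = (start + end) // 2  # assumption: genes do not span ori
        forward = on_forward_replichore(midpoint, ori, ter, genome_len)
        if (forward and strand == "+") or (not forward and strand == "-"):
            leading += 1
    return 100.0 * leading / len(genes)


# Toy annotation (hypothetical coordinates): three of four genes are co-oriented.
toy_genes = [(10, 100, "+"), (200, 300, "-"), (600, 700, "-"), (800, 900, "-")]
toy_gsb = gene_strand_bias(toy_genes, ori=0, ter=500, genome_len=1000)
```

A value of 50% corresponds to the neutral scenario marked by the red line in the figure.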
Figure 2.
Variable gene inversion frequencies in bacterial populations. (A) Schematic describing the GC-skew and phylogeny-based methodology used to determine inversions (details in Materials and methods). Arrows on a representative genome-wide GC-skew map (left) indicate regions with local GC-skew disparity, and gray shades highlight inversions based on gene synteny (right). The rows (letters) represent genomes while columns (numbers) indicate core gene clusters. (B) The inversion frequency for species with low (<70%; left Y-axis) and high (>70%; right Y-axis) gene-strand bias. Species analyzed (X-axis): Pae, Pseudomonas aeruginosa; Ngo, Neisseria gonorrhoeae; Bma, Burkholderia mallei; Yps, Yersinia pseudotuberculosis; Eal, Escherichia albertii; Sma, Serratia marcescens; Eco, Escherichia coli; Cul, Corynebacterium ulcerans; Cje, Campylobacter jejuni; Bbr, Bifidobacterium breve; Ban, Bacillus anthracis; Bsu, Bacillus subtilis; Ppo, Paenibacillus polymyxa; Sor, Streptococcus oralis; Efa, Enterococcus faecium; Cbu, Clostridium butyricum; Cpe, Clostridium perfringens; Mfl, Mesoplasma florum. (C) Inversion frequency (Y-axis) plotted against the gene-strand bias estimated by parsimony (X-axis). The linear model fit between the two variables is shown as a blue trendline (here and in all following figures where applicable). Spearman's rank correlation coefficient (ρ) and the P-value (P) are presented on the plot.
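Panel (A) relies on a genome-wide GC-skew map. The sketch below shows the standard windowed GC-skew calculation, (G - C) / (G + C). The window and step sizes are illustrative parameters, and how the authors combine this signal with synteny is described in their Materials and methods, not here.

```python
def gc_skew(sequence, window, step):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C).

    Windows with no G or C are reported with a skew of 0.0.
    """
    sequence = sequence.upper()
    profile = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start, skew))
    return profile


# Toy sequence: the skew flips sign between the two halves, the kind of
# local disparity that can flag a candidate inverted segment.
toy_profile = gc_skew("GGGGCCCC", window=4, step=4)
```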
Figure 3.
Neither divergence times nor recombination potential underlie the inversion heterogeneity. (A) Inversion frequency plotted (Y-axis) against the evolutionary distance (X-axis) calculated as the median patristic distance from the phylogenetic tree of each species. Spearman's rank correlation coefficient (ρ) and the P-value (P) are displayed on the plot. (B) Distribution of IR lengths across the 18 analyzed species. The X-axis represents the species (labeled as in Figure 2) and the Y-axis denotes IR length (in bp) on a log scale.
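The correlations reported in these figures use Spearman's rank correlation coefficient (ρ). As a reminder of what that statistic measures, here is a dependency-free sketch that ranks both variables (ties receive average ranks) and then takes the Pearson correlation of the ranks. The example numbers are invented for illustration and are not data from the paper; P-values are not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: a strictly decreasing relationship gives rho = -1.
example_gsb = [55, 60, 75, 80, 90]
example_inversion_frequency = [0.9, 0.7, 0.4, 0.2, 0.1]
rho = spearman_rho(example_gsb, example_inversion_frequency)
```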
Figure 4.
The abundance and distribution of inverted repeats (IRs) are correlated with gene-strand bias. (A) Inverted repeat density (Y-axis) was calculated as the ratio of the median observed density over the expected density. The expected IR density was calculated by shuffling the representative genome (500 iterations) for each species and taking the median. (B) Schematic illustrating the inter-replichore IRs that can mediate inversions causing no strand-switch, and intra-replichore IRs that potentiate strand-switching inversions. Blue and red lines represent leading and lagging strands of replication, respectively. Gray boxes indicate the location of two copies of an IR. (C) Intra-replichore IR density (Y-axis) was calculated as the ratio of the median observed density over the expected density. The expected value was obtained as in (A). (D) The ratio of inter-replichore to intra-replichore IRs (Y-axis). The gray line denotes the neutral scenario. In (A), (C) and (D), the X-axis represents the gene-strand bias. Spearman's rank correlation coefficient (ρ) and the P-value (P) are displayed on the plots.
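The observed-over-expected ratio in panels (A) and (C) follows a shuffle-based null model. The sketch below illustrates the idea on a single sequence under strong simplifying assumptions: an "inverted repeat" is reduced to an exact k-mer whose reverse complement also occurs in the sequence, and the expected count is the median over shuffled copies. The caption's median across genomes, and whatever IR detection tool the authors used, are not reproduced; all names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_ir_kmers(seq, k):
    """Count k-mer positions whose reverse complement occurs elsewhere in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    count = 0
    for kmer, where in positions.items():
        partner = reverse_complement(kmer)
        if partner == kmer:
            # A self-complementary k-mer needs a second copy to form a pair.
            if len(where) > 1:
                count += len(where)
        elif partner in positions:
            count += len(where)
    return count


def shuffled(seq, rng):
    """Return a copy of seq with its bases permuted (composition preserved)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def observed_over_expected(seq, k, iterations=500, seed=0):
    """Observed IR count divided by the median count over shuffled copies.

    Returns None when the shuffled median is zero (ratio undefined).
    """
    rng = random.Random(seed)
    observed = count_ir_kmers(seq, k)
    expected = median(
        count_ir_kmers(shuffled(seq, rng), k) for _ in range(iterations)
    )
    if expected == 0:
        return None
    return observed / expected
```

A ratio above 1 would indicate more repeats than the base composition alone predicts, and a ratio below 1 would indicate depletion.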
Figure 5.
Selection modulates inversion sizes through repeat distance. (A) Schematic explaining the scenario in which species with low gene-strand bias remain relatively unaffected by the size of an inversion, whereas species with high gene-strand bias can be greatly affected by larger inversions. Ratios indicate the percentage of genes on the leading (blue) and lagging (red) strand of replication. (B) Ridgeline plots of the distribution of the normalized repeat distances (X-axis) of intra-replichore IRs. The vertical line on each distribution represents the median. Species denotations (Y-axis) are identical to Figure 2. (C) Inversion size (Y-axis) calculated as the median size of inversions normalized by the core genome size. (D) Inversion potential (Y-axis), calculated with the density, distribution, and size of intra-replichore IRs for every genome. In (C) and (D), the X-axis represents the gene-strand bias. Spearman's rank correlation coefficient (ρ) and the P-value (P) are presented on the plots.
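The intuition behind panel (A) can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that a strand-switching inversion covers a fraction f of all genes and that the inverted segment carries the same leading/lagging proportions as the genome as a whole. With leading fraction p, the leading fraction after inversion is then p + f(1 - 2p). This simplification is ours, not a formula from the paper.

```python
def gsb_after_inversion(gsb_percent, inverted_fraction):
    """Gene-strand bias (%) after one strand-switching inversion.

    Assumes the inverted segment has the genome-average strand composition,
    so every gene in it swaps between leading and lagging strands.
    """
    p = gsb_percent / 100.0
    return 100.0 * (p + inverted_fraction * (1.0 - 2.0 * p))


# With no bias (50%) the inversion size is irrelevant; with 80% bias,
# inverting 20% of the genes lowers the bias by 12 percentage points.
neutral_case = gsb_after_inversion(50, 0.3)
biased_case = gsb_after_inversion(80, 0.2)
```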
Figure 6.
Inversion heterogeneity across the bacterial kingdom. Inversion frequency plotted against the gene-strand bias across the bacterial kingdom. The color of each data point denotes the clade, as presented in Figure 1. Spearman's rank correlation coefficient (ρ) and the P-value (P) are displayed on the plot.
Figure 7.
Association of gene-strand bias and DNA polymerases. (A) Median inversion frequency (Y-axis), (B) median recombination rate (Y-axis), and (C) median inverted repeat density (Y-axis) were compared between species without (non-PolC) and with the polC gene (PolC), and their distributions are plotted. The central mark denotes the median, and the top and bottom edges indicate the first and third quartiles. The whiskers represent 1.5 times the interquartile range. Significance was calculated using the Mann–Whitney U test. (D) The ratio of the frequency of inversion to the lagging strand over inversion to the leading strand (Y-axis) plotted against the gene-strand bias (X-axis). Spearman's rank correlation coefficient (ρ) and the P-value (P) are indicated in the plot. The gray line indicates the neutral scenario. (E) Bacterial kingdom phylogeny and associated characters are represented as concentric circles. The phylogenetic clades are color-coded on the inner circle. In the middle circle, the presence (brown) or absence (light blue) of PolC is indicated. On the outermost circle, GSB is represented on a scale from low (green) to high (blue) percentage. The height of the bar (gray) on the circumference corresponds to inversion frequency (same as Figure 6).
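Panels (A) to (C) compare two groups with the Mann–Whitney U test. The following sketch computes the U statistics by direct pair counting and a two-sided P-value from the normal approximation. It deliberately omits tie and continuity corrections, so it is only a rough illustration; small samples call for an exact test, and the authors' software and settings are not stated in this caption. The example values are invented.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairs won by each group, ties counted as half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def two_sided_p_normal(x, y):
    """Two-sided P-value via the normal approximation (no corrections)."""
    n1, n2 = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return erfc(abs(z) / sqrt(2))


# Hypothetical, fully separated groups: U_x is 0 and U_y is n1 * n2.
group_non_polc = [1, 2, 3]
group_polc = [4, 5, 6]
u_statistics = mann_whitney_u(group_non_polc, group_polc)
p_value = two_sided_p_normal(group_non_polc, group_polc)
```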
Figure 8.
Proposed model describing the role of the nature of replication in the evolution and preservation of gene-strand bias. The nature of replication, determined by which DNA polymerase operates at the replication fork, sets the differential fitness cost of head-on collisions. This differential fitness cost presumably led to the evolution of high gene-strand bias in species with high fitness cost, whereas in species with lower fitness cost the evolution of low-to-moderate gene-strand bias is favored. This fitness cost disparity also imposes a gradient in the strength of selection on the inverted repeat characteristics (density, distance, and distribution), modulating inversion frequency and size and thereby promoting the maintenance of the evolved gene-strand bias.
