PLoS One. 2009 Dec 2;4(12):e8113.
doi: 10.1371/journal.pone.0008113.

Examination of genome homogeneity in prokaryotes using genomic signatures

Jon Bohlin et al. PLoS One. 2009.

Abstract

Background: DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." Genomic signatures can be used to classify organisms phylogenetically from arbitrarily sampled DNA. They can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model) as well as genomic signatures normalized by smaller DNA words (1st- and 2nd-order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted with intra-genomic signature variance as the response variable and a set of genomic properties as predictors: genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature, and oligonucleotide usage variance (OUV), a measure of oligonucleotide usage bias calculated as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies.

Principal findings: Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions: Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Oligonucleotide usage variance (OUV) in Bacillus cereus and Pirellula sp.
The figure shows how tetranucleotide usage varies within the Bacillus cereus ATCC 14579 (grey line) and Rhodopirellula baltica SH 1 (black line) chromosomes. The vertical axis (OUV) is a measure of oligonucleotide usage variance. Higher OUV values indicate more biased tetranucleotide usage as compared to a randomly constructed DNA sequence with corresponding AT content. It can be seen that the R. baltica genome has, on average, more biased tetranucleotide usage than the B. cereus genome.
Figure 2. Genomic signature variance in Bacillus cereus and Pirellula sp.
The figure shows how the genomic signature varies within one of the most homogeneous chromosomes, Rhodopirellula baltica SH 1 (black line), and within one of the most heterogeneous chromosomes, Bacillus cereus ATCC 14579 (grey line). The vertical axis, PCH, measures how homogeneous a genome is: the higher the PCH value, the more homogeneous the chromosome. PCH is both higher and less variable in the R. baltica genome than in the B. cereus genome. While R. baltica is a slow-growing, GC-rich bacterium with a relatively large genome (7 Mbp), B. cereus is a fast-growing, AT-rich bacterium with a genome of approximately 5.5 Mbp.
Figure 3. Oligonucleotide usage variance (OUV) based on ZOM, FOM and SOM models.
OUV scores based on ZOM (left), FOM (middle), and SOM (right) measures are shown on the vertical axis, with the chromosomes sorted from left to right by increasing AT content on the horizontal axis. Red lines indicate whole-chromosome OUV scores, including both coding and non-coding sections, while blue lines represent concatenated open reading frames. Lower OUV values mean that the model approximates tetranucleotide usage more closely. Dotted lines represent 99% prediction intervals.
Figure 4. Overview of Markov model based oligonucleotide approximations in prokaryotes.
OUV scores based on 0th, 1st and 2nd order Markov models (ZOM, FOM, and SOM respectively) are found on the vertical axis. Each chromosome is sorted with respect to increasing AT content from left to right along the horizontal axis. ZOMs (red line) approximate genomic tetranucleotide usage with nucleotide frequencies, while FOMs (green line) use genomic dinucleotide content in addition. The 2nd order Markov model (blue line) bases tetranucleotide frequency approximations on genomic di- and trinucleotide usage. Larger OUV values mean poorer approximations which is a consequence of more biased tetranucleotide usage.
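The three approximations described in this caption can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the formulas are the standard Markov-chain estimates of a tetranucleotide frequency from shorter words, and the function names (`kmer_freqs`, `expected_tetra`, `mean_squared_deviation`) are hypothetical. The final function is only a stand-in for an OUV-like score, since the exact OUV definition is not given here.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers on one strand of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def expected_tetra(seq, order):
    """Approximate tetranucleotide frequencies with a 0th-, 1st- or
    2nd-order Markov model (ZOM, FOM, SOM) estimated from seq itself.

    Only tetranucleotides that occur in seq are scored, so every shorter
    word used in a denominator is guaranteed to have a nonzero frequency.
    """
    f1, f2, f3 = (kmer_freqs(seq, k) for k in (1, 2, 3))
    expected = {}
    for w in kmer_freqs(seq, 4):
        if order == 0:
            # ZOM: product of single-nucleotide frequencies.
            e = f1[w[0]] * f1[w[1]] * f1[w[2]] * f1[w[3]]
        elif order == 1:
            # FOM: overlapping dinucleotides, corrected for shared bases.
            e = f2[w[:2]] * f2[w[1:3]] * f2[w[2:]] / (f1[w[1]] * f1[w[2]])
        elif order == 2:
            # SOM: overlapping trinucleotides, corrected for the shared dinucleotide.
            e = f3[w[:3]] * f3[w[1:]] / f2[w[1:3]]
        else:
            raise ValueError("order must be 0, 1 or 2")
        expected[w] = e
    return expected


def mean_squared_deviation(seq, order):
    """ASSUMPTION: an illustrative OUV-like score, not the paper's formula.
    Mean squared difference between observed and model-expected frequencies."""
    if len(seq) < 4:
        raise ValueError("sequence must contain at least one tetranucleotide")
    observed = kmer_freqs(seq, 4)
    expected = expected_tetra(seq, order)
    return sum((observed[w] - expected[w]) ** 2 for w in observed) / len(observed)
```

Under this sketch, a larger deviation corresponds to a poorer approximation, which matches the reading of larger OUV values in the caption. The sketch works on a single strand and does not handle ambiguous bases such as N.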
Figure 5. Markov chain model based PCH scores in prokaryotes.
ZOM (left), FOM (middle) and SOM (right) PCH values (vertical axis) obtained for each chromosome, sorted from left to right by increasing AT content (horizontal axis). The PCH scores show how the Markov chain based genomic signatures change, on average, within each chromosome. For all models, PCH scores are noticeably higher in coding regions (blue lines) than in whole chromosomes, which contain both coding and non-coding regions (red lines). Higher PCH values mean more homogeneous chromosomes, while lower PCH values mean more heterogeneous chromosomes with respect to the corresponding Markov chain based genomic signatures. Dotted lines represent 99% prediction intervals.
Figure 6. E. coli K-12 profiles based on ZOM, FOM and SOM PCH measures.
Plots of genomic signatures based on ZOM (red line), FOM (green line), or SOM (blue line) models compared with tetranucleotide-based signatures from a 10 kbp sliding window with 5 kbp overlap. Higher PCH values (vertical axis) mean greater intra-chromosomal homogeneity. The dips close to genomic positions (horizontal axis) 2.1 Mbp and 2.8 Mbp indicate prophage DNA.
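The sliding-window scan behind a profile like this one can be sketched in Python as follows. The window size (10 kbp) and 5 kbp overlap come from the caption; everything else is an assumption. In particular, the score used here is a plain Pearson correlation between each window's tetranucleotide frequencies and those of the whole sequence, chosen only as an illustrative stand-in because the PCH formula is not given here. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so profiles are comparable.
TETRAS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Tetranucleotide frequency vector of seq, ordered as in TETRAS."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAS) or 1  # avoid division by zero
    return [counts[t] / total for t in TETRAS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def sliding_windows(seq, size=10_000, step=5_000):
    """Yield (start, window) pairs; consecutive windows overlap by size - step."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def window_scores(seq, size=10_000, step=5_000):
    """ASSUMPTION: Pearson correlation stands in for the PCH score.
    Compares each window's profile with the whole-sequence profile."""
    genome = tetra_profile(seq)
    return [(start, pearson(tetra_profile(window), genome))
            for start, window in sliding_windows(seq, size, step)]
```

With such a scan, windows whose composition differs from the rest of the chromosome, such as the prophage regions mentioned in the caption, would be expected to show up as low-scoring positions.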
