PeerJ. 2013 Aug 29;1:e127.
doi: 10.7717/peerj.127. eCollection 2013.

A Markovian analysis of bacterial genome sequence constraints

Aaron D Skewes et al. PeerJ. 2013.

Abstract

The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon position within each reading frame allows some flexibility in nucleotide selection; however, the third nucleotide of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, because the degenerate third position must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect this correlation to be completely captured by a second-order Markov model, so that increasing the order of the model (e.g., to third or fourth order) would not reduce the uncertainty of the process any further. In this manuscript, we present the results of a comprehensive study of the Markov property in the DNA sequences of 906 bacterial chromosomes. All 906 chromosomes exhibit a statistically significant Markov property that extends beyond second order and therefore cannot be fully explained by codon usage. An unrooted tree of all 906 bacterial chromosomes, built from their third-order transition probability matrices, shares ∼25% similarity with a tree based on 16S rRNA sequence homology. This congruence with the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second order), and higher-order models yield diminishing improvements in congruence. A nucleotide correlation that extends past three nucleotides most likely exists within every bacterial chromosome. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes.
Transition matrix usage is largely conserved within taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate convergent evolution in some bacteria.
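To make the notion of an order-k Markov model and its transition probability matrix concrete, here is a minimal sketch. It is not the authors' code. It assumes maximum-likelihood estimation from raw counts and skips any window containing characters outside ACGT. The helper names (`transition_counts`, `transition_matrix`, `order_lr_test`) are hypothetical. The likelihood-ratio (G) statistic asks whether an order-(k+1) model fits significantly better than an order-k model, in the spirit of Anderson and Goodman's tests for Markov-chain order. For nucleotides, the nominal degrees of freedom are 4^k · 3 · 3.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def transition_counts(seq: str, k: int) -> Counter:
    """Count (context, next_base) pairs for an order-k Markov chain.

    The context is the k bases preceding each position. Windows containing
    any character outside ACGT (e.g. the ambiguity code N) are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(k, len(seq)):
        window = seq[i - k : i + 1]
        if all(base in ALPHABET for base in window):
            counts[(window[:-1], window[-1])] += 1
    return counts


def transition_matrix(seq: str, k: int) -> dict:
    """Maximum-likelihood estimate of P(next_base | context) for order k.

    Returns one row per observed context (at most 4**k rows); each row is a
    dict over the four bases whose values sum to 1.
    """
    counts = transition_counts(seq, k)
    matrix = {}
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=k)):
        total = sum(counts[(ctx, b)] for b in ALPHABET)
        if total:
            matrix[ctx] = {b: counts[(ctx, b)] / total for b in ALPHABET}
    return matrix


def order_lr_test(seq: str, k: int) -> tuple:
    """Likelihood-ratio (G) statistic for H0: order k vs H1: order k + 1.

    Both models are fitted to the same positions: the order-k counts are
    obtained by dropping the oldest base from each order-(k+1) context.
    Returns (G, nominal df). The nominal df assumes every context was
    observed and should be reduced if some were not.
    """
    full = transition_counts(seq, k + 1)
    reduced: Counter = Counter()
    full_totals: Counter = Counter()
    reduced_totals: Counter = Counter()
    for (ctx, base), n in full.items():
        reduced[(ctx[1:], base)] += n
        full_totals[ctx] += n
        reduced_totals[ctx[1:]] += n
    g = 0.0
    for (ctx, base), n in full.items():
        p_full = n / full_totals[ctx]
        p_reduced = reduced[(ctx[1:], base)] / reduced_totals[ctx[1:]]
        g += 2.0 * n * math.log(p_full / p_reduced)
    df = (4 ** (k + 1) - 4 ** k) * (4 - 1)
    return g, df
```

For example, `order_lr_test(seq, 2)` gives the statistic for testing whether a third-order model improves on a second-order one. A p-value could then be taken from the chi-square survival function, e.g. `scipy.stats.chi2.sf(g, df)`. Estimating the reduced model from the same positions as the full model keeps the two models properly nested.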

Keywords: Bacteria; Markov model; Sequencing; Topology; rRNA.

Figures

Figure 1. Percent symmetric difference of the transition tree for each model order relative to the 16S rRNA tree (A) and the zero-order transition tree (B).
The greatest change in symmetric difference between the 16S rRNA tree and the trees based on transition matrices occurs between the zero-order and third-order models, with only very small changes thereafter. Similarly, the symmetric difference between the zero-order transition tree and the higher-order trees levels off after the third order.
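The symmetric difference between two unrooted trees on the same leaves is commonly computed as the Robinson-Foulds distance: the number of bipartitions (splits) that appear in one tree but not in the other. The minimal sketch below shows that computation. Two things in it are assumptions. First, the nested-tuple tree format is hypothetical. Second, "percent" symmetric difference is taken here as the distance divided by the total number of nontrivial splits in both trees; the paper's exact normalization is not given in this record and may differ.

```python
from typing import FrozenSet, Set, Union

# Hypothetical format: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions (splits) of an unrooted tree.

    The nested-tuple form may be rooted anywhere, so each split is stored as
    the side that does NOT contain a fixed reference leaf. The same
    bipartition is then encoded identically for any rooting.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    n = len(all_leaves)
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> None:
        clade = leaves(node)
        side = all_leaves - clade if ref in clade else clade
        if 1 < len(side) < n - 1:  # skip empty and single-leaf splits
            result.add(side)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return result


def percent_symmetric_difference(t1: Tree, t2: Tree) -> float:
    """Robinson-Foulds distance as a percentage of all splits in both trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    s1, s2 = splits(t1), splits(t2)
    total = len(s1) + len(s2)
    return 100.0 * len(s1 ^ s2) / total if total else 0.0
```

Because each split is keyed by the side that excludes a reference leaf, two rootings of the same unrooted tree compare as identical. This sketch recomputes leaf sets at every node, so its cost is quadratic in the number of leaves. For trees with hundreds of taxa, an established phylogenetics library such as DendroPy (`treecompare.symmetric_difference`) would be the more practical choice.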
Figure 2. Percent symmetric difference between transition trees of successive orders.
The symmetric difference between successive-order transition trees levels off after the 3rd/4th order.
Figure 3. The symmetric difference between the 16S rRNA tree and the third-order transition tree.
Branches marked in red represent disagreement in topology between the trees.
Figure 4. A collection of Enterobacteriaceae consisting of Salmonella, Escherichia and Shigella, as an example of taxa that cluster similarly in the 16S rRNA and third-order transition trees.
The genera of interest appear in red in the radial cladogram. A list of the organisms is given; species that are included in the 16S rRNA tree but not in the transition tree are shown in boldface type. A.macleodii_Deep_ecotype, H.baltica_ATCC_49814, I.loihiensis_L2TR, K.koreensis_DSM_16069, L.acidophilus_NCFM, L.brevis_ATCC_367, L.casei_ATCC_334, L.delbrueckii_bulgaricus, L.delbrueckii_bulgaricus_ATCC_BAA-365, L.fermentum_IFO_3956, L.gasseri_ATCC_33323, L.helveticus_DPC_4571, L.johnsonii_FI9785, L.johnsonii_NCC_533, L.plantarum, L.plantarum_JDM1, L.reuteri_DSM_20016, L.reuteri_F275_Kitasato, L.rhamnosus_GG, L.rhamnosus_Lc_705, L.sakei_23K, L.salivarius_UCC118, Marinomonas_MWYL1, M.mobilis_JLW8, P.profundum_SS9, P.necessarius_asymbioticus_QLW_P1DMWA_1, P.necessarius_STIR1, P.atlantica_T6c, P.haloplanktis_TAC125, P.arcticum_273-4, P.cryohalolentis_K5, Psychrobacter_PRwf-1, S.degradans_2-40, S.amazonensis_SB2B, Shewanella_ANA-3, S.baltica_OS155, S.baltica_OS185, S.baltica_OS195, S.baltica_OS223, S.denitrificans_OS217, S.frigidimarina_NCIMB_400, S.halifaxensis_HAW_EB4, S.loihica_PV-4, Shewanella_MR-4, Shewanella_MR-7, S.oneidensis, S.pealeana_ATCC_700345, S.piezotolerans_WP3, S.putrefaciens_CN-32, S.sediminis_HAW-EB3, Shewanella_W3-18-1, S.woodyi_ATCC_51908, T.crunogena_XCL-2, T.denitrificans_ATCC_33889, V.cholerae, V.cholerae_M66_2, V.cholerae_MJ_1236, V.cholerae_O395, Vibrio_Ex25, V.fischeri_ES114, V.harveyi_ATCC_BAA-1116, V.parahaemolyticus, V.splendidus_LGP32, V.vulnificus_CMCP6, V.vulnificus_YJ016, Y.enterocolitica_8081, Y.pestis_Angola, Y.pestis_Antiqua, Y.pestis_biovar_Microtus_91001, Y.pestis_CO92, Y.pestis_Nepal516, Y.pestis_Pestoides_F, Y.pseudotuberculosis_IP_31758, Y.pseudotuberculosis_IP32953, Y.pseudotuberculosis_PB1, Y.pseudotuberculosis_YPIII.
Figure 5. Members of the genus Streptococcus appear in two distinct clusters in the third-order transition tree but form a single cluster in the 16S rRNA tree.
The genus of interest appears in red in the radial cladogram. A list of the organisms is given. Group 1: S.equi_4047, S.equi_zooepidemicus, S.equi_zooepidemicus_MGCS10565, S.gordonii_Challis_substr_CH1, S.sanguinis_SK36, S.pneumoniae_70585, S.pneumoniae_JJA, S.pneumoniae_D39, S.pneumoniae_R6, S.pneumoniae_P1031, S.pneumoniae_G54, S.pneumoniae_Taiwan19F_14, S.pneumoniae_ATCC_700669, S.pneumoniae_CGSP14, S.pneumoniae_Hungary19A_6, S.pneumoniae_TIGR4, S.suis_05ZYH33, S.suis_98HAH33, S.suis_SC84, S.suis_P1_7, S.suis_BM407. Group 2: S.agalactiae_2603, S.agalactiae_NEM316, S.agalactiae_A909, S.dysgalactiae_equisimilis_GGS_124, S.pyogenes_M1_GAS, S.pyogenes_MGAS9429, S.pyogenes_MGAS10270, S.pyogenes_NZ131, S.pyogenes_MGAS10750, S.pyogenes_MGAS10394, S.pyogenes_MGAS8232, S.pyogenes_MGAS315, S.pyogenes_MGAS5005, S.pyogenes_MGAS6180, S.pyogenes_MGAS2096, S.pyogenes_Manfredo, S.pyogenes_SSI-1, S.thermophilus_CNRZ1066, S.thermophilus_LMG_18311, S.thermophilus_LMD-9, S.uberis_0140J, S.mutans.
Figure 6. A group of mostly aquatic bacteria that cluster together in the third-order transition tree, but are dispersed in the 16S rRNA tree.
The genera of interest appear in red in the radial cladogram. A list of the organisms is given; those that fall outside the cluster in the transition tree are shown in boldface type. Shewanella_sediminis_HAW-EB3, Shewanella_woodyi_ATCC_51908, Alteromonas_macleodii_Deep_ecotype, Saccharophagus_degradans_2-40, Pseudoalteromonas_haloplanktis_TAC125, Methylotenera_mobilis_JLW8, Psychrobacter_arcticum_273-4, Psychrobacter_cryohalolentis_K5, Psychrobacter_PRwf-1, Pseudoalteromonas_atlantica_T6c, Shewanella_ANA-3, Shewanella_MR-4, Shewanella_MR-7, Shewanella_baltica_OS155, Shewanella_baltica_OS185, Shewanella_baltica_OS195, Shewanella_baltica_OS223, Shewanella_oneidensis, Shewanella_putrefaciens_CN-32, Shewanella_W3-18-1, Shewanella_denitrificans_OS217, Shewanella_halifaxensis_HAW_EB4, Shewanella_pealeana_ATCC_700345, Shewanella_piezotolerans_WP3, Shewanella_frigidimarina_NCIMB_400, Photobacterium_profundum_SS9, Vibrio_cholerae, Vibrio_cholerae_M66_2, Vibrio_cholerae_O395, Vibrio_cholerae_MJ_1236, Vibrio_vulnificus_CMCP6, Vibrio_vulnificus_YJ016, Vibrio_Ex25, Vibrio_harveyi_ATCC_BAA-1116, Vibrio_parahaemolyticus, Vibrio_splendidus_LGP32, Marinomonas_MWYL1, Hirschia_baltica_ATCC_49814, Polynucleobacter_necessarius_asymbioticus_QLW_P1DMWA_1, Polynucleobacter_necessarius_STIR1, Idiomarina_loihiensis_L2TR, Yersinia_enterocolitica_8081, Yersinia_pestis_Angola, Yersinia_pestis_Nepal516, Yersinia_pestis_Antiqua, Yersinia_pestis_biovar_Microtus_91001, Yersinia_pestis_CO92, Yersinia_pseudotuberculosis_IP32953, Yersinia_pseudotuberculosis_PB1, Yersinia_pseudotuberculosis_IP_31758, Yersinia_pseudotuberculosis_YPIII, Yersinia_pestis_Pestoides_F, Lactobacillus_brevis_ATCC_367, Lactobacillus_plantarum, Lactobacillus_plantarum_JDM1, Lactobacillus_casei_ATCC_334, Lactobacillus_rhamnosus_GG, Lactobacillus_rhamnosus_Lc_705, Kangiella_koreensis_DSM_16069, Thiomicrospira_crunogena_XCL-2, Vibrio_fischeri_ES114, Lactobacillus_sakei_23K, Lactobacillus_reuteri_DSM_20016, Shewanella_amazonensis_SB2B, Shewanella_loihica_PV-4, Lactobacillus_delbrueckii_bulgaricus, Thiomicrospira_denitrificans_ATCC_33889, Lactobacillus_acidophilus_NCFM.
