Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context
- PMID: 11230160
- DOI: 10.1101/gr.gr-1619r
Abstract
Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only a few operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, <10% of the genes in each genome belonged to conserved gene strings. When closely related pairs of species (i.e., two mycoplasmas) were excluded, the total coverage of genomes by conserved gene strings ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the minimal genome of Mycoplasma genitalium and 23% for Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial-archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes, show the greatest level of conservation. The level of gene order conservation among prokaryotic genomes was compared to the co-occurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves. Only limited correlation was observed between these evolutionary variables.
Gene order conservation shows a much lower variance than the co-occurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for approximately 90 COGs (approximately 4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which are central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of gene order, the potential of genome alignment for prediction of gene functions is limited; nevertheless, such information significantly complements the results obtained through protein sequence and structure analysis.
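To make the core idea concrete, the following is a minimal sketch of a gapped local alignment of two gene strings with a Monte Carlo significance estimate. It is not the authors' program: the abstract does not give the scoring scheme or the randomization procedure, so the Smith-Waterman recurrence, the match/mismatch/gap values, the shuffling of gene order as a null model, and the names `local_align` and `monte_carlo_p` are all assumptions made for illustration. Genes are represented by hypothetical ortholog-family labels (standing in for COG identifiers).

```python
import random


def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score for two gene strings.

    `a` and `b` are sequences of ortholog-family labels in genome order.
    The scoring values are illustrative assumptions, not the paper's.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def monte_carlo_p(a, b, trials=1000, seed=0):
    """Estimate how often a shuffled gene order scores at least as well.

    Null model (an assumption): the gene order of `b` is fully randomized.
    Returns (observed_score, empirical_p_value) with add-one smoothing.
    """
    rng = random.Random(seed)
    observed = local_align(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_align(a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical gene strings: labels are placeholders, not real COG IDs.
    genome_a = ["f01", "f02", "f03", "f04", "f05", "f20", "f21"]
    genome_b = ["f30", "f01", "f02", "f99", "f03", "f04", "f05", "f31"]
    score, p = monte_carlo_p(genome_a, genome_b, trials=500)
    print(f"score={score} empirical p={p:.4f}")
```

A low empirical p-value here only says that the shared gene string is unlikely under random gene order; a real analysis would also have to handle strand, paralogs, and multiple testing across genome pairs.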