Genome Res. 2008 Jun;18(6):949-56.
doi: 10.1101/gr.072322.107. Epub 2008 Apr 4.

Large-scale analysis of gene clustering in bacteria

Qingwu Yang et al. Genome Res. 2008 Jun.

Abstract

An important strategy to study operons and their evolution is to investigate clustering of related genes across multiple bacterial genomes. Although existing algorithms are available that can identify gene clusters across two or more genomes, very few algorithms are efficient enough to study gene clusters across hundreds of genomes. We observe that a querying strategy can be used to analyze gene clusters across a large number of genomes and develop an efficient algorithm to identify all related clusters on a genome from a given query cluster. We use this algorithm to study gene clustering in 400 bacterial genomes by starting from a well-characterized list of operons in Escherichia coli K12 and perform comparative analysis of operon occurrences, gene orientations, and rearrangements both within and across clusters. We show that important biological insights can be obtained by comparing results across these categories. A software program implementing the algorithm (GCQuery) and supplementary data containing detailed results are available at http://faculty.cs.tamu.edu/shsze/gcquery.
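
The querying idea in the abstract can be illustrated with a small Python sketch. This is not the GCQuery algorithm itself, which the abstract does not spell out; it is a simplified stand-in that scans one linear chromosome for runs of genes related to a query cluster. The function name `find_related_clusters`, the `related` mapping, and the `max_gap` and `min_size` parameters are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def find_related_clusters(
    query: List[str],
    chromosome: List[str],
    related: Dict[str, Set[str]],
    max_gap: int = 2,
    min_size: int = 2,
) -> List[List[int]]:
    """Return clusters as lists of chromosome positions.

    ASSUMPTIONS (not taken from the paper):
    - `related` maps a chromosome gene to the set of query genes it is
      related to (for example, from a BLAST E-value cutoff).
    - Two hits belong to the same cluster when at most `max_gap`
      unrelated genes lie between them.
    - Clusters with fewer than `min_size` hits are discarded.
    """
    query_set = set(query)

    # Positions on the chromosome whose gene is related to any query gene.
    hits = [
        pos
        for pos, gene in enumerate(chromosome)
        if related.get(gene, set()) & query_set
    ]

    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in hits:
        # Close the current run when the gap to the previous hit is too large.
        if current and pos - current[-1] - 1 > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Toy data: gene names are invented for the example.
query = ["a", "b", "c"]
chromosome = ["x1", "a1", "b1", "x2", "c1", "x3", "x4", "x5", "x6", "a2", "x7"]
related = {"a1": {"a"}, "b1": {"b"}, "c1": {"c"}, "a2": {"a"}}

print(find_related_clusters(query, chromosome, related, max_gap=1))
# → [[1, 2, 4]]
```

Because each genome is scanned independently against the same query, a loop over many genomes costs one linear pass per chromosome in this sketch, which is the kind of scaling that makes a query-based analysis of hundreds of genomes practical.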

Figures

Figure 1.
Algorithm GCQuery to find all related gene clusters on a linear chromosome c from a query cluster Q. GCQuery is available at http://faculty.cs.tamu.edu/shsze/gcquery.
Figure 2.
Illustration of all clusters of size >1 on a linear chromosome c from a query cluster Q. Dashed lines denote related genes. It is possible that each gene in Q can be related to more than one gene in c and vice versa.
Figure 3.
Performance of GCQuery (A), GeneTeams (B), and HomologyTeams (C) on B. subtilis subsp. subtilis str. 168 while fixing the BLAST E-value cutoff to 10^-7 for defining related genes. For GeneTeams and HomologyTeams, the distance cutoff defines the maximum number of intervening genes and the maximum number of base pairs between adjacent genes in a predicted cluster, respectively.
Figure 4.
Distribution of the occurrence rate of the 123 E. coli K12 operons in the 400 bacterial genomes.
Figure 5.
Distribution of the occurrence rate of genes within significant clusters in S10, spc, and alpha operons in the 400 bacterial genomes.
Figure 6.
Percentage of clusters in which all genes share the same orientation for different BLAST E-value cutoffs.
Figure 7.
Distribution of the percentage of conserved neighboring gene pairs over all clusters.
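
The statistic described in the Figure 6 caption, the percentage of clusters in which all genes share the same orientation, is simple to compute once each cluster is reduced to a list of strand symbols. The sketch below assumes that representation ("+" or "-" per gene); the function name and the toy data are hypothetical and do not come from the paper's supplementary data.

```python
from typing import List


def percent_same_orientation(clusters: List[List[str]]) -> float:
    """Percentage of clusters whose genes all lie on the same strand.

    ASSUMPTION: each cluster is a non-empty list of "+" / "-" symbols,
    one per gene. Returns 0.0 when no clusters are given.
    """
    if not clusters:
        return 0.0
    # A cluster is uniform when its set of strand symbols has one element.
    uniform = sum(1 for strands in clusters if len(set(strands)) == 1)
    return 100.0 * uniform / len(clusters)


# Invented example: three of the four clusters are uniformly oriented.
example = [
    ["+", "+", "+"],
    ["-", "-"],
    ["+", "-", "+"],
    ["-", "-", "-", "-"],
]
print(percent_same_orientation(example))
# → 75.0
```

Repeating the calculation for cluster sets built at different BLAST E-value cutoffs would give one value per cutoff, which is the quantity the caption describes.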
