Comparative Study
J Bacteriol. 2002 Apr;184(8):2260-72.
doi: 10.1128/JB.184.8.2260-2272.2002.

Evolutionary analysis by whole-genome comparisons

Arvind K Bansal et al. J Bacteriol. 2002 Apr.

Abstract

A total of 37 complete genome sequences of bacteria, archaea, and eukaryotes were compared. The percentage of orthologous genes of each species contained within any of the other 36 genomes was established. In addition, the mean identity of the orthologs was calculated. Several conclusions result: (i) a greater absolute number of orthologs of a given species is found in larger species than in smaller ones; (ii) a greater percentage of the orthologous genes of smaller genomes is contained in other species than is the case for larger genomes, which corresponds to a larger proportion of essential genes; (iii) before species can be specifically related to one another in terms of gene content, it is first necessary to correct for the size of the genome; (iv) eukaryotes have a significantly smaller percentage of bacterial orthologs after correction for genome size, which is consistent with their placement in a separate domain; (v) the archaebacteria are specifically related to one another but are not significantly different in gene content from the bacteria as a whole; (vi) determination of the mean identity of all orthologs (involving hundreds of gene comparisons per genome pair) reduces the impact of errors due to misidentification of orthologs and to misalignments, and thus it is far more reliable than single gene comparisons; (vii) however, there is a maximum amount of change in protein sequences of 37% mean identity, which limits the use of percentage sequence identity to the lower taxa, a result which should also be true for single gene comparisons of both proteins and rRNA; (viii) most of the species that appear to be specifically related based upon gene content also appear to be specifically related based upon the mean identity of orthologs; (ix) the genes of a majority of species considered in this study have diverged too much to allow the construction of all-encompassing evolutionary trees. However, we have shown that eight species of gram-negative bacteria, six species of gram-positive bacteria, and eight species of archaebacteria are specifically related in terms of gene content, mean identity of orthologs, or both.
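
The two quantities the abstract reports for each genome pair, the percentage of one genome's genes that have orthologs in the other and the mean identity of those orthologs, can be illustrated with a minimal sketch. The function name, its arguments, and the example numbers are hypothetical, and a plain arithmetic mean is assumed here for simplicity; this is not the authors' pipeline.

```python
from statistics import mean


def summarize_pair(ortholog_identities, n_genes_first):
    """Summarize one genome pair (hypothetical helper, not from the paper).

    ortholog_identities: percentage identity of each ortholog shared by the pair.
    n_genes_first: total number of genes in the first genome of the pair.
    Returns (percentage of the first genome's genes with an ortholog,
             arithmetic mean identity of the orthologs).
    """
    if n_genes_first <= 0:
        raise ValueError("n_genes_first must be positive")
    if not ortholog_identities:
        raise ValueError("at least one ortholog is required")
    percent_contained = 100.0 * len(ortholog_identities) / n_genes_first
    return percent_contained, mean(ortholog_identities)


# Made-up example: 3 orthologs found among 12 genes of the first genome.
percent_contained, mean_identity = summarize_pair([50.0, 60.0, 70.0], 12)
print(percent_contained, mean_identity)
```

Because the mean is taken over every ortholog in the pair, a single misidentified or misaligned gene shifts the result only slightly, which is the robustness argument made in conclusion (vi).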

Figures

FIG. 1.
Number of orthologous genes in genome pairs. Abbreviations are as defined in Table 1.
FIG. 2.
Percentage of orthologous genes of each species contained within the other genomes. Each row contains the percentage of orthologous genes of the species on the left contained within the species indicated across the top. Abbreviations are as defined in Table 1.
FIG. 3.
Relationship between gene content and genome size. First, the numbers of orthologous genes in Fig. 1 were divided by the numbers of genes in Table 1, resulting in the percentage of orthologous genes shown in Fig. 2. The percentage of orthologous genes of the first genome of a species pair was then plotted versus the total number of genes in the second genome of the pair for 35 comparisons (because of its exceptionally large size, C. elegans was excluded for clarity). (A) Percentage of D. radiodurans genes in other genomes; (B) percentage of Buchnera sp. genes in other genomes; (C) percentage of E. coli genes in other genomes; (D) percentage of Methanobacterium thermoautotrophicus genes in other genomes. The dotted line is a fit to the data, whereas the solid lines arbitrarily connect the origin to either B. subtilis or P. aeruginosa. The archaebacteria are yellow, the related gram-positive species are blue, the related gram-negative species are green, yeast is black, and the remaining species are red. The two bacteria which appear to be related to the archaea in plot D are shown as open circles.
FIG. 4.
Relationship between the percentage of orthologous genes in other species and genome size. The slopes of the lines in the plots of Fig. 3 and similar bacterial comparisons were plotted versus the genome size of the first species.
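
As a rough illustration of how a slope such as those plotted in Fig. 4 might be obtained, the sketch below fits a line to percentage of orthologs versus the number of genes in the second genome. The captions do not state the fitting method, so a least-squares line constrained through the origin is an assumption here, and the function name and data points are made up for the example.

```python
def slope_through_origin(gene_counts, percent_orthologs):
    """Least-squares slope of y = m * x with the intercept fixed at zero.

    gene_counts: total genes in the second genome of each pair (x values).
    percent_orthologs: percentage of the first genome's genes found there (y values).
    Assumption: a through-origin fit; the source does not specify the method.
    """
    if len(gene_counts) != len(percent_orthologs):
        raise ValueError("inputs must have the same length")
    sum_xy = sum(x * y for x, y in zip(gene_counts, percent_orthologs))
    sum_xx = sum(x * x for x in gene_counts)
    if sum_xx == 0:
        raise ValueError("need at least one nonzero gene count")
    return sum_xy / sum_xx


# Made-up points lying exactly on a line through the origin.
gene_counts = [1000, 2000, 4000]
percent_orthologs = [10.0, 20.0, 40.0]
slope = slope_through_origin(gene_counts, percent_orthologs)
print(slope)  # percentage points gained per additional gene in the second genome
```

Repeating such a fit with each species in turn as the first genome would give one slope per species, which could then be plotted against that species' genome size.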
FIG. 5.
Mean identity of orthologs. Distribution of the percentage identities of the 561 orthologous genes shared by E. coli and Buchnera sp. The solid line is a Gaussian fit to the data. The mean is 57.3%, and the standard deviation is 14.2. All such distributions were fitted, and the results are presented in Fig. 6.
FIG. 6.
Mean identity of orthologs in genome pairs. Numbers thought to be significant are in boldface. Abbreviations are as defined in Table 1.
FIG. 7.
Distribution of the mean identity of orthologous genes from Fig. 6. The solid line is a Gaussian fit to the data below 43% identity. The mean of the means is 36.9%, and the standard deviation is 1.79. All data at or beyond two standard deviations are highlighted in Fig. 6 to indicate which are significant.
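
The significance criterion in the Fig. 7 caption can be worked through numerically. The sketch below uses the fitted mean (36.9%) and standard deviation (1.79) from the caption; treating only the upper tail as significant is an assumption, and the function name is hypothetical.

```python
MEAN_OF_MEANS = 36.9  # fitted mean of the mean identities (Fig. 7 caption), in percent
SD_OF_MEANS = 1.79    # fitted standard deviation (Fig. 7 caption)

# Threshold two standard deviations above the fitted mean: 36.9 + 2 * 1.79
UPPER_THRESHOLD = MEAN_OF_MEANS + 2 * SD_OF_MEANS


def is_significant(mean_identity):
    """True if a genome pair's mean identity is at or beyond the upper threshold.

    Assumption: only values above the background distribution are of interest.
    """
    return mean_identity >= UPPER_THRESHOLD


print(round(UPPER_THRESHOLD, 2))
# The E. coli / Buchnera sp. pair has a mean identity of 57.3% (Fig. 5 caption).
print(is_significant(57.3))
```

Under this reading, a genome pair needs a mean ortholog identity of roughly 40.5% or more to stand out from the background, so the E. coli and Buchnera sp. pair at 57.3% clears the threshold comfortably.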
