Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):3160-5.
doi: 10.1073/pnas.0308653100. Epub 2004 Feb 18.

Trends between gene content and genome size in prokaryotic species with larger genomes

Konstantinos T Konstantinidis et al. Proc Natl Acad Sci U S A. 2004.

Abstract

Although the evolution process and ecological benefits of symbiotic species with small genomes are well understood, these issues remain poorly elucidated for free-living species with large genomes. We have compared 115 completed prokaryotic genomes by using the Clusters of Orthologous Groups database to determine whether there are changes with genome size in the proportion of the genome attributable to particular cellular processes, because this may reflect both cellular and ecological strategies associated with genome expansion. We found that large genomes are disproportionately enriched in regulation and secondary metabolism genes and depleted in protein translation, DNA replication, cell division, and nucleotide metabolism genes compared to medium- and small-sized genomes. Furthermore, large genomes do not accumulate noncoding DNA or hypothetical ORFs, because the portion of the genome devoted to these functions remained constant with genome size. Traits other than genome size or strain-specific processes are reflected by the dispersion around the mean for cell functions that showed no correlation with genome size. For example, Archaea had significantly more genes in energy production, coenzyme metabolism, and the poorly characterized category, and fewer in cell membrane biogenesis and carbohydrate metabolism than Bacteria. The trends we noted with genome size by using Clusters of Orthologous Groups were confirmed by our independent analysis with The Institute for Genomic Research's Comprehensive Microbial Resource and Kyoto Encyclopedia of Genes and Genomes' Orthology annotation databases. These trends suggest that larger genome-sized species may dominate in environments where resources are scarce but diverse and where there is little penalty for slow growth, such as soil.

Figures

Fig. 1.
COG functional categories that showed universal correlation with total ORFs in the genome. y axes are the percent of ORFs in the genome attributable to a specific COG category (graph title), and x axes are the total ORFs in the genome for each of the 99 fully sequenced bacterial genomes. Solid squares represent genomes that had a reasonable number of genes with homologs in the COG database, whereas open squares represent genomes that had either too many or too few genes with homologs in the database (outliers). Trendlines and R² shown are for the solid squares. Archaeal genomes were not included because Archaea had significantly different genomic fractions from Bacteria in many functional categories.
Fig. 2.
Correlation among total number of ORFs in the genome, noncoding DNA, and genome size for prokaryotic genomes. (A) The total number of ORFs in the genome vs. the genome size for 115 completed prokaryotic genomes. (B) The total amount of noncoding DNA in the genome vs. genome size.
Fig. 3.
ABC transporter genes proportionately increase with genome size. y axis is the number of genes attributable to ABC transporter functions, and x axis is the total ORFs in the genome for each of the 99 fully sequenced bacterial genomes. Genomes that have disproportionately increased or decreased their number of ABC transporter genes are denoted on the graph.
Fig. 4.
Differences between Archaea and Bacteria in the relative usage of the genome. Bars represent the average from 34 bacterial and 12 archaeal genomes, which have between 1,500 and 3,500 ORFs (to avoid any genome size effect on the data). Only normalized genomes have been included (see text). Averages are statistically different by two-tailed t test, assuming unequal variances and a 0.05 significance level. Functional categories that had <2% of the genes in the genome are not shown.
Fig. 5.
Summary of the shifts in gene content with genome size in prokaryotic genomes. The bars represent the sum of the COG functional categories, which showed strong correlation with genome size and are involved in the same major cellular processes. Only normalized genomes (represented by solid squares in Fig. 1) have been included. Error bars represent the standard deviation from the mean except for the last genome size class, where error bars represent data range due to a small number of normalized genomes in this class (three genomes).
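
The captions above mention two standard calculations: a trendline with R² for the percent of ORFs in a category against total ORFs (Fig. 1), and a two-tailed t test assuming unequal variances (Fig. 4). The following is a minimal sketch of how such quantities could be computed, not the authors' code. The function names `fit_trend` and `welch_t` and all numbers in the demo are hypothetical and made up for illustration; they are not data from the paper. Welch's form of the t test is assumed here as the usual choice when variances are unequal.

```python
import math
from statistics import mean, variance


def fit_trend(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2.

    Hypothetical helper: x could be total ORFs per genome and y the
    percent of ORFs assigned to one functional category.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Hypothetical helper: a and b could be per-genome percentages for one
    category in two groups. Sample variances are used (n - 1 denominator).
    Turning t and df into a p-value needs a t distribution, which is
    left out to keep the sketch standard-library only.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Made-up illustrative values, NOT data from the paper.
    total_orfs = [1000, 2000, 3000, 4000, 5000, 6000]
    pct_category = [3.0, 4.1, 4.9, 6.2, 6.8, 8.0]
    slope, intercept, r2 = fit_trend(total_orfs, pct_category)
    print(f"slope={slope:.5f} intercept={intercept:.3f} R2={r2:.3f}")

    group_a = [5.1, 6.0, 5.6, 6.3]   # hypothetical group 1
    group_b = [4.2, 4.0, 4.9]        # hypothetical group 2
    t, df = welch_t(group_a, group_b)
    print(f"t={t:.3f} df={df:.2f}")
```

With a statistics package available, the t statistic and degrees of freedom could be converted to a two-tailed p-value and compared against the 0.05 threshold named in the Fig. 4 caption.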
