Comparative Study
Genome Res. 2004 Dec;14(12):2469-77.
doi: 10.1101/gr.3024704.

Computing prokaryotic gene ubiquity: rescuing the core from extinction

Robert L Charlebois et al. Genome Res. 2004 Dec.

Abstract

The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, the core shrinks as the number and phylogenetic diversity of genomes in the group increase. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly fewer than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it makes the core harder to define. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
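
The difference between the strict core (genes in all genomes) and a relaxed core (genes in at least some fraction of genomes) can be illustrated with a minimal sketch. It assumes each genome is represented as a set of gene names; the function `core_genes`, its `min_fraction` parameter, and the four toy genomes are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter

def core_genes(genomes, min_fraction=1.0):
    """Return the genes present in at least min_fraction of the genomes.

    genomes: dict mapping genome name -> set of gene names (assumed format).
    min_fraction=1.0 gives the strict core; smaller values relax ubiquity.
    """
    if not genomes:
        return set()
    n = len(genomes)
    # Count, for each gene, the number of genomes that contain it.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    return {gene for gene, c in counts.items() if c / n >= min_fraction}

# Toy data, invented for illustration only.
genomes = {
    "A": {"rpoB", "rpsL", "recA", "lacZ"},
    "B": {"rpoB", "rpsL", "recA"},
    "C": {"rpoB", "rpsL", "nifH"},
    "D": {"rpoB", "recA", "lacZ"},
}

strict_core = core_genes(genomes)          # genes in all four genomes
relaxed_core = core_genes(genomes, 0.75)   # genes in at least three of four
```

In this toy set, a single absence in one genome is enough to drop a gene from the strict core, which is the effect the abstract attributes to annotation errors and isolated gene losses.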

Figures

Figure 1.
(A) Number of genes that are found (by having the same consensus gene name, see Methods) in at least two-thirds of prokaryotic genomes, and that are found in a random sample of x = 1 through x = 147 of these genomes. The point at x = 0 of y = 474 represents the number of genes found in at least 97 prokaryotes. For x > 0, means are reported for 10,000 random selections of x genomes. Small genomes were progressively deleted from the analysis in order to produce the series of curves shown. (B) As in A, but for selected clades of prokaryotes.
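
The sampling procedure behind such a curve can be sketched as follows. This is a minimal illustration under stated assumptions: genomes are sets of gene names, "found in a random sample" is read as "present in every sampled genome", and the function name `mean_shared_in_samples`, the toy genomes, and the reduced trial count are all hypothetical (the caption reports 10,000 selections per value of x).

```python
import random

def mean_shared_in_samples(genomes, candidates, x, trials=1000, seed=0):
    """Mean number of candidate genes present in every genome of a random
    sample of x genomes, averaged over `trials` random selections."""
    rng = random.Random(seed)
    names = sorted(genomes)
    total = 0
    for _ in range(trials):
        sample = rng.sample(names, x)   # x genomes drawn without replacement
        shared = set(candidates)
        for name in sample:
            shared &= genomes[name]     # keep only genes found in this genome
        total += len(shared)
    return total / trials

# Toy data, invented for illustration only.
genomes = {
    "A": {"rpoB", "rpsL", "recA", "lacZ"},
    "B": {"rpoB", "rpsL", "recA"},
    "C": {"rpoB", "rpsL", "nifH"},
    "D": {"rpoB", "recA", "lacZ"},
}

# Candidate genes: those found in at least two-thirds of the genomes.
all_genes = set().union(*genomes.values())
candidates = {
    g for g in all_genes
    if sum(g in genes for genes in genomes.values()) / len(genomes) >= 2 / 3
}

curve = [mean_shared_in_samples(genomes, candidates, x) for x in range(len(genomes) + 1)]
```

With x = 0 no genome constrains the candidates, so the first point of `curve` equals the number of candidate genes, mirroring the x = 0 point described in the caption; with x equal to the number of genomes, the value is the number of candidates shared by all of them.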
Figure 2.
Number of genes that are shared by at least 80%–100% of prokaryotic genomes. The points at the extreme right of the All Categories curves represent the end points from Figure 1 (for consensus gene names, CGN), and from Table 1 (for reciprocal best matches, RBM), respectively. Toward the left are genes that are cumulatively shared by progressively fewer genomes. Also shown are consensus gene names by functional category, extrapolated from COG assignments (Tatusov et al. 1997).
Figure 3.
Graphical representation of Table 4. The x-axis denotes breadth of distribution amongst bacterial phyla, whereas the y-axis indicates the mean number of orthologs (reciprocal best matches) shared at that breadth. (Vertical bars) SD; (dots) minima and maxima.
Figure 4.
A mix-and-match model for prokaryotic genome evolution. Every cell needs genes for multiple functions, and new genomic lineages arise in evolution through mixing and matching of genes performing these different functions, by processes of replacement, including nonorthologous displacement (Koonin et al. 1996). The simplest hypothesis would be that all functions are equally subject to such exchange processes. For many functions, available genes include nonhomologs and even null entries (gene and function loss), indicated here by different shapes. Thus, for these functions, no genes or even gene families will likely appear to be shared among all genomes. For some informational functions especially (such as translation), displacement most often involves genes that, although evolutionarily distinct (as indicated by colors), are homologous (as shown by shape). Such genes will appear among those of the ubiquitous core.

References

    1. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
    2. Boucher, Y., Douady, C.J., Papke, R.T., Walsh, D.A., Boudreau, M.E., Nesbø, C.L., Case, R.J., and Doolittle, W.F. 2003. Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 37: 283-328.
    3. Brochier, C., Bapteste, E., Moreira, D., and Philippe, H. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18: 1-5.
    4. Brown, J.R., Douady, C.J., Italia, M.J., Marshall, W.E., and Stanhope, M.J. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet. 28: 281-285.
    5. Charlebois, R.L., Clarke, G.D.P., Beiko, R.G., and St. Jean, A. 2003. Characterization of species-specific genes using a flexible, web-based querying system. FEMS Microbiol. Lett. 225: 213-220.

Web site references

    1. http://www.neurogadgets.com/bws.php; The NeuroGadgets Inc. Bioinformatics Web Service (NGIBWS).
