Genome Res. 2004 Jun;14(6):1036-42.
doi: 10.1101/gr.2231904.

Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli

Vincent Daubin et al. Genome Res. 2004 Jun.

Abstract

Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uniqueness of ORFans within a genome has precluded the use of a comparative approach to examine their function and evolution. However, by identifying sequences unique to monophyletic groups at increasing phylogenetic depths, we can make direct comparisons of the characteristics of ORFans of different ages in the Escherichia coli genome, and establish their functional status and evolutionary rates. Relative to the genes ancestral to gamma-Proteobacteria and to those genes distributed sporadically in other prokaryotic species, ORFans in the E. coli lineage are short, A+T rich, and evolve quickly. Moreover, most encode functional proteins. Based on these features, ORFans are not attributable to errors in gene annotation, limitations of current databases, or to failure of methods for detecting homology. Rather, ORFans in the genomes of free-living microorganisms apparently derive from bacteriophage and occasionally become established by assuming roles in key cellular functions.

Figures

Figure 1
Distribution of clade-specific genes at different phylogenetic depths within the γ-Proteobacteria. The topology of the tree is based on Lerat et al. (2003), and successive blue boxes (n0–n4, native) encompass the clades considered in the present study. Numbers of ORFans (yellow/red) and HOPs (white) in the E. coli MG1655 genome specific to n0–n4 are shown at the basal nodes of each clade. The number of native genes (n = 2049) corresponds to genes in the E. coli MG1655 genome that are present in at least one member of each clade. Species numbers of Bacteria and Archaea denote all those included in BLASTP searches.
Figure 2
Characteristics of ORFans (black circles) and HOPs (open circles) in γ-Proteobacterial clades of increasing phylogenetic depth. Clade designations (n0–n4) follow those shown in Figure 1, and dashed lines denote values for native genes. (A) Average size (in base pairs). (B) Average %G+C content at the third position of codons (G+C3). Bars represent one standard error.
Figure 3
Average Ka/Ks ratios for ORFans (black circles) and HOPs (open circles) restricted to clades of increasing phylogenetic depth (n2–n4). All calculations of Ka and Ks are based on E. coli and S. enterica orthologs. The dashed line corresponds to the average Ka/Ks value for native genes of E. coli and S. enterica. Bars represent one standard error.
Figure 4
Average G+C at the third position of codons (G+C3) for orthologs in different classes of genes (ORFans, HOPs, native) in E. coli (black) and S. enterica (open). Bars represent one standard error.
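
The G+C3 statistic used in Figures 2 and 4 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `gc3` and `mean_and_se` are hypothetical, the input is assumed to be an in-frame coding sequence, and whether stop codons or ambiguous bases were counted in the original analysis is not stated here, so the choices below (count every complete codon, skip non-ACGT bases) are assumptions.

```python
import math
import statistics


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumption: every complete codon is counted (including a stop codon if
    present); trailing partial codons and ambiguous bases are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # drop a trailing partial codon
    thirds = [b for b in seq[2:usable:3] if b in "ACGT"]
    if not thirds:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in thirds) / len(thirds)


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and one standard error (sample standard deviation / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values for a standard error")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se
```

Applying `gc3` to each gene in a class (ORFans, HOPs, or native) and passing the results to `mean_and_se` would give the kind of per-class average and error bar that the captions describe.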
