Genome Biol. 2000;1(5):RESEARCH0009.
doi: 10.1186/gb-2000-1-5-research0009. Epub 2000 Nov 6.

Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs)

D A Natale et al. Genome Biol. 2000.

Abstract

Background: Standard archival sequence databases were not designed as tools for genome annotation and are far from optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea: Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi.

Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified case by case and augmented by additional database searches with the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 A. pernix proteins that could not be assigned a function by conventional methods with a conservative sequence similarity threshold, an approximately 50% increase over the original annotation. A. pernix shares most of the conserved core of proteins previously identified in the Euryarchaeota. Cluster analysis and distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although it grouped with the two Pyrococcus species, indicating similar repertoires of conserved genes. These analyses gave no indication of a specific relationship between Crenarchaeota and eukaryotes. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix.

Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions significantly improve genome annotation. A differential genome display approach aids the systematic investigation of common and distinct features of gene repertoires and, in some cases, reveals unexpected connections that may indicate functional similarities between phylogenetically distant organisms and lateral gene exchange.
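The Results describe assigning proteins to COGs with the COGNITOR program, which compares each new protein with proteins already placed in COGs. The sketch below illustrates the general idea with a simplified best-hit voting rule; it is not COGNITOR's actual algorithm. The function `assign_to_cog`, its input dictionaries, the `min_genomes` threshold of 2, and all protein and COG identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def assign_to_cog(
    best_hits: Dict[str, str],
    protein_to_cog: Dict[str, str],
    min_genomes: int = 2,
) -> Optional[str]:
    """Simplified best-hit voting (hypothetical; not COGNITOR's exact rule).

    best_hits: genome code -> ID of the query's best-scoring hit in that genome.
    protein_to_cog: protein ID -> COG ID for proteins already assigned to COGs.
    min_genomes: assumed number of genomes whose best hits must fall in one COG.
    """
    # Each genome casts one vote for the COG that contains its best hit.
    votes = Counter(
        protein_to_cog[hit] for hit in best_hits.values() if hit in protein_to_cog
    )
    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] < min_genomes:
        return None  # too little support: leave the protein unassigned
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie between COGs: flag for case-by-case inspection
    return ranked[0][0]


# Hypothetical data: three genomes, two of whose best hits fall in COG_A.
members = {"mj_1": "COG_A", "af_2": "COG_A", "mt_3": "COG_B"}
hits = {"Mj": "mj_1", "Af": "af_2", "Mt": "mt_3"}
print(assign_to_cog(hits, members))                 # COG_A
print(assign_to_cog(hits, members, min_genomes=3))  # None
```

Returning `None` on ties, rather than picking one COG arbitrarily, mirrors the case-by-case verification described in the Results: ambiguous proteins are left for manual review.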

Figures

Figure 1
A flow chart of the genome annotation process using COGs. NR is the Non-Redundant sequence database at the National Center for Biotechnology Information.
Figure 2
The main phylogenetic patterns for the predicted proteins encoded in six archaeal genomes. Af, Archaeoglobus fulgidus; Mt, Methanobacterium thermoautotrophicum; Pa, Pyrococcus abyssi; Mj, Methanococcus jannaschii; Ph, Pyrococcus horikoshii; Ap, Aeropyrum pernix. 1, members of COGs including all archaeal species; 2, members of COGs including a subset of archaeal species; 3, members of COGs that include no archaeal species other than the analyzed one; 4, not in COGs. The percentage of proteins in each category is indicated.
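The four categories in the Figure 2 caption can be computed directly from COG membership. A minimal sketch, assuming each COG is represented as a set of species codes (using the abbreviations in the caption) and each protein maps to its COG ID, or to `None` when it is not in any COG; the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set

# The six archaeal genomes analyzed in Figure 2.
ARCHAEA = {"Af", "Mt", "Pa", "Mj", "Ph", "Ap"}


def pattern_category(genome: str, cog_species: Optional[Set[str]]) -> int:
    """Return the Figure 2 category (1-4) for one protein of `genome`."""
    if cog_species is None:
        return 4  # not in COGs
    archaeal = cog_species & ARCHAEA
    if archaeal == ARCHAEA:
        return 1  # COG includes all archaeal species
    if archaeal - {genome}:
        return 2  # COG includes a subset of archaeal species
    return 3  # no archaeal species other than the analyzed one


def category_percentages(
    genome: str,
    protein_cog: Dict[str, Optional[str]],
    cog_members: Dict[str, Set[str]],
) -> Dict[int, float]:
    """Percentage of the genome's proteins falling in each category."""
    counts = Counter(
        pattern_category(genome, cog_members[cog] if cog else None)
        for cog in protein_cog.values()
    )
    total = len(protein_cog) or 1  # avoid dividing by zero for an empty genome
    return {cat: 100.0 * counts[cat] / total for cat in (1, 2, 3, 4)}


# Toy example; "Eco" stands for a hypothetical non-archaeal COG member.
cogs = {
    "COG_A": ARCHAEA | {"Eco"},
    "COG_B": {"Ap", "Pa", "Ph", "Eco"},
    "COG_C": {"Ap", "Eco"},
}
proteins = {"p1": "COG_A", "p2": "COG_B", "p3": "COG_C", "p4": None}
print(category_percentages("Ap", proteins, cogs))
# {1: 25.0, 2: 25.0, 3: 25.0, 4: 25.0}
```

Non-archaeal members are ignored when classifying, so a COG shared only with bacteria (COG_C here) still counts as category 3 for A. pernix.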
Figure 3
Classification of genomes by co-occurrence in the COGs. (a) A cluster dendrogram. (b) A neighbor-joining unrooted tree. For abbreviations, see the Materials and methods section.
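The caption does not state how the co-occurrence of genomes in COGs was converted into distances. As one plausible sketch, assuming a Jaccard distance (one minus the number of COGs shared by two genomes divided by the number of COGs containing either), a pairwise matrix can be built as follows; such a matrix could then be passed to a clustering or neighbor-joining routine. The metric, the function names, and the toy data are assumptions, not the authors' method.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def genome_cogs(cog_members: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a COG -> species map into species -> set of COGs containing it."""
    by_genome: Dict[str, Set[str]] = {}
    for cog, species in cog_members.items():
        for sp in species:
            by_genome.setdefault(sp, set()).add(cog)
    return by_genome


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def distance_matrix(
    cog_members: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[Tuple[str, str], float]]:
    """Symmetric pairwise distances between all genomes seen in any COG."""
    by_genome = genome_cogs(cog_members)
    genomes = sorted(by_genome)
    dist = {(g, g): 0.0 for g in genomes}
    for g1, g2 in combinations(genomes, 2):
        d = jaccard_distance(by_genome[g1], by_genome[g2])
        dist[(g1, g2)] = dist[(g2, g1)] = d
    return genomes, dist


# Hypothetical COGs and their member species.
cogs = {
    "C1": {"Pa", "Ph", "Ap"},
    "C2": {"Pa", "Ph"},
    "C3": {"Ap", "Mj"},
    "C4": {"Pa", "Ph", "Mj"},
}
genomes, dist = distance_matrix(cogs)
for g1 in genomes:
    print(g1, " ".join(f"{dist[(g1, g2)]:.2f}" for g2 in genomes))
```

In this toy data the two Pyrococcus genomes occur in exactly the same COGs and so end up at distance 0, the kind of signal that would make them group together in a dendrogram.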
Figure 4
COGs not represented in each of the archaeal species while including members of the remaining five species. For P. horikoshii and P. abyssi, the absence of the respective second pyrococcal species was allowed. For abbreviations, see the Materials and methods section.
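The selection behind Figure 4 amounts to a set test: a COG is listed for a species if that species is absent while every other required species is present. A minimal sketch, assuming each COG is given as a set of species codes; the `missing_cogs` function and toy data are hypothetical. The Pyrococcus exception from the caption is handled by dropping the sister species from the required set.

```python
from typing import Dict, List, Set

ARCHAEA = {"Af", "Mt", "Pa", "Mj", "Ph", "Ap"}
# For P. abyssi and P. horikoshii, absence of the other Pyrococcus is allowed.
SISTER = {"Pa": "Ph", "Ph": "Pa"}


def missing_cogs(target: str, cog_members: Dict[str, Set[str]]) -> List[str]:
    """COGs lacking `target` but containing all other required archaea."""
    required = ARCHAEA - {target}
    if target in SISTER:
        required.discard(SISTER[target])  # sister species becomes optional
    return sorted(
        cog
        for cog, species in cog_members.items()
        if target not in species and required <= species
    )


# Hypothetical COGs and their member species.
cogs = {
    "C1": {"Af", "Mt", "Pa", "Mj", "Ph"},
    "C2": {"Af", "Mt", "Mj", "Ap"},
    "C3": {"Af", "Mt", "Mj", "Ph", "Ap"},
    "C4": {"Af", "Ap"},
}
for genome in sorted(ARCHAEA):
    print(genome, missing_cogs(genome, cogs))
```

With this toy data, C1 is reported as missing only in Ap, while C2 is reported for both Pa and Ph because each pyrococcal genome is allowed to lack its sister species.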
