Nucleic Acids Res. 2005 Oct 7;33(17):5691-702.
doi: 10.1093/nar/gki866. Print 2005.

The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes

Ross Overbeek et al. Nucleic Acids Res.

Abstract

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.

Figures

Figure 1
Accumulation of complete archaeal and bacterial genome sequences at NCBI 1994–2004, and prediction of the release of genomes through 2010. Data were extracted and plotted by year (crosses). Values for 2004–2010 are projected by a power law and are shown as open circles. At the current rate of growth, the 1000th complete microbial genome will be released in late 2007 or early 2008.
Figure 2
Subsystem and Populated Subsystem. The Histidine Degradation Subsystem is used as an example to demonstrate the relevant terms. (A) The subsystem comprises 7 functional roles (e.g. Histidine ammonia-lyase (EC 4.3.1.3), Urocanate hydratase (EC 4.2.1.49) etc.). Together with the spreadsheet it becomes the ‘Populated subsystem’. (B) The Subsystem Spreadsheet is populated with genes from 8 organisms (simplified from the original subsystem), where each row represents one organism and each column one of the functional roles of the subsystem. Genes performing a given functional role in a given organism populate the corresponding cell. Gray shading of cells indicates proximity of the respective genes on the chromosomes. (C) The Subsystem Diagram illustrates the populated subsystem: key intermediates (circles with roman numerals), connected by enzymes (boxes with abbreviations matching the spreadsheet abbreviations) and reactions (arrows). Three distinct variants of Histidine Degradation are presented in this populated subsystem. Variant 1 (green shading) is present in Caulobacter crescentus, Pseudomonas putida and Xanthomonas campestris. N-Formimino-L-Glutamate (IV) is converted to L-Glutamate (VI) via N-Formyl-L-Glutamate (V) by the enzymatic activities of Formiminoglutamic iminohydrolase (EC 3.5.3.13) (ForI) and of N-formylglutamate deformylase (EC 3.5.1.68) (NfoD). Variant 2 (yellow shading) is present in Halobacterium sp., Deinococcus radiodurans and Bacillus subtilis. In this variant the conversion from intermediate IV to VI is performed by Formiminoglutamase (EC 3.5.3.8) (HutG). Variant 3 (blue shading) is present in Bacteroides thetaiotaomicron and Desulfotalea psychrophila. Here the Glutamate formiminotransferase (EC 2.1.2.5) (GluF) performs the conversion from intermediate IV to VI.
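The spreadsheet layout described in this caption (one row per organism, one column per functional role, genes in the cells) can be sketched as a nested mapping. The following is only an illustrative sketch, not the SEED's actual data model: the role abbreviations ForI, NfoD, HutG and GluF and the variant rules are taken from the caption, while the gene identifiers, the `Row` type and the `classify_variant` function are hypothetical names introduced here.

```python
from typing import Dict, List, Optional

# One spreadsheet row: role abbreviation -> genes filling that cell.
# An empty or absent entry means no gene was assigned to the role.
Row = Dict[str, List[str]]


def classify_variant(row: Row) -> Optional[int]:
    """Guess the Histidine Degradation variant from one spreadsheet row.

    The rules follow the caption: variant 1 uses ForI and NfoD,
    variant 2 uses HutG, variant 3 uses GluF for the IV -> VI step.
    """
    def has(role: str) -> bool:
        return bool(row.get(role))

    if has("ForI") and has("NfoD"):
        return 1
    if has("HutG"):
        return 2
    if has("GluF"):
        return 3
    return None  # no recognized route from intermediate IV to VI


# Organism names come from the caption; gene IDs are made-up placeholders.
spreadsheet: Dict[str, Row] = {
    "Pseudomonas putida": {"ForI": ["gene_a"], "NfoD": ["gene_b"]},
    "Bacillus subtilis": {"HutG": ["gene_c"]},
    "Bacteroides thetaiotaomicron": {"GluF": ["gene_d"]},
}

variants = {org: classify_variant(row) for org, row in spreadsheet.items()}
```

Keeping a list of genes per cell, rather than a single gene, leaves room for organisms that carry more than one gene for the same functional role.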
Figure 3
Leucine Degradation and HMG-CoA Metabolism Subsystem. Functional roles, abbreviations, key intermediates and reactions in the pathway diagram are presented using the same conventions as in Figure 2. (A) Functional roles in the subsystem. (B) The Subsystem diagram shows the presence of genes assigned the respective functions in B.melitensis and G.metallireducens, using color-coded highlighting as explained in the panel. (C) Subsystem spreadsheet, in which the presence of genes with the respective functions is shown by gene names for B.subtilis or by ‘+’ for all other genomes (modified from a regular SEED display showing all gene IDs). Highlighting by a matching color indicates proximity on the chromosome. (D) Clustering on the chromosome of genes involved in the Subsystem (large yellow cluster), demonstrated by alignment of the chromosomal contigs of the respective genomes around a signature pathway gene, yngG. Homologous genes are shown by arrows with matching colors and numbers corresponding to the functional roles in panel A. B.subtilis genes are marked by gene names. Other genes (not conserved within the cluster) are colored gray.
Figure 4
CoA Biosynthesis Subsystem. Functional roles, abbreviations, key intermediates and reactions in the pathway diagram are presented using the same conventions as in Figure 2. Background colors in the diagram illustrate the comparison of subsystem variants by highlighting functional roles asserted in two organisms: E.coli (yellow) and H.sapiens (blue). Shared functional roles are highlighted green. The lower panel is a modification of the subsystem spreadsheet. It shows a classification of major subsystem variants representing a substantially different reaction topology revealed by semi-automated graph analysis as described in (21). Selected genomes unambiguously associated with each variant are shown after the variant description (e.g. De novo, complete/100). Patterns of functional roles which constitute each functional variant are generalized by: ‘+’, presence of a gene (for a given role) is required; ‘±’, optional; ‘?’, function is inferred by pathway analysis but a gene is unknown or ‘missing’ (i.e. cannot be located by similarity). Typical sub-variants characterized by the same topology but relying on alternative (non-orthologous) forms of specific enzymes (e.g. PANK) are illustrated by the following genomes: E.coli K12 [NCBI taxonomy ID 83333.1], D.radiodurans R1 [243230.1], S.aureus subsp. aureus N315 [158879.1], S.oneidensis MR-1 [211586.1], G.metallireducens [28232.1], Saccharomyces cerevisiae [4932.1], P.aerophilum str. IM2 [178306.1], Streptococcus pneumoniae R6 [171101.1], Thermoanaerobacter tengcongensis [119072.1], H.sapiens [9606.2], B.aphidicola str. APS (Acyrthosiphon pisum) [107806.1], Treponema pallidum subsp. pallidum str. Nichols [243276.1] and Chlamydia trachomatis D/UW-3/CX [272561.1]. Genes assigned the respective functional roles are shown by SEED unique IDs for all illustrated genomes (except E.coli, where common gene names are used). Matching background colors highlight genes that occur close to each other on the chromosome.
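The ‘+’, ‘±’ and ‘?’ notation in this caption can be read as a small pattern language over spreadsheet rows. The sketch below shows one possible reading and is not the graph analysis cited as (21): it assumes that ‘?’ means no gene was located for the role and that roles absent from a pattern are unconstrained. PANK is the only role abbreviation taken from the caption; `ROLE_B`, `ROLE_C`, the gene IDs and the `matches` function are hypothetical.

```python
from typing import Dict, List

REQUIRED, OPTIONAL, MISSING = "+", "±", "?"


def matches(row: Dict[str, List[str]], pattern: Dict[str, str]) -> bool:
    """Check one genome's row against a variant pattern.

    '+' : at least one gene must be assigned to the role.
    '±' : the role may or may not have a gene.
    '?' : the function is inferred, so no gene is expected in the cell
          (an assumption made for this sketch).
    """
    for role, symbol in pattern.items():
        present = bool(row.get(role))
        if symbol == REQUIRED:
            if not present:
                return False
        elif symbol == MISSING:
            if present:
                return False
        elif symbol != OPTIONAL:
            raise ValueError(f"unknown pattern symbol: {symbol!r}")
    return True


# Hypothetical pattern and rows; only the PANK abbreviation is from the caption.
pattern = {"PANK": "+", "ROLE_B": "±", "ROLE_C": "?"}
genome_x = {"PANK": ["gene_1"], "ROLE_B": [], "ROLE_C": []}
genome_y = {"PANK": [], "ROLE_B": ["gene_2"], "ROLE_C": []}

result_x = matches(genome_x, pattern)  # PANK present, ROLE_C empty
result_y = matches(genome_y, pattern)  # required PANK gene is absent
```

Under these assumptions a genome can satisfy several patterns at once, so assigning it to a single variant would still need a tie-breaking rule or expert curation.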

References

    1. Fleischmann R.D., Adams M.D., White O., Clayton R.A., Kirkness E.F., Kerlavage A.R., Bult C.J., Tomb J.F., Dougherty B.A., Merrick J.M., et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512.
    2. Haft D.H., Selengut J.D., Brinkac L.M., Zafar N., White O. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics. 2005;21:293–306.
    3. Osterman A., Overbeek R. Missing genes in metabolic pathways: a comparative genomics approach. Curr. Opin. Chem. Biol. 2003;7:238–251.
    4. Overbeek R., Larsen N., Smith W., Maltsev N., Selkov E. Representation of function: the next step. Gene. 1997;191:GC1–GC9.
    5. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genet. 2000;25:25–29.