Database (Oxford). 2009:2009:bap021.
doi: 10.1093/database/bap021. Epub 2009 Nov 25.

MicroScope: a platform for microbial genome annotation and comparative genomics

D Vallenet et al. Database (Oxford). 2009.

Abstract

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) helps to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone.

Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc

Figures

Figure 1.
The three components of the MicroScope platform. The MicroScope deployment diagram presents three software architecture components: (i) in green, the process management system, based on the jBPM framework, which orchestrates all the analyses of the annotation pipeline; (ii) in red, the PkGDB and MicroCyc databases, which manage genomic and metabolic data, respectively; and (iii) in blue, the MaGe Web interface, which is directly connected to the databases and allows users to browse and edit data.
Figure 2.
Comparative genomic functionalities in MaGe. A query result of the RGPfinder tool is shown in (A). In this example, E. coli IAI1 is compared with 10 other E. coli strains. A total of 66 regions of genomic plasticity are predicted. These regions are summarized in a table that displays their chromosomal location, the presence of genomic island features, and a specificity score for each compared strain. A detailed view of the predicted regions is available, as shown in (B) for the region GR19. This region contains a gene cluster (i.e. the paa operon) coding for enzymes of the phenylacetate degradation pathway. As shown by the colour code (green for the presence of a homologous gene, red for its absence), only two other E. coli strains (K12 and HS) share this region with the IAI1 strain. The synteny break points between the E. coli core genome and this metabolic region can be visualized using the cartographic representation of the synteny results (C). On these maps, a rectangle represents a putative homolog in the compared genome, and a group of rectangles of the same colour indicates a conserved synteny. (D) shows the ‘Metabolic Profile’ functionality. The metabolic networks of the 11 E. coli strains are compared with respect to pathway completion. In this example, only MicroCyc degradation pathways are selected and the pathway completion threshold is set to 0.7. Results are summarized in a table which gives, for the 11 selected strains, completion values for each pathway. The results confirm that the phenylacetate degradation pathway is complete in only three E. coli strains (IAI1, K12 and HS).
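The pathway completion comparison described for panel (D) can be illustrated with a minimal sketch. It rests on assumptions not stated in the caption: completion is taken to be the fraction of a pathway's reactions for which the genome has at least one annotated enzyme, and the threshold is taken to keep a pathway when at least one strain reaches it. The function names and the reaction and strain identifiers in the usage note are hypothetical and are not part of MaGe or MicroCyc.

```python
from typing import Dict, Set


def pathway_completion(pathway_reactions: Set[str], genome_reactions: Set[str]) -> float:
    """Fraction of the pathway's reactions found in the genome (assumed definition)."""
    if not pathway_reactions:
        return 0.0
    found = pathway_reactions & genome_reactions
    return len(found) / len(pathway_reactions)


def metabolic_profile(
    pathways: Dict[str, Set[str]],
    genomes: Dict[str, Set[str]],
    threshold: float = 0.7,
) -> Dict[str, Dict[str, float]]:
    """Build a pathway x strain table of completion values.

    Assumption: a pathway is reported only if at least one strain
    reaches the completion threshold.
    """
    profile: Dict[str, Dict[str, float]] = {}
    for pathway_name, reactions in pathways.items():
        row = {
            strain: pathway_completion(reactions, strain_reactions)
            for strain, strain_reactions in genomes.items()
        }
        if any(value >= threshold for value in row.values()):
            profile[pathway_name] = row
    return profile
```

With a toy pathway of four hypothetical reactions (R1 to R4), a strain carrying all four scores 1.0 and a strain carrying only R1 scores 0.25; the pathway is still listed because one strain passes the 0.7 threshold.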
Figure 3.
‘Keyword search’ functionality in MaGe. The query is performed in two steps: (i) in the ‘gene annotation’ dataset, searching for R. solanacearum genes which contain the term ‘hypothetical protein’ (With: all of the words) in the ‘product’ field (section part 1); (ii) in the two datasets ‘TrEMBL EXP’ and ‘SwissProt EXP’ (see text for details), searching for genes from the previous query which are similar (at least 40% identity over the overall length of the two sequences) to protein entries whose description (DE line) does not contain any of the words (Without: at least one word) ‘hypothetical protein UPF unknown uncharacterized’ (section part 2). The query (‘Explore’ and then ‘Explore more’) returned 56 R. solanacearum genes which have 20 BLAST hits in the ‘SwissProt EXP’ dataset and 72 in the ‘TrEMBL EXP’ dataset. The beginning of the TrEMBL list shown in the figure has been sorted by Identity %. The first result is the RSc1602 gene (annotated as ‘hypothetical protein’), which is similar to the TrEMBL entry Q44000 (81% identity); this entry is linked to a paper (PubMed = 8021225) published in 1994 that describes a pyruvate dehydrogenase complex and a new type of dihydrolipoamide dehydrogenase in Alcaligenes eutrophus.
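The two-step query in this caption can be sketched as a pair of filters. This is only an illustration under stated assumptions, not the MaGe implementation: matching is assumed to be a case-insensitive substring test, the quoted word list is assumed to be split on whitespace, and the alignment-length condition is not modelled (only percent identity is checked). The `Gene` and `Hit` types and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Gene:
    label: str
    product: str  # content of the 'product' annotation field


@dataclass
class Hit:
    gene_label: str    # query gene
    entry_id: str      # matched protein entry
    description: str   # DE line of the matched entry
    identity: float    # percent identity of the alignment


def contains_all(text: str, words: Sequence[str]) -> bool:
    """'With: all of the words' (assumed case-insensitive substring match)."""
    lowered = text.lower()
    return all(word.lower() in lowered for word in words)


def contains_any(text: str, words: Sequence[str]) -> bool:
    """True if at least one of the words occurs in the text."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


def two_step_query(
    genes: Iterable[Gene],
    hits: Iterable[Hit],
    include_words: Sequence[str],
    exclude_words: Sequence[str],
    min_identity: float = 40.0,
) -> List[Hit]:
    # Step 1: genes whose product field contains all of the query words.
    selected = {g.label for g in genes if contains_all(g.product, include_words)}
    # Step 2: hits for those genes that are similar enough and whose
    # description contains none of the excluded words.
    kept = [
        h for h in hits
        if h.gene_label in selected
        and h.identity >= min_identity
        and not contains_any(h.description, exclude_words)
    ]
    # Sort by identity, highest first, as in the list shown in the figure.
    return sorted(kept, key=lambda h: h.identity, reverse=True)
```

Calling `two_step_query` with `include_words="hypothetical protein".split()` and `exclude_words="hypothetical protein UPF unknown uncharacterized".split()` mirrors the two sections of the query form; note that a substring match on ‘protein’ is strict and excludes any description that mentions the word.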
Figure 4.
Missing enzymes in the Acinetobacter baylyi ADP1 purine degradation pathway. The genomic region ACIAD3536-3542 of A. baylyi contains seven genes which share conserved syntenies with several other microbial genomes. Two of them encode enzymes involved in the last two steps of the purine degradation pathway (KEGG metabolic map 230). Following expert review, candidate genes were validated for the four missing reactions (blue dashed arrows).
