Nucleic Acids Res. 2010 Jan;38(Database issue):D382-90.
doi: 10.1093/nar/gkp887. Epub 2009 Oct 28.

The integrated microbial genomes system: an expanding comparative analysis resource

Victor M Markowitz et al. Nucleic Acids Res. 2010 Jan.

Abstract

The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been continually expanded through regular releases. Several companion IMG systems have been set up to serve domain-specific needs, such as expert review of genome annotations. IMG is available at http://img.jgi.doe.gov.

Figures

Figure 1.
Genome Browser and Search Tools. The ‘Genome Browser’ (i) initially displays the three domains of life; the display can be expanded or collapsed using the ‘Open All’ and ‘Close All’ options or (ii) using the open/close icons available at each level of the tree. (iii) Genomes can be selected either individually, using the select box associated with each strain, or in groups, using the green-dot ‘select all’ icons available at each level of the tree. Genomes can also be selected by metadata, using (iv) a genome classification based on ‘Metadata Categories’ and (v) a ‘Genome Search’ tool based on a variety of metadata attributes.
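The two selection modes in Figure 1 (a ‘select all’ icon at a tree level, and a metadata search) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not IMG's implementation: the `TaxonNode` structure, the function names and all genome names and attributes below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonNode:
    """One level of a hypothetical genome tree (domain, phylum, class, ...)."""
    name: str
    children: list = field(default_factory=list)  # child TaxonNode objects
    genomes: list = field(default_factory=list)   # genome names attached at this level


def select_all(node):
    """Like a 'select all' icon: every genome at or below this tree node."""
    selected = set(node.genomes)
    for child in node.children:
        selected |= select_all(child)
    return selected


def metadata_search(metadata, **criteria):
    """Genomes whose metadata match every attribute=value pair given."""
    return {
        genome
        for genome, attrs in metadata.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    }


# Hypothetical example data (names and attributes are made up)
bacteria = TaxonNode("Bacteria", children=[
    TaxonNode("Phylum_X", genomes=["genome_B1", "genome_B2"]),
])
archaea = TaxonNode("Archaea", children=[
    TaxonNode("Phylum_Y", genomes=["genome_A1"]),
])
root = TaxonNode("All", children=[archaea, bacteria])

METADATA = {
    "genome_B1": {"habitat": "marine", "oxygen": "aerobe"},
    "genome_B2": {"habitat": "soil", "oxygen": "aerobe"},
    "genome_A1": {"habitat": "marine", "oxygen": "anaerobe"},
}
```

For example, `select_all(bacteria)` returns `{"genome_B1", "genome_B2"}`, and `metadata_search(METADATA, habitat="marine", oxygen="anaerobe")` returns `{"genome_A1"}`. Selecting at a node recursively collects its whole subtree, which is why one click at the domain level selects every genome in that domain.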
Figure 2.
Gene Cassette Search Tools. ‘Cassette Search’ allows users to find genes that are part of chromosomal cassettes involving specific protein clusters. First, users (i) select the protein cluster underlying the cassettes, the protein cluster identifier for the search, the logical operator used for the search expression, and the order in which search results are presented. The search is carried out across all the genomes in IMG (default), or it can be limited to a subset of genomes using various filters or by selecting genomes from the ‘Genome List’. (ii) The ‘Cassette Search Result’ lists the genes that satisfy the search condition, together with the identifiers of the cassettes they belong to, their associated protein cluster identifiers and names, and their genomes. (iii) The cassette identifiers link to the ‘Chromosomal Cassette’ details page. (iv) The ‘Phylogenetic Profiler for Gene Cassettes’ allows users to find genes that are part of a gene cassette in a query genome and are also part of related gene cassettes in other genomes. Users select the query genome with the associated radio button in the ‘Find Genes In’ column, the protein cluster used for correlating gene cassettes, and the genomes to compare with the query genome with the associated radio buttons in the ‘Collocated In’ column. (v) The ‘Phylogenetic Profiler for Gene Cassette Results’ page starts with a summary of the results, including a table whose first column lists the size of the groups of collocated genes in the query genome and whose second column lists the number of such groups conserved across the other selected genomes. The ‘Details’ part of the results is a table that displays the groups of collocated genes in each chromosomal cassette of the query genome that satisfy the search criterion. (vi) The conserved part of a chromosomal cassette involving an individual gene in the query genome can be examined using the links in the ‘Conserved Neighborhood Viewer Centered on this Gene’ column of the results table.
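The logic behind Figure 2 can be illustrated with a small sketch: a cassette search that applies an AND/OR operator over protein clusters, and a phylogenetic-profiler summary that counts conserved groups by size. Everything here is an assumption for illustration: the data layout (cassette id mapped to gene and cluster ids), the function names, and the greedy rule that keeps, for each other genome, the cassette overlapping the current group most. IMG's actual matching criteria may differ.

```python
from collections import Counter


def cassette_search(cassettes, clusters, operator="AND"):
    """Find genes in cassettes that satisfy a logical expression over protein clusters.

    cassettes: cassette id -> {gene id: protein cluster id} (hypothetical layout).
    Returns (gene, cassette, cluster) rows for genes in the queried clusters.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    wanted = set(clusters)
    combine = all if operator == "AND" else any
    rows = []
    for cas_id, genes in sorted(cassettes.items()):
        present = set(genes.values())
        if combine(c in present for c in wanted):
            rows.extend(
                (gene, cas_id, cluster)
                for gene, cluster in sorted(genes.items())
                if cluster in wanted
            )
    return rows


def conserved_group(query_clusters, other_genomes):
    """Clusters of one query cassette that stay collocated in every other genome.

    Greedy simplification (assumed): for each other genome, keep the intersection
    with whichever of its cassettes overlaps the current group most.
    """
    group = set(query_clusters)
    for cassettes in other_genomes.values():
        group = max((group & set(c) for c in cassettes), key=len, default=set())
    return group


def profiler_summary(query_cassettes, other_genomes, min_size=2):
    """Summary table: group size -> number of conserved groups of that size."""
    sizes = Counter(
        len(conserved_group(clusters, other_genomes))
        for clusters in query_cassettes.values()
    )
    return {size: n for size, n in sorted(sizes.items()) if size >= min_size}


# Hypothetical example data; cluster ids C1..C9 are placeholders
CASSETTES = {
    "cas1": {"gA1": "C1", "gA2": "C2", "gA3": "C3"},
    "cas2": {"gB1": "C1", "gB2": "C4"},
    "cas3": {"gC1": "C2", "gC2": "C3"},
}
QUERY = {"q1": {"C1", "C2", "C3"}, "q2": {"C4", "C5"}}
OTHERS = {
    "genome_X": [{"C1", "C2", "C9"}, {"C4", "C5"}],
    "genome_Y": [{"C1", "C2", "C3"}, {"C4"}],
}
```

With this data, `cassette_search(CASSETTES, ["C1", "C2"], "AND")` matches only `cas1`, while `"OR"` also picks up `cas2` and `cas3`. For the profiler, cassette `q1` keeps the pair `{C1, C2}` across both other genomes, whereas `q2` shrinks to a single cluster, so `profiler_summary(QUERY, OTHERS)` reports one conserved group of size 2.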
Figure 3.
Phylogenetic distribution of genes and putative horizontally transferred genes. The ‘Phylogenetic Distribution of Genes’ is available as part of a genome’s Organism Details and (i) displays the distribution of best BLAST hits of the genome’s protein-coding genes as a histogram: counts correspond to the number of genes whose best BLASTp hits are to proteins of other genomes in a specific phylum or class with >90% identity (right column), 60–90% identity (middle column) and 30–60% identity (left column). Each gene count in the histogram links to the list of genes in the selected genome that have their best BLAST hit in that phylum or class at the specified percent identity. ‘Putative Horizontally Transferred Genes’ allows users to explore genes in a query genome that are likely to have been horizontally transferred, via (ii) two lists of genes: genes whose best hits are to genomes in a phylogenetic group (domain, phylum, class, etc.) different from the corresponding group of the query genome, and genes that, in addition, have no hits to genomes in the query genome’s own group. (iii) M. thermophila PT has two genes with best hits to bacterial genomes and no hits to other archaeal genomes, which may indicate a higher likelihood that they were horizontally transferred from bacterial genomes.
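The histogram binning and the two horizontal-transfer lists in Figure 3 reduce to simple grouping rules, sketched below. This is an illustrative sketch only: the input layouts, the function names, the handling of the exact 30/60/90% boundaries, and the premise that each gene's hits are already ordered best first are all assumptions, not IMG's code.

```python
from collections import defaultdict


def identity_bin(pct_identity):
    """Assign a best-hit percent identity to one of the three histogram columns.

    Assumed boundaries: exactly 60 and 90 fall into '60-90%', exactly 30 into '30-60%'.
    """
    if pct_identity > 90:
        return ">90%"
    if pct_identity >= 60:
        return "60-90%"
    if pct_identity >= 30:
        return "30-60%"
    return None  # below 30% identity: not counted in the histogram


def distribution(best_hits):
    """best_hits: gene -> (phylum or class of the best hit, percent identity).

    Returns (taxon, identity column) -> list of genes; each count is len(list),
    mirroring how histogram counts link to gene lists.
    """
    cells = defaultdict(list)
    for gene, (taxon, pct) in sorted(best_hits.items()):
        column = identity_bin(pct)
        if column is not None:
            cells[(taxon, column)].append(gene)
    return dict(cells)


def putative_hgt(hits, query_group):
    """hits: gene -> phylogenetic groups of its hits, best hit first (assumed).

    Returns the two lists: genes whose best hit is outside the query genome's
    group, and the subset of those with no hits inside that group at all.
    """
    best_elsewhere = sorted(
        gene for gene, groups in hits.items() if groups and groups[0] != query_group
    )
    no_hits_in_own_group = [g for g in best_elsewhere if query_group not in hits[g]]
    return best_elsewhere, no_hits_in_own_group


# Hypothetical example data
BEST_HITS = {
    "g1": ("Phylum_P", 95.0),
    "g2": ("Phylum_P", 72.5),
    "g3": ("Phylum_F", 35.0),
    "g4": ("Phylum_F", 20.0),
}
HITS = {
    "g1": ["Bacteria", "Archaea"],
    "g2": ["Bacteria"],
    "g3": ["Archaea", "Bacteria"],
}
```

For an archaeal query genome, `putative_hgt(HITS, "Archaea")` puts `g1` and `g2` in the first list (best hit bacterial) but only `g2` in the second, because `g1` still has an archaeal hit. The second list is the stricter signal, matching the M. thermophila PT example in the caption.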
Figure 4.
Function Profile Tools. (i) The ‘Abundance Profile Overview’ allows users to compare genomes across all the terms of a protein or functional family classification. Users select the display format for the results (‘Heat Map’ or ‘Matrix’), the protein/functional families (COG, Pfam, TIGRfam, Enzyme), the normalization method and a set of genomes. (ii) If the ‘Matrix’ option is selected, the abundance of protein/functional families is displayed as a table, with each row corresponding to a family and each cell containing the number of genes associated with that family in a specific genome. (iii) The ‘Function Profile’ allows users to compare genomes across protein or functional family terms selected using the ‘Function Cart’. (iii) The result of a ‘Function Profile’ is displayed in a tabular format similar to the ‘Matrix’ format of the ‘Abundance Profile Overview’. Users can click on a cell of an ‘Abundance Profile Overview’ or ‘Function Profile’ result to retrieve the list of genes assigned to a particular family in a genome. For profiles involving enzymes, a zero-abundance (‘missing’) enzyme links to (iv) the ‘Find Candidate Genes for Missing Function’ tool, which allows users to find candidate genes in a target genome that could be associated with the missing enzyme. The search can be conducted across all IMG genomes, across a subset of genomes within a certain domain, phylum or class, or only across the selected genomes, and it can be based on homologs, orthologs or KO terms. (v) The result of the search for candidate genes is a list of genes that can be selected and added to the ‘Gene Cart’.
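The ‘Matrix’ view, the cell click-through and the detection of ‘missing’ enzymes in Figure 4 can be sketched as follows. This is a minimal illustration under assumptions, not IMG's code: the annotation layout, the function names, the per-1,000-genes normalization (IMG's actual normalization options are not specified here), the `EC:` prefix used to recognize enzymes, and the rule that a missing enzyme must be present in at least one other selected genome are all hypothetical. The candidate-gene search itself (homologs, orthologs, KO terms) is not modeled.

```python
def abundance_matrix(annotations, families=None):
    """Family -> genome -> number of genes assigned to that family.

    annotations: genome -> list of (gene id, family id) pairs (hypothetical layout).
    families: optional family ids, like terms collected in a 'Function Cart';
    families absent from every genome then appear as all-zero rows.
    """
    genomes = sorted(annotations)
    matrix = {}
    for genome in genomes:
        for _gene, family in annotations[genome]:
            row = matrix.setdefault(family, dict.fromkeys(genomes, 0))
            row[genome] += 1
    if families is not None:
        matrix = {f: matrix.get(f, dict.fromkeys(genomes, 0)) for f in families}
    return matrix


def normalize_per_1000_genes(matrix, total_genes):
    """One possible normalization (assumed): counts per 1,000 genes in each genome."""
    return {
        family: {g: 1000 * n / total_genes[g] for g, n in row.items()}
        for family, row in matrix.items()
    }


def genes_in_cell(annotations, genome, family):
    """What clicking a cell returns: the genes of one genome assigned to one family."""
    return [gene for gene, fam in annotations[genome] if fam == family]


def missing_enzymes(matrix, target):
    """Enzyme rows with zero abundance in the target genome but present elsewhere.

    Assumes enzyme family ids carry a hypothetical 'EC:' prefix.
    """
    return sorted(
        family
        for family, row in matrix.items()
        if family.startswith("EC:")
        and row[target] == 0
        and any(n > 0 for n in row.values())
    )


# Hypothetical example annotations
ANNOTATIONS = {
    "genome_A": [("a1", "EC:1.1.1.1"), ("a2", "EC:2.7.1.1"), ("a3", "pfam_X")],
    "genome_B": [("b1", "EC:1.1.1.1"), ("b2", "EC:1.1.1.1")],
}
```

Here `missing_enzymes(abundance_matrix(ANNOTATIONS), "genome_B")` returns `["EC:2.7.1.1"]`: that enzyme has a gene in `genome_A` but none in `genome_B`, which is the kind of zero-abundance cell from which a candidate-gene search would start. The non-enzyme row `pfam_X` is also zero in `genome_B` but is ignored.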
