Nucleic Acids Res. 2014 Jan;42(Database issue):D560-7.
doi: 10.1093/nar/gkt963. Epub 2013 Oct 27.

IMG 4 version of the integrated microbial genomes comparative analysis system


Victor M Markowitz et al. Nucleic Acids Res. 2014 Jan.

Abstract

The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have grown continuously since its first version was released in 2005. Since the last report, published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved, and new tools have been added for recording and analyzing single-cell genomes, RNA-Seq and biosynthetic cluster data. Separate IMG datamarts support the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).


Figures

Figure 1.
RNA-Seq data organization. (i) The generated ‘omics’ datasets can be accessed from ‘IMG Statistics’ on IMG’s front page by following the ‘Experiments’ link on the ‘IMG Statistics’ page. (ii) An RNA-Seq study is associated with its samples and with the number of genes expressed across all samples. (iii) Each sample is associated with the number of expressed genes, the total number of reads and the average number of reads per gene. (iv) An expressed gene is associated with a read count, a coverage (total number of reads divided by the size of the gene) and a normalized coverage (coverage for the gene in the experiment divided by the total number of reads in that experiment).
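Panels (iii) and (iv) describe per-sample and per-gene values that are simple ratios. The sketch below computes them for one sample, assuming gene size is given in base pairs and that a gene counts as "expressed" when it has at least one read; the class and function names (`GeneExpression`, `sample_summary`, `normalized_coverage`) are hypothetical, and averaging over expressed genes only is an assumption rather than IMG's documented rule.

```python
from dataclasses import dataclass


@dataclass
class GeneExpression:
    # Hypothetical per-gene record for a single RNA-Seq sample.
    gene_id: str
    read_count: int       # reads assigned to this gene in the sample
    gene_length_bp: int   # gene size in base pairs (assumed unit)

    @property
    def coverage(self) -> float:
        # Reads divided by the size of the gene, per panel (iv).
        return self.read_count / self.gene_length_bp


def normalized_coverage(gene: GeneExpression, sample_total_reads: int) -> float:
    """Gene coverage divided by the total number of reads in the sample."""
    return gene.coverage / sample_total_reads


def sample_summary(genes: list[GeneExpression]) -> dict:
    """Per-sample statistics from panel (iii).

    Assumptions: a gene is expressed if it has at least one read, and the
    average number of reads per gene is taken over expressed genes only.
    """
    expressed = [g for g in genes if g.read_count > 0]
    total_reads = sum(g.read_count for g in genes)
    avg = total_reads / len(expressed) if expressed else 0.0
    return {
        "expressed_genes": len(expressed),
        "total_reads": total_reads,
        "avg_reads_per_gene": avg,
    }


if __name__ == "__main__":
    # Toy values for illustration only.
    genes = [
        GeneExpression("geneA", 300, 1500),
        GeneExpression("geneB", 100, 500),
        GeneExpression("geneC", 0, 900),
    ]
    summary = sample_summary(genes)
    print(summary)
    for g in genes:
        print(g.gene_id, g.coverage, normalized_coverage(g, summary["total_reads"]))
```

Dividing by the sample's total reads puts genes from samples sequenced to different depths on a comparable scale, which is presumably why the normalized value is reported alongside the raw one.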
Figure 2.
Biosynthetic clusters. (i) Genomes associated with biosynthetic clusters can be retrieved and examined using the ‘Genome Browser’. (ii) The number of biosynthetic clusters is provided in the ‘Genome Statistics’ section of the ‘Organism Detail’ page of a genome, together with a hyperlink to (iii) the list of biosynthetic clusters, whereby for each cluster the number of associated genes, the evidence type and the corresponding natural product are provided. (iv) A biosynthetic cluster can be examined using the ‘Biosynthetic Cluster Detail’ page, which includes information about the cluster. (v) ‘Natural Product List’ provides the list of the IMG genomes associated with natural products.
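The caption describes a small data model: each biosynthetic cluster belongs to a genome and carries a gene count, an evidence type and, optionally, a natural product. A minimal sketch of that model, and of how a per-genome cluster count and a natural-product-to-genomes listing could be derived from it, is shown below. The record type, its field names and the evidence labels are hypothetical stand-ins, not IMG's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BiosyntheticCluster:
    # Hypothetical record mirroring the fields listed in panel (iii).
    cluster_id: str
    genome_id: str
    gene_count: int                 # number of genes in the cluster
    evidence_type: str              # free-text label; IMG's vocabulary is not given here
    natural_product: Optional[str]  # None when no product is linked


def clusters_per_genome(clusters: list[BiosyntheticCluster]) -> dict[str, int]:
    """Count clusters per genome (the number shown under 'Genome Statistics')."""
    counts = defaultdict(int)
    for c in clusters:
        counts[c.genome_id] += 1
    return dict(counts)


def natural_product_list(clusters: list[BiosyntheticCluster]) -> dict[str, list[str]]:
    """Map each natural product to the sorted IDs of genomes with a cluster for it."""
    genomes = defaultdict(set)
    for c in clusters:
        if c.natural_product is not None:
            genomes[c.natural_product].add(c.genome_id)
    return {product: sorted(ids) for product, ids in sorted(genomes.items())}


if __name__ == "__main__":
    # Toy records for illustration only.
    demo = [
        BiosyntheticCluster("bc1", "g1", 12, "experimental", "productX"),
        BiosyntheticCluster("bc2", "g1", 8, "predicted", None),
        BiosyntheticCluster("bc3", "g2", 20, "experimental", "productX"),
    ]
    print(clusters_per_genome(demo))   # {'g1': 2, 'g2': 1}
    print(natural_product_list(demo))  # {'productX': ['g1', 'g2']}
```

Grouping by natural product rather than by genome is what turns the per-genome cluster lists of panel (iii) into the product-centric view of panel (v).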
Figure 3.
RNA-Seq data exploration. (i) The list of RNA-Seq studies associated with a genome can be accessed from its ‘Organism Details’ page, with each study associated with (ii) a list of RNA-Seq experiments (samples). Individual samples can be selected for further analysis, such as (iii) examining their expressed genes as a list or in the (iv) chromosome viewer. A sample can also be examined in the context of (v) pathways that have at least one enzyme associated with an expressed gene in the sample, whereby for each pathway (vi) enzymes are displayed with colors representing the expression level of the associated genes; mousing over an enzyme shows the number of expressed genes associated with it.
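Panels (v) and (vi) imply two steps: keep only pathways containing at least one enzyme linked to an expressed gene, then give each such enzyme a color level and a count of expressed genes. The sketch below shows one way to do this for a single sample. The gene-to-enzyme and pathway-to-enzyme mappings are plain dictionaries here, "expressed" is assumed to mean at least one read, and the three read-count cut-offs used for color levels are hypothetical; the source does not say how IMG bins expression.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bins: <10 reads -> 0, 10-99 -> 1, 100-999 -> 2, >=1000 -> 3.
LEVEL_THRESHOLDS = (10, 100, 1000)


def expression_level(total_reads: int) -> int:
    """Map an enzyme's summed reads to a discrete color level (assumed binning)."""
    return bisect_right(LEVEL_THRESHOLDS, total_reads)


def pathway_view(expression, gene_to_enzymes, pathway_to_enzymes):
    """Keep pathways with at least one enzyme linked to an expressed gene.

    expression:         gene_id -> read count in one sample
    gene_to_enzymes:    gene_id -> set of enzyme IDs
    pathway_to_enzymes: pathway ID -> set of enzyme IDs
    Returns pathway -> {enzyme: (number of expressed genes, color level)}.
    """
    n_genes = defaultdict(int)
    reads = defaultdict(int)
    for gene, count in expression.items():
        if count > 0:  # assumption: "expressed" means at least one read
            for enzyme in gene_to_enzymes.get(gene, ()):
                n_genes[enzyme] += 1
                reads[enzyme] += count
    view = {}
    for pathway, enzymes in sorted(pathway_to_enzymes.items()):
        hits = {
            e: (n_genes[e], expression_level(reads[e]))
            for e in sorted(enzymes)
            if n_genes.get(e, 0) > 0
        }
        if hits:  # drop pathways with no expressed enzyme
            view[pathway] = hits
    return view


if __name__ == "__main__":
    # Toy mappings for illustration only.
    expression = {"g1": 50, "g2": 0, "g3": 500}
    gene_to_enzymes = {"g1": {"E1"}, "g2": {"E2"}, "g3": {"E1", "E3"}}
    pathway_to_enzymes = {
        "pathway_A": {"E2", "E3"},
        "pathway_B": {"E9"},
        "pathway_C": {"E1"},
    }
    print(pathway_view(expression, gene_to_enzymes, pathway_to_enzymes))
```

The expressed-gene count per enzyme corresponds to what the caption says is shown on mouse-over; the color level is computed from summed reads only as an illustrative choice.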
Figure 4.
RNA-Seq data comparison. (i) RNA-Seq sample comparison starts with the selection of samples of interest. (ii) ‘Pairwise Sample Analysis’ supports comparing samples in terms of up- and downregulated genes, with (iii) a histogram preview that helps set the comparison thresholds. (iv) The result of the comparison can be examined in terms of functions, whereby genes are grouped by their associated KEGG pathways or COG functions. (v) The strength of the association of gene expression between pairs of samples can be examined using ‘Spearman’s Rank Correlation’. (vi) ‘Linear Regression’ analysis helps estimate whether two samples are technical replicates. (vii) ‘Multiple Sample Analysis’ clusters samples based on the abundance of expressed genes, using a variety of clustering methods. (viii) Clusters of samples can be examined in the context of pathways, whereby enzymes are displayed with colors representing the cluster.
