IMG 4 version of the integrated microbial genomes comparative analysis system

Victor M Markowitz¹, I-Min A Chen, Krishna Palaniappan, Ken Chu, Ernest Szeto, Manoj Pillay, Anna Ratner, Jinghua Huang, Tanja Woyke, Marcel Huntemann, Iain Anderson, Konstantinos Billis, Neha Varghese, Konstantinos Mavromatis, Amrita Pati, Natalia N Ivanova, Nikos C Kyrpides

Affiliations

¹ Biological Data Management and Technology Center, Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, 94720 USA and Department of Energy, Microbial Genome and Metagenome Program, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, 94598 USA.

PMID: 24165883
PMCID: PMC3965111
DOI: 10.1093/nar/gkt963

Victor M Markowitz et al. Nucleic Acids Res. 2014 Jan;42(Database issue):D560-7. doi: 10.1093/nar/gkt963. Epub 2013 Oct 27.

Abstract

The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version, released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved, and new tools have been added for recording and analyzing single-cell genome, RNA-Seq and biosynthetic cluster data. Different IMG datamarts support the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

Figures

**Figure 1.**
RNA-Seq data organization. (i) ‘Omics’ datasets can be accessed from ‘IMG Statistics’ on IMG’s front page by following the Experiments link on the ‘IMG Statistics’ page. (ii) An RNA-Seq study is associated with samples and with the number of genes expressed across all samples. (iii) Each sample is associated with the number of expressed genes, the total number of reads and the average number of reads per gene. (iv) An expressed gene is associated with a read count, coverage (total number of reads divided by the size of the gene) and normalized coverage (coverage for a gene in the experiment divided by the total number of reads in that experiment).
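The per-gene and per-sample quantities in panels (iii) and (iv) can be illustrated with a short Python sketch. It is not IMG's implementation: the `GeneReads` record and all function names are hypothetical, the total number of reads in a sample is assumed to equal the sum of reads mapped to its genes, and the average number of reads per gene is assumed to be taken over expressed genes (genes with at least one mapped read).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneReads:
    """Hypothetical per-gene record for one RNA-Seq sample."""
    gene_id: str
    read_count: int  # reads mapped to this gene
    length_bp: int   # gene size in base pairs


def coverage(gene: GeneReads) -> float:
    """Reads mapped to the gene divided by the size of the gene."""
    return gene.read_count / gene.length_bp


def normalized_coverage(gene: GeneReads, total_reads: float) -> float:
    """Gene coverage divided by the total number of reads in the experiment."""
    return coverage(gene) / total_reads


def sample_summary(genes: List[GeneReads]) -> Dict[str, float]:
    """Per-sample statistics; assumes total reads = sum of per-gene reads."""
    expressed = [g for g in genes if g.read_count > 0]
    total = sum(g.read_count for g in genes)
    # Average over expressed genes only (an assumption, see lead-in).
    avg = total / len(expressed) if expressed else 0.0
    return {
        "expressed_genes": len(expressed),
        "total_reads": total,
        "avg_reads_per_gene": avg,
    }


if __name__ == "__main__":
    sample = [
        GeneReads("geneA", 100, 1000),
        GeneReads("geneB", 0, 500),
        GeneReads("geneC", 300, 1500),
    ]
    summary = sample_summary(sample)
    print(summary)  # {'expressed_genes': 2, 'total_reads': 400, 'avg_reads_per_gene': 200.0}
    for g in sample:
        print(g.gene_id, coverage(g), normalized_coverage(g, summary["total_reads"]))
```

Dividing by gene size makes long and short genes comparable within a sample, and dividing again by the sample's total reads makes the same gene comparable across samples sequenced to different depths.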

**Figure 2.**
Biosynthetic clusters. (i) Genomes associated with biosynthetic clusters can be retrieved and examined using the ‘Genome Browser’. (ii) The number of biosynthetic clusters is provided in the ‘Genome Statistics’ section of the ‘Organism Detail’ page of a genome, together with a hyperlink to (iii) the list of biosynthetic clusters, whereby for each cluster the number of associated genes, the evidence type and the corresponding natural product are provided. (iv) A biosynthetic cluster can be examined using the ‘Biosynthetic Cluster Detail’ page, which includes information about the cluster. (v) ‘Natural Product List’ provides the list of the IMG genomes associated with natural products.

**Figure 3.**
RNA-Seq data exploration. (i) The list of RNA-Seq studies associated with a genome can be accessed from its ‘Organism Details’, with each study associated with (ii) a list of RNA-Seq experiments (samples). Individual samples can be selected for further analysis, such as (iii) examining their expressed genes as a list or using the (iv) chromosome viewer. A sample can also be examined in the context of (v) pathways that have at least one enzyme associated with an expressed gene in the sample, whereby for each pathway (vi) enzymes are displayed with colors representing the level of expression for the associated genes; mousing over an enzyme shows the number of expressed genes associated with the enzyme.

**Figure 4.**
RNA-Seq data comparison. (i) RNA-Seq sample comparison starts with the selection of samples of interest. (ii) ‘Pairwise Sample Analysis’ supports comparing samples in terms of up/downregulated genes, with (iii) a histogram preview helping to set the thresholds for comparison. (iv) The result of the comparison can be examined in terms of functions, whereby genes associated with KEGG pathways or COG functions are grouped together. (v) The strength of the association of gene expression between pairs of samples can be examined using ‘Spearman’s Rank Correlation’. (vi) ‘Linear Regression’ analysis helps estimate whether two samples are technical replicates. (vii) ‘Multiple Sample Analysis’ clusters samples based on the abundance of expressed genes, using a variety of clustering methods. (viii) Clusters of samples can be examined in the context of pathways, whereby enzymes are displayed with colors representing the cluster.
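The pairwise comparisons in panels (ii), (v) and (vi) rest on standard statistics, sketched below in Python. This is an illustration, not IMG's code: each sample is assumed to be a dict mapping hypothetical gene IDs to normalized coverage, and the log2 fold-change threshold and pseudocount used to call up/downregulated genes are hypothetical defaults standing in for the user-chosen thresholds in panel (iii). Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles tied values correctly.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, pearson(x, y) ** 2


def regulated_genes(a: Dict[str, float], b: Dict[str, float],
                    log2_threshold: float = 1.0,
                    pseudocount: float = 1e-9) -> Tuple[List[str], List[str]]:
    """Genes up/downregulated in sample b relative to sample a (hypothetical rule)."""
    up, down = [], []
    for gene in sorted(a.keys() & b.keys()):
        # Pseudocount avoids division by zero for genes unexpressed in a sample.
        lfc = math.log2((b[gene] + pseudocount) / (a[gene] + pseudocount))
        if lfc >= log2_threshold:
            up.append(gene)
        elif lfc <= -log2_threshold:
            down.append(gene)
    return up, down


if __name__ == "__main__":
    s1 = {"g1": 1.0, "g2": 4.0, "g3": 2.0, "g4": 3.0}
    s2 = {"g1": 4.0, "g2": 1.0, "g3": 2.5, "g4": 3.5}
    genes = sorted(s1.keys() & s2.keys())
    x = [s1[g] for g in genes]
    y = [s2[g] for g in genes]
    print("Spearman:", spearman(x, y))
    print("Fit:", linear_fit(x, y))
    print("Up/down:", regulated_genes(s1, s2))
```

Ranking before correlating makes Spearman's coefficient robust to the skewed distribution typical of expression values, whereas the linear fit is closer in spirit to the technical-replicate check in panel (vi): for true replicates one would expect a slope and r² both close to 1.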
