BMC Bioinformatics. 2017 Aug 10;18(1):367.

doi: 10.1186/s12859-017-1777-7.

ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding

Joseph Guhlin¹, Kevin A T Silverstein², Peng Zhou³, Peter Tiffin⁴, Nevin D Young³

Affiliations

¹ Department of Plant and Microbial Biology, 140 Gortner Laboratory, 1479 Gortner Avenue, University of Minnesota, St. Paul, MN, 55108, USA. guhli007@umn.edu.
² Minnesota Supercomputing Institute, 599 Walter Library, 117 Pleasant St. SE, Minneapolis, MN, 55455, USA.
³ Department of Plant Pathology, 495 Borlaug Hall, 1991 Upper Buford Circle, St. Paul, MN, 55108, USA.
⁴ Department of Plant and Microbial Biology, 140 Gortner Laboratory, 1479 Gortner Avenue, University of Minnesota, St. Paul, MN, 55108, USA.

PMID: 28797229
PMCID: PMC5553995
DOI: 10.1186/s12859-017-1777-7

Joseph Guhlin et al. BMC Bioinformatics. 2017.

Abstract

Background: The rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data.

Results: The Omics Database Generator (ODG) allows users to create customized databases that integrate published genomics data with experimental data and that can be queried through a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and can rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user interface for configuring the data import and for querying the database. Queries can also be run from the command line, and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations.
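To make the idea of user-provided tab-separated annotations feeding a graph more concrete, the following is a minimal Python sketch. It does not use ODG or Neo4j. The three-column layout (`gene_id`, `annotation_type`, `value`), the gene names, and the helper functions are all assumptions made for illustration, since the abstract does not describe ODG's actual TSV columns or schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column layout (gene ID, annotation type, value).
# The abstract does not specify ODG's actual TSV columns.
EXAMPLE_TSV = (
    "gene_id\tannotation_type\tvalue\n"
    "GeneA\tGO\tGO:0005634\n"
    "GeneA\tnote\tcandidate from association study\n"
    "GeneB\tGO\tGO:0005634\n"
)


def load_annotations(handle):
    """Read TSV rows into a list of (gene, relationship type, target) edges."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["gene_id"], row["annotation_type"], row["value"]) for row in reader]


def index_by_target(edges):
    """Map each (relationship type, target) pair to the genes pointing at it."""
    index = defaultdict(set)
    for gene, rel_type, target in edges:
        index[(rel_type, target)].add(gene)
    return index


edges = load_annotations(io.StringIO(EXAMPLE_TSV))
genes_by_annotation = index_by_target(edges)

# Genes that share one annotation node, here a GO identifier.
shared = sorted(genes_by_annotation[("GO", "GO:0005634")])
print(shared)
```

Treating each row as an edge rather than as a table record is what lets annotations from different sources meet at shared nodes, which is the property a graph database exploits.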

Conclusions: ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

Keywords: Annotation; Comparative genomics; Data integration; Graph database; Non-model species.

Conflict of interest statement

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Example of the internal structure of ODG as represented by Neo4j. Here we can see a PFAM domain (red) that has been identified in 2 *Glycine max* genes (Glyma…) and 1 *Medicago truncatula* gene (Medtr…). This PFAM domain is associated with the GO terms (yellow) cell differentiation, cytoplasm, and nucleus. The GO term collenchyma cell differentiation is also a cell differentiation GO term, as determined from the definitions imported from the Gene Ontology consortium. Because of these relationships, ODG is able to assign additional annotation to these genes based on a known protein domain family. The query was initiated by looking for genes that may be associated with collenchyma cell differentiation.
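The traversal described in the Fig. 1 caption can be sketched in plain Python as a walk from a GO term up its is_a parents, then across to domains and genes. This is only an illustration of the reasoning: the node names are placeholders (the caption elides the real gene identifiers), and the relationship names and dictionaries are assumptions rather than ODG's actual Neo4j schema.

```python
from collections import deque

# Toy edges modelled on the Fig. 1 description. All node names are placeholders
# and the relationship layout is an assumption, not ODG's actual schema.
IS_A = {"collenchyma cell differentiation": ["cell differentiation"]}
DOMAIN_GO = {"pfam_domain_x": ["cell differentiation", "cytoplasm", "nucleus"]}
GENE_DOMAIN = {
    "soybean_gene_1": ["pfam_domain_x"],
    "soybean_gene_2": ["pfam_domain_x"],
    "medicago_gene_1": ["pfam_domain_x"],
}


def term_and_ancestors(term):
    """Return the term plus every term reachable by following is_a edges upward."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def candidate_genes(term):
    """Genes carrying a domain annotated with the term or one of its ancestors."""
    wanted = term_and_ancestors(term)
    domains = {d for d, terms in DOMAIN_GO.items() if wanted & set(terms)}
    return sorted(g for g, ds in GENE_DOMAIN.items() if domains & set(ds))


print(candidate_genes("collenchyma cell differentiation"))
```

Because the match is made through the broader parent term, the result is a list of candidates that may be involved, in line with the caption's wording, not a confirmed annotation.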

**Fig. 2**
ODG provides a simple web-based configuration utility that attempts to identify file types automatically and pre-populates many fields.

**Fig. 3**
Database dependency structure of ODG. Each data type is further annotated by those connected directly to it in the graph. For example, a proteome can be linked to UniPathway entries if InterProScan results are present. If both are present, then both can be queried. If all dependencies from “HMM Scan Results” to “UniPathway” are present, it becomes possible to query the locations of HMM Scan Results, identify nearby genes or proteins, and determine whether they have any domains or motifs linking them to UniPathway annotations.

**Fig. 4**
Flexible queries allow searching for syntenic regions across species while allowing for gene deletions or insertions. These are the results of a query against the *rhg1* soybean locus found on chromosome 18. Another locus of similar genes and order is identified on chromosome 11, as well as in other species. In *P. trichocarpa* and *M. truncatula*, an unrelated gene is identified breaking up the synteny. In *M. truncatula* there is also a copy of the third gene (orange), which does not break the query's ability to identify the closest syntenic and BLASTP-matching region.
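The kind of gap-tolerant matching described in the Fig. 4 caption can be illustrated with a short Python sketch. It is not ODG's query, which runs against the graph database and uses BLASTP similarity. Here each gene is reduced to a hypothetical family label, and `find_syntenic_block`, `max_insert`, and `max_missing` are names invented for this example.

```python
def find_syntenic_block(query, target, max_insert=1, max_missing=1):
    """Find regions of `target` that contain the `query` families in order.

    query and target are ordered lists of gene-family labels along a chromosome.
    max_insert is the number of unrelated genes tolerated between two
    consecutive matches (insertions); max_missing is the number of query genes
    that may be absent from the target (deletions). The first query gene must
    be present, which is a simplification of this sketch.
    Returns a list of (start_index, end_index) pairs into target.
    """
    hits = []
    for start, family in enumerate(target):
        if family != query[0]:
            continue
        pos, missing = start, 0
        for wanted in query[1:]:
            # Look at the next gene plus up to max_insert genes beyond it.
            window = target[pos + 1 : pos + 2 + max_insert]
            if wanted in window:
                pos = pos + 1 + window.index(wanted)
            else:
                missing += 1
        if missing <= max_missing:
            hits.append((start, pos))
    return hits


# Made-up example: one unrelated gene interrupts the locus in the target.
query_locus = ["famA", "famB", "famC"]
target_chromosome = ["other", "famA", "unrelated", "famB", "famC", "other"]
print(find_syntenic_block(query_locus, target_chromosome))
```

Setting `max_insert=0` on the same input rejects the region, which shows why a strict adjacency requirement would miss loci such as those interrupted by an unrelated gene.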

**Fig. 5**
ODG provides a web-based query interface. a) The gene-level detail, primarily populated by gene definition entries as well as the IPR terms, when available. b) A summary of the relationships attached to this gene node and the labels of the nodes those relationships connect to. c) Gene Ontology (GO) terms identified for this gene from InterProScan. d) A summary of the BLASTP hits for this gene’s predicted protein sequence, including hits to other species. Provided are the BLAST Score Ratio (BSR), percent identity, and the e-value output from the BLAST+ program.
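The caption does not define the BLAST Score Ratio. As it is commonly used, BSR is the bit score of a hit divided by the bit score of the query aligned to itself, and the Python sketch below assumes that definition. It reads BLAST+ tabular output in the default `-outfmt 6` column order. The example rows and the `score_ratios` helper are made up for illustration, and how ODG itself computes or stores BSR is not stated in the abstract.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Made-up rows: a self-hit for geneA and one hit to another gene.
EXAMPLE_ROWS = (
    "geneA\tgeneA\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-150\t400.0\n"
    "geneA\tgeneB\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-90\t300.0\n"
)


def score_ratios(lines):
    """Return (query, subject, BSR, percent identity, e-value) for non-self hits.

    BSR is taken here as hit bit score / self-hit bit score, an assumed
    definition. Queries without a self-hit in the input are skipped.
    """
    rows = [
        dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        for line in lines
        if line.strip()
    ]
    # Best self-hit bit score per query serves as the reference score.
    self_scores = {}
    for row in rows:
        if row["qseqid"] == row["sseqid"]:
            score = float(row["bitscore"])
            self_scores[row["qseqid"]] = max(score, self_scores.get(row["qseqid"], 0.0))

    results = []
    for row in rows:
        query, subject = row["qseqid"], row["sseqid"]
        if query == subject or self_scores.get(query, 0.0) <= 0.0:
            continue
        bsr = float(row["bitscore"]) / self_scores[query]
        results.append((query, subject, bsr, float(row["pident"]), float(row["evalue"])))
    return results


ratios = score_ratios(EXAMPLE_ROWS.splitlines())
for query, subject, bsr, pident, evalue in ratios:
    print(f"{query} -> {subject}: BSR={bsr:.2f}, identity={pident}%, e-value={evalue:g}")
```

Normalizing by the self-hit score makes hits comparable across proteins of different lengths, which raw bit scores and e-values do not.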

