BMC Bioinformatics. 2017 Aug 10;18(1):367.
doi: 10.1186/s12859-017-1777-7.

ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding

Joseph Guhlin et al. BMC Bioinformatics. 2017.

Abstract

Background: Rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to making full use of the data.

Results: The Omics Database Generator (ODG) allows users to create customized databases that integrate published genomics data with experimental data and that can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, because it rapidly gives additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user interface for configuring the data import and for querying the database. Queries can also be run from the command line, and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations.

Conclusions: ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

Keywords: Annotation; Comparative genomics; Data integration; Graph database; Non-model species.
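The kind of layered annotation described in the Results can be illustrated with a small in-memory graph. The following is only a minimal sketch under stated assumptions: it is not ODG's code or query language, all gene, domain, and function names are hypothetical placeholders, and plain Python dictionaries stand in for the graph database. It shows how a gene might pick up a Gene Ontology annotation indirectly, through a protein domain and the ontology's parent-child relationships.

```python
# Hypothetical toy graph; identifiers are placeholders, not real ODG records.
GENE_HAS_DOMAIN = {
    "gene_A": {"domain_X"},
    "gene_B": {"domain_X"},
    "gene_C": {"domain_Y"},
}
DOMAIN_HAS_GO = {
    "domain_X": {"cell differentiation", "nucleus"},
    "domain_Y": {"cytoplasm"},
}
# Child GO term -> parent GO terms (is_a edges from the ontology definitions).
GO_IS_A = {
    "collenchyma cell differentiation": {"cell differentiation"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(GO_IS_A.get(current, ()))
    return seen


def genes_possibly_linked_to(term):
    """Genes with a domain annotated by the term or one of its ancestors."""
    wanted = ancestors(term)
    hits = set()
    for gene, domains in GENE_HAS_DOMAIN.items():
        for domain in domains:
            if DOMAIN_HAS_GO.get(domain, set()) & wanted:
                hits.add(gene)
    return sorted(hits)


if __name__ == "__main__":
    print(genes_possibly_linked_to("collenchyma cell differentiation"))
```

In this sketch, a search that starts from a specific term also reaches genes whose domains are annotated only with the broader parent term, which is why the results are candidates rather than confirmed annotations.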

Conflict of interest statement

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Example of the internal structure of ODG as represented by Neo4J. A PFAM domain (red) has been identified in 2 Glycine max genes (Glyma…) and 1 Medicago truncatula gene (Medtr…). This PFAM domain is associated with the GO terms (yellow) cell differentiation, cytoplasm, and nucleus. The GO term collenchyma cell differentiation is also a cell differentiation GO term, as determined from the definitions imported from the Gene Ontology consortium. Because of these relationships, ODG is able to assign additional annotation to these genes based on a known protein domain family. The query was initiated by looking for genes that may be associated with collenchyma cell differentiation.
Fig. 2
ODG provides a simple web-based configuration utility that uses algorithms to attempt to identify file types and pre-populate many fields.
Fig. 3
Database dependency structure of ODG. Each data type is further annotated by those connected directly to it in the graph. For example, a proteome can be linked to UniPathway entries if InterProScan results are present. If both are present, then both can be queried. If all dependencies from “HMM Scan Results” to “UniPathway” are present, it becomes possible to query the locations of HMM Scan Results, identify nearby genes or proteins, and determine whether they have any domains or motifs linking them to UniPathway annotations.
Fig. 4
Flexible queries allow searching for syntenic regions across species while allowing for gene deletions or insertions. These are the results of a query against the rhg1 soybean locus found on chromosome 18. Another locus with similar genes in a similar order is identified on chromosome 11, as well as in other species. In P. trichocarpa and M. truncatula, an unrelated gene interrupts the synteny. In M. truncatula there is also a copy of the third gene (orange), which does not break the query's ability to identify the closest syntenic and BLASTP-matching region.
Fig. 5
ODG provides a web-based query interface. a) The gene-level detail, primarily populated by gene definition entries as well as the IPR terms, when available. b) A summary of the relationships attached to this gene node and the labels of the nodes those relationships connect to. c) Gene Ontology (GO) terms that were identified for this gene from InterProScan. d) A summary of the BlastP hits for this gene’s predicted protein sequence, including hits to other species. Provided are the BLAST Score Ratio (BSR), percent identity, and the e-value output from the BLAST+ program.
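The BLAST Score Ratio listed in the Fig. 5 caption is not defined on this page. The sketch below assumes the commonly used definition, in which the bit score of a hit is divided by the bit score of the query aligned against itself, so that a perfect match gives 1.0. The function name and the example scores are hypothetical, and ODG's own calculation may differ in detail.

```python
def blast_score_ratio(hit_bit_score, self_bit_score):
    """Normalize a hit's bit score by the query's self-hit bit score.

    Assumed definition: BSR = hit bit score / self-hit bit score.
    """
    if self_bit_score <= 0:
        raise ValueError("self-hit bit score must be positive")
    return hit_bit_score / self_bit_score


if __name__ == "__main__":
    # Hypothetical scores: a hit scoring 150 bits for a query whose
    # self-alignment scores 200 bits.
    print(blast_score_ratio(150.0, 200.0))
```

Normalizing by the self-hit score makes hits comparable across proteins of different lengths, which raw bit scores are not.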
