PLoS One. 2015 Apr 13;10(4):e0122802.
doi: 10.1371/journal.pone.0122802. eCollection 2015.

Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data


Hirokazu Chiba et al. PLoS One. 2015.

Abstract

Various types of biological data, including genomic sequences, have recently been accumulating rapidly. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using Semantic Web technology, aiming to integrate numerous genomic data sets and various types of biological information. To formalize the structure of ortholog information on the Semantic Web, we have constructed the Ortholog Ontology (OrthO). Although OrthO is a compact ontology for general use, it is designed to be extended to describe database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the Resource Description Framework (RDF) and made it available through a SPARQL endpoint, which accepts arbitrary queries specified by users. In this OrthO-based framework, biological data from different organisms can be integrated using ortholog information as a hub. In addition, ortholog information from different data sources can be compared using OrthO as a shared ontology. Here we show examples demonstrating that ortholog information described in RDF can be used to link various biological data, such as taxonomy information and the Gene Ontology. Thus, an ortholog database built on Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
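
A SPARQL endpoint answers a query by matching a basic graph pattern: a set of triple patterns containing variables that must all match the data at once. The Python below is a minimal sketch of that idea over a toy in-memory triple list, showing how an ortholog group acts as a hub joining genes from different organisms. It is not how a production SPARQL engine works, and apart from orth:OrthologGroup (a class named in this paper) every identifier and property name (ex:hasMember, ex:organism, ex:group1, and so on) is a hypothetical placeholder rather than OrthO vocabulary.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy data. Every ex: name is a hypothetical placeholder, not OrthO vocabulary;
# only orth:OrthologGroup is a class named in the paper.
TRIPLES: List[Triple] = [
    ("ex:group1", "rdf:type", "orth:OrthologGroup"),
    ("ex:group1", "ex:hasMember", "ex:protA"),
    ("ex:group1", "ex:hasMember", "ex:protB"),
    ("ex:protA", "ex:organism", "ex:organism1"),
    ("ex:protB", "ex:organism", "ex:organism2"),
]


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pat_term, value in zip(pattern, triple):
        if is_var(pat_term):
            # A variable bound by an earlier pattern must agree (this is the join).
            if extended.get(pat_term, value) != value:
                return None
            extended[pat_term] = value
        elif pat_term != value:
            return None
    return extended


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: all triple patterns must match jointly."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


# Which organisms have a member in the same ortholog group as ex:protA?
rows = query(
    [
        ("?g", "ex:hasMember", "ex:protA"),
        ("?g", "ex:hasMember", "?member"),
        ("?member", "ex:organism", "?org"),
    ],
    TRIPLES,
)
for row in rows:
    print(row["?member"], row["?org"])
```

Each pattern narrows or extends the partial solutions from the previous one, so the shared variable `?g` is what links a protein to its orthologs in other organisms.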

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. RDF model of ortholog information based on OrthO.
(A) Hierarchical structure of classes and properties in OrthO. OrthO includes 12 classes (owl:Class) and 20 properties (15 owl:ObjectProperty and 5 owl:DatatypeProperty). (B) Schematic representation of the RDF graph structure of ortholog information described using OrthO. Elliptical nodes represent instances of classes, directed edges represent properties, and dotted lines represent possible links to other resources.
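
A class and property hierarchy pays off when a query engine applies subclass and subproperty inference, so that a query for a superclass or superproperty also finds statements made with its specializations. The Python below sketches this with naive forward chaining over four RDFS entailment rules (rdfs5, rdfs7, rdfs9, rdfs11) until no new triple appears. The hierarchy is hypothetical: only orth:OrthologGroup is named in this paper, while ex:HomologGroup, ex:ParalogGroup and the ex:has* properties are placeholders, not OrthO's actual classes or properties.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"

# Hypothetical hierarchy: only orth:OrthologGroup is named in the paper;
# ex:HomologGroup, ex:ParalogGroup and the ex:has* properties are placeholders.
schema: Set[Triple] = {
    ("orth:OrthologGroup", SUBCLASS, "ex:HomologGroup"),
    ("ex:ParalogGroup", SUBCLASS, "ex:HomologGroup"),
    ("ex:hasOrthologous", SUBPROP, "ex:hasHomologous"),
    ("ex:hasParalogous", SUBPROP, "ex:hasHomologous"),
}
data: Set[Triple] = {
    ("ex:g1", TYPE, "orth:OrthologGroup"),
    ("ex:g1", "ex:hasOrthologous", "ex:protA"),
    ("ex:g2", TYPE, "ex:ParalogGroup"),
    ("ex:g2", "ex:hasParalogous", "ex:protB"),
}


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining over four RDFS rules until a fixed point."""
    closed = set(triples)
    while True:
        inferred: Set[Triple] = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # rdfs5 / rdfs11: subPropertyOf and subClassOf are transitive.
                if p in (SUBCLASS, SUBPROP) and p2 == p and s2 == o:
                    inferred.add((s, p, o2))
                # rdfs9: an instance of a subclass is an instance of its superclass.
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    inferred.add((s, TYPE, o2))
                # rdfs7: a statement using a subproperty implies one using the superproperty.
                if p2 == SUBPROP and s2 == p:
                    inferred.add((s, o2, o))
        if inferred <= closed:
            return closed
        closed |= inferred


closed = rdfs_closure(schema | data)
homolog_groups = sorted(s for s, p, o in closed if p == TYPE and o == "ex:HomologGroup")
homologous_pairs = sorted((s, o) for s, p, o in closed if p == "ex:hasHomologous")
print(homolog_groups)    # ['ex:g1', 'ex:g2']
print(homologous_pairs)  # [('ex:g1', 'ex:protA'), ('ex:g2', 'ex:protB')]
```

Neither group was asserted to be an ex:HomologGroup, yet both are returned once the closure is computed. Real triple stores implement this far more efficiently; the quadratic loop here is only for clarity.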
Fig 2. An example orthology relation and its RDF representation.
(A) Schematic illustration of an orthology relation taken from the OrthoXML documentation (http://orthoxml.org/0.3/orthoxml_doc_v0.3.html#trees). Each node is assigned the URI and class required for its RDF representation. The filled circles, which represent speciation events, are assigned the orth:OrthologGroup class. (B) RDF representation (Turtle format) of the example shown in A.
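
The conversion in this figure amounts to walking a nested gene tree, giving every internal node a URI and a class, and linking each node to its children. The Python below is a minimal sketch of that walk on a toy tree (not the OrthoXML example itself). Mapping speciation nodes to orth:OrthologGroup follows the caption; the duplication class ex:ParalogGroup, the membership property ex:hasMember and the pre-order URI scheme are hypothetical, and @prefix declarations are omitted from the emitted Turtle.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """An internal node of a gene tree: a speciation or a duplication event."""
    event: str  # "speciation" or "duplication"
    children: List[Union["Node", str]] = field(default_factory=list)


# Speciation -> orth:OrthologGroup follows the Fig 2 caption;
# ex:ParalogGroup and ex:hasMember are hypothetical placeholders.
CLASS_FOR_EVENT = {"speciation": "orth:OrthologGroup", "duplication": "ex:ParalogGroup"}


def to_turtle(root: Node, prefix: str = "ex:node") -> List[str]:
    """Assign each internal node a URI and class, emitting one Turtle statement per line."""
    lines: List[str] = []
    counter = 0

    def visit(node: Node) -> str:
        nonlocal counter
        counter += 1
        uri = f"{prefix}{counter}"  # pre-order numbering: parent before children
        lines.append(f"{uri} a {CLASS_FOR_EVENT[node.event]} .")
        for child in node.children:
            # Leaves are gene URIs; internal nodes are visited recursively.
            child_uri = visit(child) if isinstance(child, Node) else child
            lines.append(f"{uri} ex:hasMember {child_uri} .")
        return uri

    visit(root)
    return lines


tree = Node("speciation", ["ex:geneHuman", Node("duplication", ["ex:geneMouse1", "ex:geneMouse2"])])
print("\n".join(to_turtle(tree)))
```

Because each statement ends with " .", the lines form valid Turtle once the prefixes are declared, even though a nested group's statements appear before the line linking it to its parent.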
Fig 3. The portal page of MBGD SPARQL Search.
Fig 4. Retrieval of ortholog information for a specific protein.
(A) Schematic diagram of the RDF graph structure related to the query in B. Elliptical nodes represent resources: the shaded nodes, whose classes are shown in italics, represent instances of those classes, while the unshaded node shows the URI of the resource directly. (B) SPARQL query to retrieve the GO annotations of an ortholog group. Prefix declarations are omitted for readability; the full SPARQL query is included in S1 Dataset. (C) Search results of the query shown in B.
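
The traversal behind this query can be sketched as three hops: from a protein to the ortholog group(s) containing it, from the group to all of its members, and from the members to their GO terms. The query in panel B is not reproduced here; the Python below assumes that path and tallies how many members carry each term. The data and the property names ex:hasMember and ex:goTerm are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Toy data; ex:hasMember and ex:goTerm are hypothetical property names.
triples: List[Triple] = [
    ("ex:clusterX", "ex:hasMember", "ex:protA"),
    ("ex:clusterX", "ex:hasMember", "ex:protB"),
    ("ex:clusterX", "ex:hasMember", "ex:protC"),
    ("ex:protA", "ex:goTerm", "GO:0009288"),
    ("ex:protB", "ex:goTerm", "GO:0009288"),
    ("ex:protB", "ex:goTerm", "GO:0005524"),
]


def go_terms_via_orthologs(protein: str, triples: List[Triple]) -> Counter:
    """Protein -> its ortholog group(s) -> all members -> their GO terms, tallied."""
    groups = {s for s, p, o in triples if p == "ex:hasMember" and o == protein}
    members = {o for s, p, o in triples if p == "ex:hasMember" and s in groups}
    return Counter(o for s, p, o in triples if p == "ex:goTerm" and s in members)


counts = go_terms_via_orthologs("ex:protC", triples)
print(counts.most_common())  # [('GO:0009288', 2), ('GO:0005524', 1)]
```

Here ex:protC has no GO annotation of its own, but its ortholog group supplies candidate annotations from the other members, which is the hub role that ortholog information plays in this framework.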
Fig 5. Comparison of ortholog information from different data sources.
(A) Schematic diagram of the RDF graph structure related to the query in B. Elliptical nodes represent instances of classes; rectangular nodes represent literals (integers in this example). (B) SPARQL query comparing orthologs between MBGD and eggNOG. The first line enables inference based on subclass and subproperty relations (see Methods). (C) Search results of the query shown in B.
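
One simple way to relate ortholog clusters from two sources is to count the proteins each pair of clusters shares. The Python below sketches that comparison; it is not necessarily what the query in panel B computes. The cluster IDs and protein IDs are hypothetical, and the code assumes each protein belongs to at most one cluster within a source.

```python
from collections import Counter
from typing import Dict, Set

# Toy memberships; cluster IDs and protein IDs are hypothetical.
mbgd: Dict[str, Set[str]] = {"mbgd:1": {"p1", "p2", "p3"}, "mbgd:2": {"p4", "p5"}}
eggnog: Dict[str, Set[str]] = {"nog:A": {"p1", "p2"}, "nog:B": {"p3", "p4", "p5"}, "nog:C": {"p6"}}


def shared_member_counts(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Counter:
    """Count proteins shared by each (cluster in a, cluster in b) pair.

    Assumes each protein belongs to at most one cluster within a source.
    """
    cluster_in_b = {m: cid for cid, members in b.items() for m in members}
    counts: Counter = Counter()
    for cid_a, members in a.items():
        for m in members:
            if m in cluster_in_b:
                counts[(cid_a, cluster_in_b[m])] += 1
    return counts


for (cid_a, cid_b), n in sorted(shared_member_counts(mbgd, eggnog).items()):
    print(cid_a, cid_b, n)
```

With the toy data, mbgd:1 is split between nog:A (2 shared proteins) and nog:B (1), while mbgd:2 maps entirely onto nog:B (2). Because both sources are described with a shared ontology, membership can be read the same way from either, which is what makes such a side-by-side count straightforward.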
Fig 6. Retrieval of phylogenetic patterns of orthologs related to a specific function.
(A) Schematic diagram of the RDF graph structure related to the queries in B and C. (B) SPARQL query to retrieve MBGD clusters that include members related to the GO term GO:0009288 (bacterial-type flagellum). (C) SPARQL query to obtain the organisms that contain members of an ortholog group. (D) Search results of the query shown in B. (E) Results of the queries shown in B and C, visualized using R (the R source code is included in S1 Dataset). The number of target organisms in each phylum is shown in parentheses. After the output was obtained from R, phyla containing gram-positive bacteria (+) and genes functioning in the flagellar export system (*) were marked, and a blue line was added to indicate clusters with relatively wide organismal distribution (present in at least 16 phyla).
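
The analysis in this figure combines two lookups: clusters with at least one member annotated with a GO term (query B), and the organisms containing each cluster's members (query C), which are then grouped by phylum. The Python below sketches that aggregation with toy dictionaries standing in for the query results; all identifiers are hypothetical, and only the 16-phylum threshold comes from the caption. It computes the phylogenetic pattern only and does not reproduce the R visualization.

```python
from typing import Dict, List, Set

# Toy inputs standing in for query results; all identifiers are hypothetical.
cluster_members: Dict[str, List[str]] = {
    "c1": ["protA", "protB", "protC"],
    "c2": ["protD"],
    "c3": ["protE"],
}
protein_go: Dict[str, Set[str]] = {"protA": {"GO:0009288"}, "protD": {"GO:0009288"}, "protE": {"GO:0005524"}}
protein_organism: Dict[str, str] = {"protA": "org1", "protB": "org2", "protC": "org3", "protD": "org1", "protE": "org2"}
organism_phylum: Dict[str, str] = {"org1": "phylumX", "org2": "phylumY", "org3": "phylumZ"}


def phylogenetic_pattern(term: str) -> Dict[str, Set[str]]:
    """For clusters with at least one member annotated with `term`, collect the phyla covered."""
    pattern: Dict[str, Set[str]] = {}
    for cid, members in cluster_members.items():
        if any(term in protein_go.get(m, set()) for m in members):
            pattern[cid] = {organism_phylum[protein_organism[m]] for m in members}
    return pattern


def widely_distributed(pattern: Dict[str, Set[str]], min_phyla: int = 16) -> List[str]:
    """Clusters spanning at least `min_phyla` phyla (16 in the Fig 6 caption)."""
    return sorted(cid for cid, phyla in pattern.items() if len(phyla) >= min_phyla)


pattern = phylogenetic_pattern("GO:0009288")
print({cid: sorted(phyla) for cid, phyla in sorted(pattern.items())})
print(widely_distributed(pattern, min_phyla=2))  # toy data spans far fewer than 16 phyla
```

Note that a whole cluster is selected when any one member carries the term, so its phylum set also includes organisms whose members are unannotated (protB and protC here); this mirrors using the ortholog group, rather than individual annotations, as the unit of the pattern.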

