Federated ontology-based queries over cancer data

Alejandra González-Beltrán, Ben Tagger, Anthony Finkelstein

BMC Bioinformatics. 2012 Jan 25;13 Suppl 1:S9.

PMID: 22373043
PMCID: PMC3471355
DOI: 10.1186/1471-2105-13-S1-S9

Abstract

Background: Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. This is particularly relevant in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms in each discipline (semantic heterogeneity), make the retrieval and integration of information difficult.

Results: Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included.

Conclusions: To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures.
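The Results paragraph states that join conditions between data sources are found by exploiting the semantic annotations. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each attribute carries a set of ontology concept codes and treats attributes from different services with identical annotation sets as candidate join points. The service names, class names, attribute names and concept codes are all hypothetical placeholders.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AnnotatedAttribute:
    """A UML attribute together with its ontology annotations (all values hypothetical)."""
    service: str          # name of the data service exposing the attribute
    uml_class: str        # UML class that owns the attribute
    attribute: str        # attribute name
    concepts: frozenset   # ontology concept codes annotating the attribute


def candidate_joins(attributes):
    """Pair attributes from different services that carry identical, non-empty annotations."""
    joins = []
    for a, b in combinations(attributes, 2):
        if a.service != b.service and a.concepts and a.concepts == b.concepts:
            joins.append((a, b))
    return joins


# Placeholder annotations; real systems would use codes from an ontology such as the NCI thesaurus.
attrs = [
    AnnotatedAttribute("ServiceA", "Gene", "symbol",
                       frozenset({"CONCEPT_GENE", "CONCEPT_SYMBOL"})),
    AnnotatedAttribute("ServiceA", "Gene", "fullName",
                       frozenset({"CONCEPT_GENE", "CONCEPT_NAME"})),
    AnnotatedAttribute("ServiceB", "Protein", "geneSymbol",
                       frozenset({"CONCEPT_GENE", "CONCEPT_SYMBOL"})),
]

for left, right in candidate_joins(attrs):
    print(f"{left.service}.{left.uml_class}.{left.attribute} = "
          f"{right.service}.{right.uml_class}.{right.attribute}")
```

Exact set equality is the simplest matching rule; a system that reasons over the ontology could presumably also match on subsumed or equivalent concepts, which this sketch does not attempt.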

Figures

**Figure 1**
**caBIG® semantic infrastructure core services**. **Figure 1**: caGrid core services, and their corresponding APIs, matched with the different levels of the metadata hierarchy. At the syntactic level, caGrid relies on XML Schemas to indicate the data types shared on the grid. These schemas are maintained in the Global Model Exchange, a service acting as an XML schema registry. The structural metadata consists of UML models, which can be accessed using the caGrid Discovery API. A metadata registry, based on the ISO/IEC 11179 standard, is used to manage common data elements (CDEs). The metadata registry, called caDSR, can be accessed with a specific API. A CDE is composed of an object class, a property and a value domain. These components correspond to a UML class, a UML attribute and the attribute's data type, respectively, and each of them is associated with a set of concepts from an ontology. These mappings between structural elements and concepts constitute the reference metadata. The concepts are part of the domain metadata and, in caBIG®, mainly belong to the NCI thesaurus ontology. The LexEVS API provides access to the available terminologies.

**Figure 2**
**caBIG® semantic infrastructure and semantic layer built in our COnQueST system**. **Figure 2**: Different levels of metadata in the caBIG® semantic infrastructure shown for two data sources that share a common data element (CDE). The CDE is annotated with concepts from the NCI thesaurus ontology. The top part of the diagram (above the dotted line) shows the ontologies built in the COnQueST system to facilitate ontology-based queries over caBIG® data services.

**Figure 3**
**DCQL Use Case**. **Figure 3**: Sections of the UML models of the caBIO and PIR data services showing the classes annotated with concepts included in the second query use case. This diagram corresponds to a solution of the query reformulation process involving multiple data services.

**Figure 4**
**Use Case**. **Figure 4**: Section of the caBIO UML model representing the relationship between the SNP class, corresponding to *single nucleotide polymorphisms*, and the *Chromosome* class. This section of the UML model is relevant for the first query use case, where the solution involves a single target data service.

**Figure 5**
**System Architecture**. **Figure 5**: The caGrid service-oriented architecture (bottom part) extended with novel semantic services for ontology generation and querying, and with a bespoke user interface (shown in the upper part of the diagram).

**Figure 6**
**Screenshot of the browser tool in COnQueSt interface**. **Figure 6**: The browser tool in the COnQueSt interface: the upper left panel shows the list of projects (or information models) available, the bottom left panel shows the concepts used to annotate the selected project, and the right panel displays the concept definitions, including links to the NCI thesaurus browser. All panels have a search facility: for instance, it is possible to search projects by their name.

**Figure 7**
**Screenshot of the search tool in COnQueSt interface**. **Figure 7**: COnQueSt search tool: when searching for a concept, the result shown includes the projects (or information models) with matching concepts as well as the concepts themselves. While the "search" button considers all the concepts containing the search criteria, the "I'm feeling lucky" button retrieves the concept that exactly matches the search criteria.

**Figure 8**
**Screenshot of the query builder in COnQueSt interface**. **Figure 8**: The COnQueSt query builder tool lets the user search the available concepts and specify an association between them, indicating either that a concept has a property specified by another concept or that it takes a specific value.

**Figure 9**
**Screenshot of the results panel in COnQueSt interface**. **Figure 9**: The query results panel shows a table listing the properties of each result object.

**Figure 10**
**The path metrics**. **Figure 10**: Three box plot diagrams showing path metrics for each subset of information models: caDSR, caGrid and Info Models. The path metrics considered are, from left to right: the longest path, the average number of nodes per path and the average number of paths per journey.

**Figure 11**
**Ontology and modules, generation and inference times**. **Figure 11**: The box plot diagram on the left shows the generation times for the NCIt module ontology and the annotated UML ontology for the three groups of information models (caDSR, caGrid and Info Models). The box plot diagram on the right depicts the inference times for the UML ontology using the HermiT and Pellet reasoners. Both diagrams use a logarithmic scale.

**Figure 12**
**Query rewriting performance**. **Figure 12**: Times taken in each stage of the query reformulation process (parsing, UML extraction, path finding, MCC conversion and CQL conversion) at varying path lengths.

**Figure 13**
**Path finding performance**. **Figure 13**: Path-finding times for varying numbers of explanations, ranging from 1 to 5. Each explanation generates a path.

**Figure 14**
**Query reformulation stages**. **Figure 14**: The stages of query rewriting for both single and multiple target data services are depicted in blue. The form of the query at the different stages is represented in yellow, and the points of user interaction are shown in red.

**Figure 15**
**Federated path finder**. **Figure 15**: Processes involved in finding paths in the information models when dealing with queries over multiple data services.
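Several captions above refer to finding paths between classes in the UML information models. As an illustration only, the sketch below treats the associations of an information model as an undirected graph and uses a plain breadth-first search to find a shortest path between two classes. This is a substitute technique chosen for simplicity, not the explanation-based path finding the captions describe. The `SNP` and `Chromosome` class names come from the Figure 4 caption; the intermediate classes and every association listed are invented for the example.

```python
from collections import deque


def shortest_path(associations, start, goal):
    """Return a shortest list of classes linking start to goal, or None if unreachable."""
    # Build an undirected adjacency map from the (class, class) association pairs.
    graph = {}
    for a, b in associations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical associations; they do not reproduce the actual caBIO model.
associations = [
    ("SNP", "Location"),
    ("Location", "Chromosome"),
    ("Chromosome", "Taxon"),
]

print(shortest_path(associations, "SNP", "Chromosome"))
```

Each consecutive pair of classes in the returned path would correspond to an association that a reformulated query has to traverse in the target query language.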

Cited by

Ethier JF, Dameron O, Curcin V, McGilchrist MM, Verheij RA, Arvanitis TN, Taweel A, Delaney BC, Burgun A. A unified structural/terminological interoperability framework based on LexEVS: application to TRANSFoRm. J Am Med Inform Assoc. 2013 Sep-Oct;20(5):986-94. doi: 10.1136/amiajnl-2012-001312. PMID: 23571850.
Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel). 2022 Apr 10;14(8):1906. doi: 10.3390/cancers14081906. PMID: 35454813.
Shang Y, Tian Y, Lyu K, Zhou T, Zhang P, Chen J, Li J. Electronic Health Record-Oriented Knowledge Graph System for Collaborative Clinical Decision Support Using Multicenter Fragmented Medical Data: Design and Application Study. J Med Internet Res. 2024 Jul 5;26:e54263. doi: 10.2196/54263. PMID: 38968598.
Castro-Chavez F. Defragged Binary I Ching Genetic Code Chromosomes Compared to Nirenberg's and Transformed into Rotating 2D Circles and Squares and into a 3D 100% Symmetrical Tetrahedron Coupled to a Functional One to Discern Start From Non-Start Methionines through a Stella Octangula. J Proteome Sci Comput Biol. 2012;2012(1):3. doi: 10.7243/2050-2273-1-3. PMID: 23431415.
Wu D, Rice CM, Wang X. Cancer bioinformatics: a new approach to systems clinical medicine. BMC Bioinformatics. 2012 May 1;13:71. doi: 10.1186/1471-2105-13-71. PMID: 22549015.
