Bioinformatics. 2012 Oct 1;28(19):2502-8.
doi: 10.1093/bioinformatics/bts471. Epub 2012 Jul 26.

Bayesian ontology querying for accurate and noise-tolerant semantic searches

Sebastian Bauer et al. Bioinformatics. 2012.

Abstract

Motivation: Ontologies provide a structured representation of the concepts in a domain of knowledge and of the relations between them. Attribute ontologies describe the characteristics of the items in a domain, such as the functions of proteins or the signs and symptoms of diseases. This makes it possible to search a database of items for the best match to a list of observed or desired attributes. However, naive search methods perform poorly on realistic data because the data are noisy, typical queries are imprecise, and individual items may not display all attributes of the category to which they belong.

Results: We present a method that combines ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies, and we demonstrate its application as a differential diagnostic support system for human genetics.

Availability: We provide an implementation for the algorithm and the benchmark at http://compbio.charite.de/boqa/.

Contact: Sebastian.Bauer@charite.de or Peter.Robinson@charite.de

Supplementary information: Supplementary Material for this article is available at Bioinformatics online.


Figures

Fig. 1.
Basic idea of the approach in the context of clinical diagnosis. BOQA takes the data model derived from an attribute ontology and its annotations, together with a set of query terms, and produces a ranked list of items. (A) A portion of the HPO with frequency-enhanced annotations to OMIM diseases. This information defines the data model of our application. (B) High-level specification of the approach in the diagnostic setting.
Fig. 2.
A Bayesian network with two items annotated using an ontology with seven terms. Item 1 is annotated to term 3, and item 2 is annotated to terms 4 and 7. The annotations are modeled by edges from each item to the hidden layer. The edges within the hidden layer are directed from child to parent terms in the ontology and implement the annotation propagation rule. The edges within the query layer are directed in the opposite direction and, together with the one-to-one edges from the hidden to the query layer, model false-positive and false-negative query terms. A particular configuration of the network is also depicted, in which item 1 is active and term 6 forms the query. Thus, there is a false-negative event for term 3 and a false-positive event for term 6. The probabilities of the non-trivial events involved are shown next to the corresponding nodes of the query layer.
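The network in Fig. 2 can be made concrete with a small Python sketch that counts the events for one active item and turns them into a likelihood. The caption does not list the ontology's edges, so the seven-term DAG below is hypothetical, chosen only so that it reproduces the events described (a false negative for term 3 and a false positive for term 6 when item 1 is active and term 6 is queried). The query-layer rule used here (a query term whose parent query term is off is forced off and contributes no event), the reading of alpha as the false-positive rate and beta as the false-negative rate, and all function names are our assumptions, not BOQA's published implementation.

```python
import math

# Hypothetical 7-term ontology (term -> parent terms); the caption gives no edges.
PARENTS = {1: [], 2: [1], 3: [2], 4: [2], 5: [1], 6: [2], 7: [5]}
# Annotations stated in the caption: item 1 -> term 3, item 2 -> terms 4 and 7.
ANNOTATIONS = {1: {3}, 2: {4, 7}}


def propagate(terms):
    """Return the given terms plus all their ancestors (annotation propagation)."""
    closed, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in closed:
            closed.add(t)
            stack.extend(PARENTS[t])
    return closed


def count_events(item, query):
    """Count true/false positive/negative events when `item` is the active item."""
    hidden = propagate(ANNOTATIONS[item])  # hidden layer
    observed = propagate(query)            # query layer
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for term, parents in PARENTS.items():
        # Assumed rule: a query term with an "off" parent is forced off (trivial event).
        if any(p not in observed for p in parents):
            continue
        h, q = term in hidden, term in observed
        counts[("T" if h == q else "F") + ("P" if q else "N")] += 1
    return counts


def log_likelihood(counts, alpha, beta):
    """log P(query | item), alpha = false-positive rate, beta = false-negative rate."""
    return (counts["FP"] * math.log(alpha) + counts["TN"] * math.log(1 - alpha)
            + counts["FN"] * math.log(beta) + counts["TP"] * math.log(1 - beta))


def rank_items(query, alpha=0.01, beta=0.1):
    """Rank items by posterior probability under a uniform prior over items."""
    logs = {i: log_likelihood(count_events(i, query), alpha, beta) for i in ANNOTATIONS}
    top = max(logs.values())
    z = sum(math.exp(v - top) for v in logs.values())
    posterior = {i: math.exp(v - top) / z for i, v in logs.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
```

With the default rates, `rank_items({6})` places item 1 ahead of item 2, because item 2 incurs two false negatives (terms 4 and 5) against item 1's single false negative. Fixing alpha and beta and using a uniform prior over items are simplifications made for this sketch.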
Fig. 3.
Frequency-aware propagation. Here, I2 is active, whereas I1 is inactive. Given this, the probability that H4 is on is f2,4, and the probability that H7 is on is f2,7. All other frequencies between the diseases and the terms are 0, so they can be omitted. Thus, there are four possible configurations of the model, (A) to (D), one for each on/off combination of H4 and H7. The probability of each configuration is the product of f2,4 (H4 on) or 1 − f2,4 (H4 off) and f2,7 (H7 on) or 1 − f2,7 (H7 off).
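The enumeration in Fig. 3 can be sketched in a few lines of Python. The sketch assumes, as the caption suggests, that each annotated hidden term of the active item is switched on independently with its annotation frequency, and that terms with frequency 0 are simply left out; the function name and the frequency values in the example are hypothetical.

```python
from itertools import product


def configuration_probabilities(freqs):
    """Enumerate on/off states of the annotated hidden terms of one active item.

    freqs maps each term with non-zero frequency to its frequency f.
    Returns the sorted term list and a dict: tuple of on/off states -> probability.
    """
    terms = sorted(freqs)
    configs = {}
    for states in product([False, True], repeat=len(terms)):
        p = 1.0
        for term, on in zip(terms, states):
            # Independent Bernoulli draw per term: f if on, 1 - f if off.
            p *= freqs[term] if on else 1.0 - freqs[term]
        configs[states] = p
    return terms, configs


if __name__ == "__main__":
    # Made-up values standing in for f2,4 and f2,7.
    terms, configs = configuration_probabilities({4: 0.7, 7: 0.2})
    for states, p in configs.items():
        print(dict(zip(terms, states)), round(p, 2))
```

Two annotated terms give 2 × 2 = 4 configurations, matching the four panels of the figure, and their probabilities sum to 1.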
Fig. 4.
Performance comparison using ROC and precision/recall analysis. The analysis was performed on 2368 diseases. For each disease, five patients were generated according to the available frequency information, and the true features of each patient were then obfuscated at the different noise levels (α, β) indicated. The maximum query size s was set to 6.
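The benchmark protocol described for Fig. 4 can be sketched in Python. The caption states only that patients were drawn from the frequency information, obfuscated with noise levels (α, β) and limited to s = 6 query terms. The concrete obfuscation steps below (dropping each true feature with probability beta, adding each unrelated term with probability alpha, then subsampling to at most s terms) are our assumptions, and the term identifiers and frequencies are hypothetical.

```python
import random


def simulate_patient(freqs, all_terms, alpha, beta, max_size, rng):
    """Generate one noisy query for a disease (hypothetical protocol).

    freqs: term -> annotation frequency of that term for the disease.
    all_terms: every term that may appear as a false-positive feature.
    """
    # 1. Draw the patient's true features according to the frequencies.
    true = {t for t in sorted(freqs) if rng.random() < freqs[t]}
    # 2. Drop each true feature with probability beta (false negatives).
    kept = {t for t in sorted(true) if rng.random() >= beta}
    # 3. Add each other term with probability alpha (false positives).
    noise = {t for t in sorted(all_terms) if t not in true and rng.random() < alpha}
    query = sorted(kept | noise)
    # 4. Limit the query to at most max_size terms, chosen at random.
    if len(query) > max_size:
        query = sorted(rng.sample(query, max_size))
    return query


if __name__ == "__main__":
    rng = random.Random(42)
    disease = {"T1": 0.9, "T2": 0.5, "T3": 0.2}  # made-up frequencies
    terms = [f"T{i}" for i in range(1, 21)]
    for _ in range(5):  # five patients per disease, as in the benchmark
        print(simulate_patient(disease, terms, alpha=0.05, beta=0.1, max_size=6, rng=rng))
```

Passing an explicit `random.Random` instance keeps each simulated cohort reproducible from its seed.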
