Functional abstraction as a method to discover knowledge in gene ontologies

Alfred Ultsch¹, Jörn Lötsch²

Affiliations

¹ DataBionics Research Group, University of Marburg, Marburg, Germany.
² Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany; Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Project Group Translational Medicine and Pharmacology TMP, Frankfurt am Main, Germany.

PMID: 24587272
PMCID: PMC3935416
DOI: 10.1371/journal.pone.0090191

Alfred Ultsch et al. PLoS One. 2014 Feb 25;9(2):e90191. doi: 10.1371/journal.pone.0090191. eCollection 2014.

Abstract

Computational analyses of the functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology (GO) knowledge bases by means of over-representation analysis (ORA). The result represents the specific knowledge about the functionality of the gene set. However, this specific ontology typically consists of many terms and relationships, which hinders the understanding of the 'main story'. We developed a methodology to identify a comprehensibly small number of GO terms as "headlines" of the specific ontology, from which all central aspects of the roles of the involved genes can be understood. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and abstract enough for human comprehension. This method exceeds the classical approaches to ORA abstraction, and by focusing on information rather than decorrelation of GO terms, it directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. This is the necessary means to interpret complex Gene Ontology results, thus strengthening the role of functional genomics in biomarker and drug discovery.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. ORA results and functional areas obtained with the CLASSIC abstraction methods.**
Graphical representation of the specific ontology showing the polyhierarchy of functional annotations (GO terms) assigned to the HHI gene set (G = 119, Table 1), which forms a directed acyclic graph (DAG). The figure was generated with the GeneTrail web-based analysis tool. Significant GO terms were identified using ORA, which yielded 71 terms at a significance level of p = 1.0 · 10⁻² with Bonferroni α correction (grey ellipses, annotated with the observed number of member genes, the number of genes expected by chance, and the p-value of the deviation from expectation according to Fisher’s exact test). The CLASSIC p-value approach to interpreting ORA results selects headline terms in order of descending statistical significance. Setting the p-value threshold at p = 10⁻²⁰ yielded eight headlines (red ellipses). The CLASSIC detail approach selects the leaves of each ontology, which with the present ORA parameters yielded seven details (blue ellipses plus “sensory perception of sound”, the latter colored red because it was also selected by the p-value method).
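
The ORA step described in this caption can be illustrated with a minimal sketch. It assumes that the over-representation p-value of a single GO term is the one-sided Fisher's exact test, which equals the upper tail of the hypergeometric distribution, and that the Bonferroni correction divides α by the number of tested terms. The function names, the toy numbers and the term labels are hypothetical; this is not the GeneTrail implementation.

```python
from math import comb


def ora_p_value(n_universe, n_term, n_set, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_universe: genes in the reference set (assumed background)
    n_term:     reference genes annotated to the GO term
    n_set:      genes in the analysed gene set
    n_overlap:  genes of the set annotated to the GO term (observed)
    Returns P(X >= n_overlap) under the hypergeometric distribution.
    """
    upper = min(n_term, n_set)
    total = comb(n_universe, n_set)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def expected_count(n_universe, n_term, n_set):
    """Number of set genes expected on the term by chance."""
    return n_set * n_term / n_universe


def bonferroni_significant(p_values, alpha=0.01):
    """Keep terms whose p-value passes alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {term: p for term, p in p_values.items() if p <= threshold}


# Toy example with hypothetical counts (hand check: (36 + 4) / 120 = 1/3).
p_toy = ora_p_value(n_universe=10, n_term=4, n_set=3, n_overlap=2)
kept = bonferroni_significant({"term_a": 0.004, "term_b": 0.006}, alpha=0.01)
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics package; for very large reference sets a log-space implementation may be preferable.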

Figure 2. Graph of the information value function *Info(T_i) = −e · p_i · ln(p_i)*, with *p_i = n_G(T_i)/n_G*, where *n_G(T_i)* denotes the number of genes of a set annotated to a term *T_i* and *n_G* denotes the total number of genes in the set.
Derived from Shannon information, *Info(T_i)* measures the contribution of the annotations of *T_i* to the total (Shannon) information of a specific ontology. In bioinformatics, *IC(T_i) = −log(p_i)* measures the information content (IC) of a GO term if *p_i* is the number of genes annotated to *T_i* relative to all annotations in the GO. Thus, *Info(T_i)* can be interpreted as the weighted information content of a specific ontology. *Info(T_i)* = 0 if term *T_i* does not possess any annotations (*p_i* = 0) and for the root of the ontology (*p_i* = 1). *Info(T_i)* reaches its maximum, *Info(T_i)* = 1, at a gene probability of about 37% (*p_i* = 1/e).
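
The information value function can be written down directly from the formula in this caption. The following is a minimal sketch; the function names are hypothetical, and the boundary cases *p_i* = 0 and *p_i* = 1 are set to 0 explicitly, as stated in the caption. Setting the derivative −e · (ln(p) + 1) to zero gives p = 1/e ≈ 0.368, which is where the maximum of 1 is reached.

```python
import math


def info_from_probability(p):
    """Info = -e * p * ln(p); defined as 0 at the boundaries p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -math.e * p * math.log(p)


def info(n_term_genes, n_set_genes):
    """Info(T_i) with p_i = n_G(T_i) / n_G, using gene counts of the set."""
    return info_from_probability(n_term_genes / n_set_genes)


# Hypothetical counts for a set of 119 genes.
value_unannotated = info(0, 119)    # term without annotations
value_root = info(119, 119)         # root: all genes annotated
value_near_peak = info(44, 119)     # p is about 0.37
```

The scaling factor e only normalises the peak to 1; the shape is that of the single-term Shannon contribution −p · ln(p).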

**Figure 3. Functional abstraction of ORA results.**
Graphical representation of the specific ontology showing the polyhierarchy of functional annotations (GO terms) assigned to the HHI gene set (G = 119, Table 1). ORA yielded 71 terms at a significance level of p = 1.0 · 10⁻² with Bonferroni α correction (grey ellipses). The functional abstraction approach to ORA results uses as its main measure the degree of remarkableness, calculated as the product (AND) of certainty, i.e., how safely one can assume that a GO term describes the given set of genes, and information, calculated as Shannon information. Among the most remarkable terms (n = 8, red ellipses), immediate redundancy is eliminated by deleting all terms that are already represented by others. This resulted in functional areas (red ellipses with green margins) that provide a comprehensive set of headline terms characterizing the biological functions of the HHI gene set. Although the present data set was of limited complexity, larger data sets may initially yield more than the desired maximum of nine functional areas. In this case, the method of subsumption can be applied to reduce this number. In the present case, this would, for example, join “cellular developmental process” and “anatomical structure development” into the next upper remarkable GO term “developmental process” (orange margins). In the opposite case, if the number of functional areas is low and an increase is desirable, detailization may be applied. In this case, the terms further down the hierarchy with the next highest remarkableness are chosen. For example, “neurological system process” would be split into “sensory perception” and “equilibrioception” (yellow margins), which along the hierarchy have the next highest remarkableness after the initial term. Note that the intermediate terms have lower remarkableness and are therefore not chosen (Table S1).
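
The ranking step of this caption can be sketched as follows. The sketch assumes that a certainty value between 0 and 1 is already available for each term; how certainty is derived is not specified here, so it is treated as a given input. All term names and numbers are hypothetical, and the redundancy elimination, subsumption and detailization steps are not covered because their exact rules are not given in the caption.

```python
import math


def info(p):
    """Information value -e * p * ln(p), with 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -math.e * p * math.log(p)


def remarkableness(certainty, p):
    """Product (AND) of certainty and information value."""
    return certainty * info(p)


def most_remarkable(terms, n):
    """Return the n term names with the highest remarkableness.

    terms maps a term name to a (certainty, p) pair, where p is the
    fraction of genes of the set annotated to that term.
    """
    scored = {name: remarkableness(c, p) for name, (c, p) in terms.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Hypothetical input: (assumed certainty, fraction of set genes on the term).
example_terms = {
    "term_a": (0.99, 0.40),
    "term_b": (0.99, 0.05),
    "term_c": (0.50, 0.35),
}
top_two = most_remarkable(example_terms, n=2)
```

Because the score is a product, a term needs both high certainty and an informative gene fraction to rank highly: in the example, term_c has a near-optimal gene fraction but falls behind term_a because its assumed certainty is low.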

References

1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29.
2. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al. (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 32: D262–266.
3. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, et al. (2007) GeneTrail–advanced gene set enrichment analysis. Nucleic Acids Res 35: W186–192.
4. Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21: 3587–3595.
5. Alexa A, Rahnenfuhrer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607.
