PLoS One. 2014 Feb 25;9(2):e90191.
doi: 10.1371/journal.pone.0090191. eCollection 2014.

Functional abstraction as a method to discover knowledge in gene ontologies

Alfred Ultsch et al. PLoS One.

Abstract

Computational analyses of the functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, such sets are usually mapped to Gene Ontology (GO) knowledge bases by means of over-representation analysis (ORA). The ORA result represents the specific knowledge of the functionality of the gene set. However, this specific ontology typically consists of many terms and relationships, which hinders understanding of the 'main story'. We developed a methodology to identify a comprehensibly small number of GO terms as "headlines" of the specific ontology, allowing one to understand all central aspects of the roles of the involved genes. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and abstract enough for human comprehension. This method outperforms the classical approaches to ORA abstraction, and by focusing on information rather than on decorrelation of GO terms, it directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. It thus provides the necessary means to interpret complex Gene Ontology results, strengthening the role of functional genomics in biomarker and drug discovery.
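For readers unfamiliar with the ORA step, the following minimal Python sketch tests each GO term for over-representation with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) and applies a Bonferroni correction. It is a generic illustration, not the GeneTrail implementation used in the paper; the function names `hypergeom_upper_tail` and `ora`, the input layout (a dict mapping each term to the set of genes annotated to it in the gene universe) and the default `alpha` are assumptions made for this sketch.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_set: int, n_annot: int, n_total: int) -> float:
    """P(X >= k): drawing n_set genes from n_total, of which n_annot carry the term."""
    upper = min(n_set, n_annot)
    hits = sum(
        comb(n_annot, x) * comb(n_total - n_annot, n_set - x)
        for x in range(k, upper + 1)
    )
    return hits / comb(n_total, n_set)


def ora(gene_set, annotations, n_total, alpha=0.01):
    """Return {term: Bonferroni-adjusted p} for terms significant at alpha.

    annotations maps each GO term to the set of universe genes annotated to it
    (hypothetical input layout); n_total is the size of the gene universe.
    """
    gene_set = set(gene_set)
    m = len(annotations)  # number of tests, used for the Bonferroni correction
    result = {}
    for term, genes in annotations.items():
        k = len(gene_set & genes)  # observed member genes in the set
        if k == 0:
            continue
        p = hypergeom_upper_tail(k, len(gene_set), len(genes), n_total)
        p_adj = min(1.0, p * m)
        if p_adj <= alpha:
            result[term] = p_adj
    return result


# Toy universe of 100 genes with three invented terms.
universe = [f"g{i}" for i in range(100)]
annotations = {
    "A": set(universe[:10]),
    "B": set(universe[50:]),
    "C": set(universe[5:55]),
}
print(ora(universe[:8], annotations, n_total=100))
```

Only term "A", whose 10 annotated genes include all 8 genes of the toy set, survives the correction; "C" overlaps the set by no more than chance would predict.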

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. ORA results and functional areas obtained with the CLASSIC abstraction methods.
Graphical representation of the specific ontology showing the polyhierarchy of functional annotations (GO terms) assigned to the HHI gene set (G = 119, Table 1), which forms a directed acyclic graph (DAG). The figure was generated with the GeneTrail web-based analysis tool. Significant GO terms were identified using ORA, which resulted in 71 terms at a significance level of $p = 1.0 \cdot 10^{-2}$ with Bonferroni α correction (grey ellipses, annotated with the observed number of member genes, the number of genes expected by chance, and the p-value of the deviation from this expectation according to Fisher’s exact test). The CLASSIC p-value approach to interpreting ORA results selects headline terms in order of decreasing statistical significance. With the p-value threshold set at $p = 10^{-20}$, eight headlines resulted (red ellipses). The CLASSIC detail approach selects the leaves of each ontology, which with the present ORA parameters resulted in seven details (blue ellipses, plus “sensory perception of sound”, the latter colored red because it was also selected by the p-value method).
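The two CLASSIC selection rules can be sketched on a toy DAG in Python. This is a hypothetical illustration, not the authors' code: the DAG is assumed to be given as a dict mapping each term to its parents, the term names and p-values are invented, and a "leaf" of the specific ontology is taken to mean a significant term with no significant descendant.

```python
def ancestors(term, parents):
    """All ancestors of term in a DAG given as {term: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def classic_pvalue(pvals, threshold):
    """CLASSIC p-value approach: terms with p <= threshold, most significant first."""
    return sorted((t for t, p in pvals.items() if p <= threshold), key=pvals.get)


def classic_detail(pvals, parents):
    """CLASSIC detail approach: significant terms without a significant descendant."""
    significant = set(pvals)
    non_leaves = set()
    for t in significant:
        non_leaves |= ancestors(t, parents) & significant
    return sorted(significant - non_leaves)


# Invented toy ontology: A is the top term, E has two parents (polyhierarchy).
parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}
pvals = {"A": 1e-25, "B": 1e-12, "C": 1e-22, "D": 1e-3, "E": 1e-5}
print(classic_pvalue(pvals, 1e-20))
print(classic_detail(pvals, parents))
```

The contrast mirrors the caption: the p-value rule favors general terms near the top, while the detail rule returns only the most specific terms.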
Figure 2. Graph of the information value function $\mathrm{Info}(T_i) = -e \cdot p_i \cdot \ln(p_i)$, with $p_i = n_G(T_i)/n_G$, where $n_G(T_i)$ denotes the number of genes of a set annotated to a term $T_i$ and $n_G$ denotes the total number of genes in the set.
Derived from Shannon information, $\mathrm{Info}(T_i)$ measures the contribution of the annotations of $T_i$ to the total (Shannon) information of a specific ontology. In bioinformatics, $\mathrm{IC}(T_i) = -\log(p_i)$ measures the information content (IC) of a GO term if $p_i$ is the number of all genes annotated to $T_i$ relative to all annotations in the GO. $\mathrm{Info}(T_i)$ can therefore be interpreted as a weighted information content of a specific ontology. $\mathrm{Info}(T_i) = 0$ if term $T_i$ has no annotations ($p_i = 0$, since $p_i \ln p_i \to 0$) and for the root of the ontology (to which all genes are annotated, so $p_i = 1$ and $\ln p_i = 0$). $\mathrm{Info}(T_i)$ reaches its maximum of 1 at a gene probability of $p_i = 1/e \approx 37\%$.
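The two formulas can be checked numerically with a short Python sketch. The function names `info` and `information_content` are hypothetical; the set size of 119 genes is taken from the HHI gene set in the captions, and the scan over all possible counts simply confirms that the maximum lies at the count closest to $n_G/e$.

```python
import math


def info(n_term: int, n_set: int) -> float:
    """Info(T_i) = -e * p_i * ln(p_i), with p_i = n_G(T_i) / n_G."""
    p = n_term / n_set
    if p == 0.0 or p == 1.0:
        return 0.0  # no annotations (limit p ln p -> 0), or the root (ln 1 = 0)
    return -math.e * p * math.log(p)


def information_content(n_term_go: int, n_annotations_go: int) -> float:
    """IC(T_i) = -log(p_i), with p_i taken relative to all annotations in the GO."""
    return -math.log(n_term_go / n_annotations_go)


n_genes = 119
best = max(range(n_genes + 1), key=lambda k: info(k, n_genes))
print(best, round(info(best, n_genes), 4))  # count nearest 119/e, value close to 1
```

The factor $e$ only rescales $-p_i \ln p_i$, whose maximum is $1/e$ at $p_i = 1/e$, so that the most informative terms score 1.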
Figure 3. Functional abstraction of ORA results.
Graphical representation of the specific ontology showing the polyhierarchy of functional annotations (GO terms) assigned to the HHI gene set (G = 119, Table 1). ORA resulted in 71 terms at a significance level of $p = 1.0 \cdot 10^{-2}$ with Bonferroni α correction (grey ellipses). The functional abstraction approach to ORA results uses as its main measure the degree of remarkableness, calculated as the product (AND) of certainty, i.e., how safely one can assume that a GO term describes the given set of genes, and information, calculated as Shannon information. Among the most remarkable terms (n = 8, red ellipses), immediate redundancy is eliminated by deleting all terms that are already represented by others. This resulted in functional areas (red ellipses with green margins), which constitute a comprehensive set of headline terms characterizing the biological functions of the HHI gene set. Although the present data set was of limited complexity, larger data sets may initially yield more than the desired maximum of nine functional areas. In this case, the method of subsumption can be applied to reduce this number. In the present case, this would, for example, merge “cellular developmental process” and “anatomical structure development” into the next remarkable GO term higher in the hierarchy, “developmental process” (orange margins). Conversely, if the number of functional areas is low and an increase is desirable, detailization may be applied. In this case, the terms further down the hierarchy with the next highest remarkability are chosen. For example, “neurological system process” would be split into “sensory perception” and “equilibrioception” (yellow margins), which have the next highest remarkability along the hierarchy below the initial term. Note that the intermediate terms have lower remarkability and are therefore not chosen (Table S1).
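Remarkableness ranking and one subsumption step can be sketched in Python as follows. This is a simplified, hypothetical reading of the caption, not the authors' algorithm. Certainty and information values are assumed to be precomputed per term, and subsumption is assumed to merge selected terms into their most remarkable shared *direct* parent. The redundancy-elimination step is left out because the caption does not define "already represented" precisely. The short names only echo the caption's developmental-process example, and all numbers are invented.

```python
def remarkableness(certainty, information):
    """Remarkableness = certainty * information (the product, i.e. a fuzzy AND)."""
    return {t: certainty[t] * information[t] for t in certainty}


def most_remarkable(rem, n):
    """The n terms with the highest remarkableness, best first."""
    return sorted(rem, key=rem.get, reverse=True)[:n]


def subsume_once(selected, parents, rem):
    """Hypothetical single subsumption step.

    Among direct parents shared by at least two selected terms, pick the most
    remarkable one and replace those selected children by it. The selection is
    returned unchanged if no parent is shared.
    """
    children_of = {}
    for t in selected:
        for p in parents.get(t, []):
            children_of.setdefault(p, []).append(t)
    shared = [p for p, kids in children_of.items() if len(kids) >= 2]
    if not shared:
        return list(selected)
    best = max(shared, key=lambda p: rem.get(p, 0.0))
    merged = set(children_of[best])
    return [t for t in selected if t not in merged] + [best]


# Invented values; "dev" stands for the shared parent of "cell_dev" and "anat_dev".
certainty = {"dev": 0.9, "cell_dev": 0.95, "anat_dev": 0.9, "neuro": 0.8}
information = {"dev": 0.9, "cell_dev": 0.8, "anat_dev": 0.85, "neuro": 0.7}
parents = {"cell_dev": ["dev"], "anat_dev": ["dev"], "dev": ["bp"], "neuro": ["bp"]}
rem = remarkableness(certainty, information)
print(subsume_once(["cell_dev", "anat_dev", "neuro"], parents, rem))
```

Detailization would run in the opposite direction, replacing a selected term by its most remarkable descendants. It is omitted here to keep the sketch short.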

References

    1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29.
    2. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al. (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 32: D262–266.
    3. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, et al. (2007) GeneTrail–advanced gene set enrichment analysis. Nucleic Acids Res 35: W186–192.
    4. Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21: 3587–3595.
    5. Alexa A, Rahnenfuhrer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607.
