Bioinformatics. 2014 Jun 15;30(12):i34-42.
doi: 10.1093/bioinformatics/btu282.

Inferring gene ontologies from pairwise similarity data



Michael Kramer et al. Bioinformatics. 2014.

Abstract

Motivation: Although the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Because an ontology is a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference.

Methods: We consider two algorithms [Clique Extracted Ontology (CliXO) and LocalFitness] that uniquely satisfy these requirements and compare them with other methods, including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressively thresholding a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process (BP) ontology from a similarity matrix based on (a) semantic similarities computed from GO itself or (b) three -omics datasets for yeast.
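The thresholding idea behind CliXO can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes similarities are given as a dictionary keyed by unordered gene pairs, enumerates maximal cliques with a basic Bron–Kerbosch search, and omits the α parameter that appears in the figures. The function names `threshold_cliques` and `bron_kerbosch` are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect every maximal clique of the graph `adj` (basic Bron-Kerbosch, no pivot)."""
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def threshold_cliques(sim):
    """Simplified progressive thresholding (hypothetical helper, no alpha parameter).

    sim: dict mapping frozenset({gene_a, gene_b}) -> similarity score.
    Thresholds are visited from highest to lowest; at each one, every pair
    scoring >= threshold becomes an edge, and any maximal clique of two or more
    genes not seen before is recorded as a new term.
    Returns a list of (threshold, frozenset_of_genes) in order of discovery.
    """
    genes = sorted({g for pair in sim for g in pair})
    adj = {g: set() for g in genes}
    seen, terms = set(), []
    for t in sorted(set(sim.values()), reverse=True):
        for pair, s in sim.items():
            if s >= t:
                a, b = tuple(pair)
                adj[a].add(b)
                adj[b].add(a)
        cliques = []
        bron_kerbosch(set(), set(genes), set(), adj, cliques)
        for c in cliques:
            if len(c) > 1 and c not in seen:  # skip singletons and repeats
                seen.add(c)
                terms.append((t, c))
    return terms
```

On a toy input where A-B and C-D score 3 and every other pair scores 1, the sketch records {A, B} and {C, D} at threshold 3 and {A, B, C, D} at threshold 1, i.e. two child terms under one parent.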

Results: For task (a), using semantic similarity, CliXO accurately reconstructs GO (>99% precision and recall) and outperforms the other approaches (<20% precision, <20% recall). For task (b), using -omics data, CliXO outperforms the other methods on two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology, NeXO) and better than LocalFitness or standard clustering (20-25% precision and recall).
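Precision and recall here are computed over inferred terms. The sketch below assumes terms are compared as gene sets and counts a true positive only on an exact match (the "identical" criterion of Fig. 2A); the strict and permissive alignment procedures used in the paper are not reproduced. `precision_recall` is a hypothetical helper.

```python
def precision_recall(inferred, reference):
    """Precision and recall of inferred terms against reference terms.

    A term counts as a true positive only if its gene set exactly equals a
    reference term's gene set (alignment of near-matches is not modeled).
    """
    inferred = {frozenset(t) for t in inferred}
    reference = {frozenset(t) for t in reference}
    tp = len(inferred & reference)
    precision = tp / len(inferred) if inferred else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall
```

Under exact matching, an inferred term that differs from a GO term by a single gene counts as a miss, which is why alignment-based criteria are more forgiving.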

Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing the hierarchical and pleiotropic structure embedded in biomolecular data.


Figures

Fig. 1.
CliXO method. (A) An example ontology with genes A–H and terms 0–6. (B) Semantic similarity scores calculated from the ontology in (A). (C) Reconstruction of the ontology in (A) from the similarity scores in (B). As the threshold decreases, edges whose weight equals or exceeds the threshold are added to the graph. At each new threshold, the maximal cliques in the graph, which correspond to terms, are found and added to the inferred ontology.
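Panel (B) derives similarity scores from the ontology itself. One common gene-level form of Resnik similarity, assumed here rather than taken from the paper, scores two genes by the information content −ln p of the most specific term annotated with both, where p is the fraction of all genes in that term. The toy ontology used with it is hypothetical and is not the one drawn in Fig. 1A; `resnik_similarity` is a hypothetical helper.

```python
import math


def resnik_similarity(terms, g1, g2):
    """Gene-level Resnik similarity (one common formulation, assumed here).

    terms: dict term_id -> set of genes annotated to the term, already
    propagated so each term contains the genes of all its descendants.
    Returns max over terms containing both genes of ln(n / |term|), which is
    -ln(p) with p = |term| / n the fraction of all n genes in the term.
    """
    all_genes = set().union(*terms.values())
    n = len(all_genes)
    shared = [genes for genes in terms.values() if g1 in genes and g2 in genes]
    if not shared:
        return 0.0
    return max(math.log(n / len(genes)) for genes in shared)
```

With a root holding all eight genes A–H, a child {A, B, C, D} and a grandchild {A, B}, the pair (A, B) scores ln 4, (A, C) scores ln 2, and (A, E) scores 0 because only the root contains both.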
Fig. 2.
Inferring an ontology from semantic similarity. Precision–recall plots for ontologies inferred from GO BP Resnik semantic similarities. True positives are inferred terms that are identical to (A) or aligned with (B) a GO BP term, using permissive alignment.
Fig. 3.
CliXO with noise. Precision (A, C) and recall (B, D) for the ontology inferred from Resnik semantic similarities beneath and including the CC Biogenesis BP term (GO:0044085). True positives were determined by strict alignment. Varying levels of Gaussian noise were added to the semantic similarity measure (A, B), or edges were removed from it (C, D). Relative noise = 1/(median signal-to-noise ratio). Error bars represent the standard error over 10 networks tested per point, each with random noise added or edges removed.
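The two perturbations can be sketched as follows. The sketch assumes each entry's signal-to-noise ratio is |s|/σ for zero-mean Gaussian noise of standard deviation σ, so that relative noise = σ / median(|s|); the paper's exact definition of signal may differ. `add_relative_noise` and `remove_edges` are hypothetical helpers, and running them with 10 different seeds would mirror the 10 randomized networks per point.

```python
import random
import statistics


def add_relative_noise(sim, relative_noise, seed=0):
    """Add Gaussian noise to every pairwise score.

    Assumes per-entry SNR = |s| / sigma, hence
    relative_noise = 1 / median(SNR) = sigma / median(|s|).
    """
    rng = random.Random(seed)
    sigma = relative_noise * statistics.median(abs(s) for s in sim.values())
    return {pair: s + rng.gauss(0.0, sigma) for pair, s in sim.items()}


def remove_edges(sim, fraction, seed=0):
    """Randomly drop the given fraction of pairwise scores (edges)."""
    rng = random.Random(seed)
    pairs = sorted(sim, key=sorted)  # fixed order so results depend only on seed
    keep = rng.sample(pairs, round(len(pairs) * (1 - fraction)))
    return {pair: sim[pair] for pair in keep}
```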
Fig. 4.
Inferring an ontology from -omics data. (A, C, E) Pairwise similarity scores from data versus BP Resnik semantic similarity. A = genetic interaction (GI) profile Pearson correlation as provided by Costanzo et al. (2010); C = gene expression (GE) Pearson correlation from the Stanford Microarray Database (SMD); E = YeastNet v3. (B, D, F) Precision–recall plots for ontologies reconstructed using various methods and evaluated by strict alignment to GO BP, using data from GI profile correlation (B), GE profile correlation (D) or YeastNet v3 similarity (F). CliXO results with varying α are distinguished by color. For a given α, the precision–recall curve for the CliXO ontology (terms ordered by the weight at which they are inferred) is shown. NeXO results are shown for varying threshold edge weights used to generate an unweighted input network.
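Panels A and C use the Pearson correlation between gene profiles as the pairwise similarity. A minimal sketch follows, assuming each gene's profile is an equal-length list of numbers; the preprocessing applied to the Costanzo et al. (2010) and SMD data is not described here and is not reproduced. `pearson` and `profile_similarity` are hypothetical helpers.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def profile_similarity(profiles):
    """profiles: dict gene -> list of measurements (e.g. expression across conditions).

    Returns a dict mapping frozenset({g1, g2}) -> Pearson correlation for every gene pair.
    """
    return {frozenset((g1, g2)): pearson(profiles[g1], profiles[g2])
            for g1, g2 in combinations(sorted(profiles), 2)}
```

The resulting dictionary has the same pair-keyed shape as a semantic similarity matrix, so either can be fed to the same ontology-inference step.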
Fig. 5.
Stability of NeXO and CliXO with respect to parameters. (A) NeXO and CliXO run with varying numbers of edges from YeastNet v3. The reference is the best-performing result (20k edges for NeXO; 30k edges with α = 0.01 for CliXO). Percent similarity is defined as the number of terms that are either identical or strictly aligned, divided by the number of terms in the smaller of the two ontologies. (B) CliXO run with varying levels of α. The reference is α = 0.01.
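The percent-similarity measure in panel A can be sketched for the identical-term case only. Strict alignment of non-identical terms is not reproduced, so this sketch can only undercount relative to the caption's definition. Terms are assumed to be gene sets, and `percent_similarity` is a hypothetical helper.

```python
def percent_similarity(terms_a, terms_b):
    """Percentage of identical terms relative to the smaller ontology.

    Only exact gene-set matches are counted; the strict alignment of
    non-identical terms described in the caption is not implemented.
    """
    a = {frozenset(t) for t in terms_a}
    b = {frozenset(t) for t in terms_b}
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))
```

Dividing by the smaller ontology means that an ontology fully contained in a larger one scores 100%.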
Fig. 6.
CliXO ontology from YeastNet v3. (A–C) Statistics for the CliXO-inferred ontology with α = 0.01: number of parents per term (A), number of children per non-terminal node (B) and distribution of inferred term sizes (C). (D) Number of terms in the CliXO-inferred ontology (α = 0.01) strictly aligned, at FDR <5%, against the three branches of GO (CC, MF, BP). Unaligned terms are shown in the outer circle.
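The statistics in panels A–C can be computed directly from the DAG. This sketch assumes the ontology is stored as a mapping from each term to its set of parents (empty for the root) plus a mapping from each term to its genes; both structures and the helper name `ontology_stats` are hypothetical. A term with more than one parent is how pleiotropy shows up in the DAG.

```python
from collections import Counter


def ontology_stats(parents, term_genes):
    """Histograms for an ontology DAG (hypothetical data layout).

    parents: dict term -> set of parent terms; every term, including the root,
             must appear as a key.
    term_genes: dict term -> set of genes in the term.
    Returns (parent-count histogram over all terms,
             child-count histogram over non-terminal terms,
             term-size histogram).
    """
    children = Counter(p for ps in parents.values() for p in ps)
    n_parents = Counter(len(ps) for ps in parents.values())
    n_children = Counter(children[t] for t in parents if children[t] > 0)
    sizes = Counter(len(genes) for genes in term_genes.values())
    return n_parents, n_children, sizes
```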

References

    1. Ahn YY, et al. Link communities reveal multiscale complexity in networks. Nature. 2010;466:761–764.
    2. Alterovitz G, et al. Ontology engineering. Nat. Biotechnol. 2010;28:128–130.
    3. Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29.
    4. Becker E, et al. Multifunctional proteins revealed by overlapping clustering in protein interaction network. Bioinformatics. 2012;28:84–90.
    5. Carvunis A, Ideker T. Siri of the cell: what biology could learn from the iPhone. Cell. 2014;157:534–538.
