BMC Bioinformatics. 2014 Dec 12;15(1):405.
doi: 10.1186/s12859-014-0405-z.

Methodology for the inference of gene function from phenotype data

Joao A Ascensao et al. BMC Bioinformatics. 2014.

Abstract

Background: Biomedical ontologies are increasingly instrumental in advancing biological research, primarily because they consolidate large amounts of data into structured, accessible sets. However, ontologies are developed and used independently, and the resulting segregation of knowledge by domain can hamper both their development and their usage. The ability to infer data associated with one ontology from data associated with another would help expand information content and scope. Here we focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes. We use statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work contrasts with previous studies, which have inferred gene function from phenotype primarily through lexical or semantic similarity measures.

Results: We have designed and tested a set of algorithms that represent a novel methodology for defining rules that predict gene function. The rules are derived from the emergent structure of, and relationships between, gene functions and phenotypes rather than from the semantics of the terms. The algorithms inspect relationships among multiple phenotype terms to determine whether there are cases in which they all arise from a single gene function. We apply this methodology to data on laboratory mouse genes that are formally represented in the Mouse Genome Informatics (MGI) resource. From these data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.

Conclusions: We show that our method can infer high-quality functional annotations from curated phenotype data. In addition to creating inferred annotations, our method has the potential to reveal unforeseen, biologically significant associations between gene function and phenotypes that a semantics-based approach would overlook. Future work will include implementing the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high-quality curated data.
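
The step from rule instances to unique predictions described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RuleInstance` structure, the `predict` helper, and all gene, MP, and GO identifiers are hypothetical placeholders, and the sketch assumes a rule instance can be written as a set of required phenotype terms, a set of excluded phenotype terms, and one predicted GO term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleInstance:
    """Hypothetical encoding of one rule instance (an assumption, not the paper's format)."""
    required: frozenset  # MP terms the gene must be annotated with
    excluded: frozenset  # MP terms the gene must NOT be annotated with
    go_term: str         # GO term predicted when the rule fires

    def fires(self, mp_terms):
        # The rule fires if every required term is present and no excluded term is.
        return self.required <= mp_terms and not (self.excluded & mp_terms)


def predict(gene_to_mp, rules):
    """Return the set of unique (gene, GO term) predictions."""
    predictions = set()
    for gene, mp_terms in gene_to_mp.items():
        for rule in rules:
            if rule.fires(mp_terms):
                predictions.add((gene, rule.go_term))
    return predictions


# Toy data with placeholder identifiers; these are not real MGI annotations.
gene_to_mp = {
    "geneA": frozenset({"MP:1", "MP:2"}),
    "geneB": frozenset({"MP:1"}),
    "geneC": frozenset({"MP:2", "MP:3"}),
}
rules = [
    RuleInstance(frozenset({"MP:1", "MP:2"}), frozenset(), "GO:1"),  # MP1 and MP2
    RuleInstance(frozenset({"MP:1"}), frozenset({"MP:2"}), "GO:2"),  # MP1 and not MP2
    RuleInstance(frozenset({"MP:1"}), frozenset(), "GO:1"),          # MP1 alone
]

predictions = predict(gene_to_mp, rules)
genes_with_predictions = {gene for gene, _ in predictions}
print(len(predictions), "unique predictions for", len(genes_with_predictions), "genes")
```

In this toy run, four rule firings collapse into three unique predictions for two genes, because two different rules predict the same GO term for `geneA`. The same deduplication is why the number of unique predictions reported in the Results is smaller than the number of rule instances.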

Figures

Figure 1. ROC curve. Evaluation of the efficacy of a rule of the form "if MPi then GOj", using the receiver operating characteristic.
Figure 2. A graphical representation of the plus and minus rules. The highlighted nodes represent the ‘successes’ identified by the corresponding rule. a, The plus rule: a pattern whereby if MP1 and MP2, then GO1. b, The minus rule: another pattern whereby if MP1 and not MP2, then GO1.
Figure 3. Example of rule instances. The highlighted nodes represent the ‘successes’ identified by the corresponding rule. a, Plus rule instance. b, Minus rule instance.
Figure 4. A graphical representation of the three rules emerging from the plus and minus rules. The highlighted nodes represent the ‘successes’ identified by the corresponding rule. a, The plus-plus rule: if MP1 and MP2 or MP3, then GO1. b, The minus-minus rule: if MP1 and neither MP2 nor MP3, then GO1. c, The plus-minus rule: if MP1 and MP2 and not MP3, then GO1.
Figure 5. Comparison of the positive predictive value (PPV) for various p-value cutoffs of the composite rules for the reviewed annotation predictions. The PPV for the first three composite rules is markedly better for a given p-value than for the last two rules.
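
The two evaluation measures named in the captions, a point on the ROC curve and the positive predictive value, can be computed from a simple confusion table. The following is a minimal Python sketch under stated assumptions rather than the authors' evaluation procedure: the gene identifiers and helper names are hypothetical, and existing GO annotations are treated as ground truth, which is a simplification because a missing annotation does not prove that a gene lacks the function.

```python
def confusion_counts(predicted, known, universe):
    """Count outcomes for a rule 'if MPi then GOj' over a set of genes.

    predicted: genes the rule fires on (annotated with MPi)
    known:     genes already annotated with GOj (assumed ground truth)
    universe:  all genes under consideration
    """
    tp = len(predicted & known)             # rule fires, annotation exists
    fp = len(predicted - known)             # rule fires, no annotation
    fn = len(known - predicted)             # annotation exists, rule silent
    tn = len(universe - predicted - known)  # neither
    return tp, fp, fn, tn


def ppv(tp, fp):
    """Positive predictive value: fraction of predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def roc_point(tp, fp, fn, tn):
    """Return (false positive rate, true positive rate) for one rule."""
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return fpr, tpr


# Toy data with placeholder gene names; not real MGI annotations.
universe = {f"g{i}" for i in range(1, 11)}
predicted = {"g1", "g2", "g3", "g4"}    # genes annotated with MPi
known = {"g1", "g2", "g3", "g5", "g6"}  # genes annotated with GOj

tp, fp, fn, tn = confusion_counts(predicted, known, universe)
rule_ppv = ppv(tp, fp)
fpr, tpr = roc_point(tp, fp, fn, tn)
print(f"PPV={rule_ppv:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

For this toy rule the counts are 3 true positives, 1 false positive, 2 false negatives, and 4 true negatives, giving a PPV of 0.75 and an ROC point at a false positive rate of 0.20 and a true positive rate of 0.60. Repeating the calculation at different p-value cutoffs would trace out a comparison of the kind shown in Figure 5.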
