Bioinformatics. 2020 May 1;36(10):3207-3214.
doi: 10.1093/bioinformatics/btaa106.

SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology

Rashmie Abeysinghe et al. Bioinformatics. 2020.

Abstract

Motivation: The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations.

Results: We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively.
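The abstract does not define the three rules, so the following is only a minimal sketch of one plausible reading of a monotonicity-style check, not the SSIF implementation. It assumes that each concept name is represented as a sequence of tokens, that two concepts are compared only when their sequences differ at exactly one position, and that an is-a relation between the differing sub-terms is already known. The function names, the toy identifiers (`T:1`, `T:2`, `T:3`) and the toy sub-terms are all hypothetical and are not GO data.

```python
from itertools import permutations


def differs_at_one_position(a, b):
    """Return the single index where equal-length sequences differ, else None."""
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diff[0] if len(diff) == 1 else None


def suggest_missing_isa(concepts, subterm_isa, existing_isa):
    """Suggest (child, parent) pairs under the assumed monotonicity-style rule.

    concepts:     dict mapping a concept id to its token sequence (tuple of str)
    subterm_isa:  set of (child_subterm, parent_subterm) pairs assumed known
    existing_isa: set of (child_id, parent_id) pairs already in the ontology
                  (assumed here to be transitively closed)
    """
    suggestions = []
    for (id_a, seq_a), (id_b, seq_b) in permutations(concepts.items(), 2):
        i = differs_at_one_position(seq_a, seq_b)
        if i is None:
            continue
        # The differing sub-term of A must be a known subtype of B's sub-term,
        # and the relation A is-a B must not already exist.
        if (seq_a[i], seq_b[i]) in subterm_isa and (id_a, id_b) not in existing_isa:
            suggestions.append((id_a, id_b))
    return sorted(suggestions)


# Toy data only (hypothetical identifiers and sub-terms, not real GO content).
concepts = {
    "T:1": ("response", "to", "child_substance"),
    "T:2": ("response", "to", "parent_substance"),
    "T:3": ("transport", "of", "child_substance"),
}
subterm_isa = {("child_substance", "parent_substance")}

result = suggest_missing_isa(concepts, subterm_isa, existing_isa=set())
print(result)
```

In this toy setting only the pair (`T:1`, `T:2`) is suggested, and nothing is suggested once that relation is listed in `existing_isa`. Treating each sub-term as a single token is a simplification made for the sketch; the abstract does not say how sub-terms are delimited in the sequence-based representation.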

Availability and implementation: SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. An example of two GO concepts satisfying the monotonicity rule and revealing a missing is-a relation: GO:0071450 is-a GO:0071241 (see the bolded, dashed arrow)

Fig. 2. An example of two GO concepts satisfying the monotonicity rule and revealing an erroneous is-a relation: nucleotide catabolic process (GO:0009166) is-a biosynthetic process (GO:0009058) (see the bolded arrow with a cross)

Fig. 3. An example of four GO concepts satisfying the intersection rule and revealing a missing is-a relation: negative regulation of ornithine catabolic process (GO:1903267) is a subtype of negative regulation of cellular amine catabolic process (GO:0033242) (see the bolded, dashed arrow)

Fig. 4. An example of four GO concepts satisfying the intersection rule and revealing an erroneous existing relation: positive regulation of B cell deletion (GO:0002869) is-a regulation of acute inflammatory response (GO:0002673) (see the bolded arrow with a cross)
