Semant Web. 2017;8(6):853-871.
doi: 10.3233/sw-160238.

A Systematic Analysis of Term Reuse and Term Overlap across Biomedical Ontologies

Maulik R Kamdar et al. Semant Web. 2017.

Abstract

Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse promises to support semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap of 25-31%, term reuse is below 9%, with most ontologies reusing fewer than 5% of their terms from a small set of popular ontologies. Clustering analysis shows that the terms reused by a common set of ontologies have >90% semantic similarity, hinting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analysing the logs generated from a Protégé plugin that enables developers to reuse terms from BioPortal. We find that most reuse constructs were 2-level subtrees on the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns indicating that ontology developers did intend to reuse terms from other ontologies, but that they were using different and sometimes incorrect representations. Our results underscore the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.

Keywords: Biomedical Domain; Composite Mappings; Descriptive Study; Ontologies; Term Overlap; Term Reuse; Visualization.

Figures

Fig. 1
Types of Reuse: a) CUI reuse: Diabetes Mellitus terms in SNOMED CT and ICD-9CM are mapped to the same CUI, b) IRI reuse: RNA Binding defined in the GO ontology is reused in the GEXO ontology using the same IRI; xref reuse: the latter term is reused in the GRO ontology via an xref annotation
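The three reuse types named in this caption can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `Term` record, the `reuse_types` helper, the `example.org` IRIs, and the placeholder CUI are hypothetical and are not taken from the paper or from BioPortal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Term:
    """Hypothetical minimal record for one term in one ontology."""
    ontology: str                        # ontology in which the term appears
    iri: str                             # identifier used for the term
    cui: Optional[str] = None            # concept identifier the term maps to, if any
    xrefs: FrozenSet[str] = frozenset()  # IRIs referenced via xref annotations


def reuse_types(a: Term, b: Term) -> Set[str]:
    """Return which of the three reuse types link two terms from different ontologies."""
    kinds: Set[str] = set()
    if a.ontology == b.ontology:
        return kinds  # reuse is only considered across ontologies here
    if a.iri == b.iri:
        kinds.add("iri")
    if a.cui is not None and a.cui == b.cui:
        kinds.add("cui")
    if b.iri in a.xrefs or a.iri in b.xrefs:
        kinds.add("xref")
    return kinds


# Placeholder identifiers (not real IRIs or CUIs).
RNA_BINDING = "http://example.org/GO/rna_binding"
go = Term("GO", RNA_BINDING)
gexo = Term("GEXO", RNA_BINDING)  # same IRI as in GO
gro = Term("GRO", "http://example.org/GRO/RNABinding", xrefs=frozenset({RNA_BINDING}))
snomed = Term("SNOMEDCT", "http://example.org/SNOMEDCT/diabetes", cui="CUI-PLACEHOLDER")
icd9 = Term("ICD9CM", "http://example.org/ICD9CM/diabetes", cui="CUI-PLACEHOLDER")

print(reuse_types(go, gexo))      # IRI reuse
print(reuse_types(gexo, gro))     # xref reuse
print(reuse_types(snomed, icd9))  # CUI reuse
```

Comparing terms pairwise like this is only practical for toy data; a study over a full repository would more plausibly index terms by IRI, CUI, and xref target first.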
Fig. 2
Workflow of the steps required to estimate the average term reuse and overlap statistics across the BioPortal ontologies, together with the clustering and BioPortal Import Plugin log analyses used to detect reuse patterns. The steps of the workflow are: (1) Ontology Pre-processing, (2) Term Reuse, (3) Term Overlap, (4) Clustering, and (5) Log Analysis.
Fig. 3
Cartoon representations of the a) Reuse, b) Overlap : OG and c) Overlap – Reuse : G{Reuse} modules. In a), terms A and E are defined in two ontologies using the same IRI. The green, dotted arrow in the Reuse module is an xref mapping from E to A, whereas the green, bidirectional arrow means that terms G and H are mapped to the same CUI. In b) and c), the two disjoint components T1 and T2 are composed of the terms {A,B,C,D,E} and {F,G,H} respectively. The darkened path C-A-D represents a sample composite mapping, formed by different edge types.
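The disjoint components and the composite mapping described in this caption can be sketched as a small graph computation. The edge list below is an assumption: the caption only states the A/E, E-to-A, and G/H links and the membership of T1 and T2, so the remaining edges, the edge-type labels, and the function names are hypothetical choices that reproduce those components.

```python
from collections import defaultdict, deque

# Hypothetical typed edges; only the first three are described in the caption.
EDGES = [
    ("A", "E", "iri"),    # A and E share an IRI
    ("E", "A", "xref"),   # xref mapping from E to A
    ("G", "H", "cui"),    # G and H map to the same CUI
    ("C", "A", "label"),  # assumed overlap edge
    ("A", "D", "cui"),    # assumed overlap edge
    ("B", "C", "label"),  # assumed overlap edge
    ("F", "G", "label"),  # assumed overlap edge
]


def build_adjacency(edges):
    """Build an undirected adjacency list that keeps the edge type."""
    adj = defaultdict(list)
    for u, v, kind in edges:
        adj[u].append((v, kind))
        adj[v].append((u, kind))
    return adj


def components(adj):
    """Return the connected components as sorted lists of terms (BFS)."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nxt, _ in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.append(sorted(comp))
    return result


def composite_path(adj, src, dst):
    """Return (terms, edge types) along a shortest path, or None if unconnected."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, kind in adj[node]:
            if nxt not in parent:
                parent[nxt] = (node, kind)
                queue.append(nxt)
    if dst not in parent:
        return None
    nodes, kinds = [dst], []
    while parent[nodes[-1]] is not None:
        prev, kind = parent[nodes[-1]]
        nodes.append(prev)
        kinds.append(kind)
    return nodes[::-1], kinds[::-1]


adj = build_adjacency(EDGES)
print(components(adj))                # T1 and T2 under the assumed edges
print(composite_path(adj, "C", "D"))  # path through A with mixed edge types
```

Under these assumed edges, the path from C to D passes through A and combines two different edge types, which is the sense in which the caption calls the mapping composite.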
Fig. 4
Histogram depicting the number of ontologies that reuse a given percentage (%) of terms from other ontologies in their current versions by the same IRI or xref annotation. Most ontologies reuse fewer than 5% of their terms.
Fig. 5
Top 16 ontologies whose terms are reused the most through IRI and xref constructs. Number of ontologies reusing (#) and percentage (%) of terms reused with respect to the terms in their current version.
Fig. 6
30% term overlap among different BioPortal ontologies. For simplicity, only the OBO Foundry member and candidate ontologies (blue squares), UMLS terminologies (red circles), and a few popular ontologies in BioPortal (green octagons) are shown here.
Fig. 7
Proportion of term pairs with semantic similarity in a given range for each sub-cluster.
Fig. 8. BioPortal Import Plugin Log Analysis
A few of the ontologies reused the most through the BioPortal Import Plugin are shown: FMA, ICD10PCS, NCIT and SNOMED CT. The lower plot indicates, in log scale, the total number of sessions observed, the total number of single terms imported, the total number of structures imported, and the total number of terms imported. The upper plot indicates the content imported from each ontology across its depth. Each imported structure is represented as a translucent polygon, whereas the single terms are grouped as circular shapes for each level.
