2016 Mar 23:7:8.
doi: 10.1186/s13326-016-0051-7. eCollection 2016.

Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation

Sirarat Sarntivijai et al. J Biomed Semantics.

Abstract

Background: The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly generated data. In some resources, data integration has been achieved by mapping metadata such as diseases and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. Ontologies are not ideal for representing the 'sometimes associated' type of relationship that this requires. This work addresses two challenges: annotation of diverse big data, and representation of complex, 'sometimes associated' relationships between concepts.

Methods: Semantic mapping uses a combination of custom scripting, our annotation tool 'Zooma', and expert curation. Disease-phenotype associations were generated by literature mining of Europe PubMed Central abstracts and were then manually verified by experts for validity. The disease-phenotype associations are represented with the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents an association between a subject and an object (e.g., a disease and one of its associated phenotypes) together with the source of evidence for that association. Indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV.
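
The indirect disease-to-disease linking described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the CTTV implementation: the function name link_by_shared_phenotype and the disease and phenotype labels are hypothetical placeholders, and the input is simply a list of (disease, phenotype) pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (disease, phenotype) pairs; placeholders, not CTTV data.
associations = [
    ("disease_A", "phenotype_1"),
    ("disease_A", "phenotype_2"),
    ("disease_B", "phenotype_2"),
    ("disease_C", "phenotype_3"),
]


def link_by_shared_phenotype(pairs):
    """Return {(disease_x, disease_y): set of phenotypes both are associated with}."""
    # Index diseases by phenotype so shared phenotypes are easy to find.
    diseases_by_phenotype = defaultdict(set)
    for disease, phenotype in pairs:
        diseases_by_phenotype[phenotype].add(disease)

    # Every pair of diseases under the same phenotype is indirectly linked.
    links = defaultdict(set)
    for phenotype, diseases in diseases_by_phenotype.items():
        for left, right in combinations(sorted(diseases), 2):
            links[(left, right)].add(phenotype)
    return dict(links)


links = link_by_shared_phenotype(associations)
```

Here only disease_A and disease_B share a phenotype, so they are the only pair linked; disease_C remains unlinked. In practice each pair would also carry its evidence source, which this sketch omits.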

Results: EFO yields an average mapping coverage of over 80% across all data sources. Manual verification of the text-mined disease-phenotype associations gives a precision of 42%. This yields 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease, respectively, and contributes towards 11,338 rare disease associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study.

Conclusions: Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base and a process for disease-phenotype mining, and we propose a generic association model, 'OBAN', as a means to integrate diseases using shared phenotypes.

Availability: EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.

Keywords: CTTV; EFO; OBAN; Phenotype disease associations; Rare disease.

Figures

Fig. 1
There were 2214 EFO-native classes in January 2010, and 3992 EFO-native classes in January 2015. Although EFO has significantly grown in its number of native classes, the number of imported classes has grown at a much higher rate. Importing more than 6000 rare disease classes from ORDO in 2012, and axiomatizing them into EFO has resulted in a sudden increase between 2012 and 2013. This reflects the use of EFO as an application ontology providing interoperability across domain ontologies through semantic axiomatization
Fig. 2
The cell line design pattern in EFO links the EFO class ‘cell line’ to external ontologies via an import mechanism. An EFO cell line derives_from a cell type class from the Cell Ontology, which is part_of an organism – a class imported from NCBI Taxon. The EFO cell line class is also a bearer_of a disease – a class imported from ORDO or a class native to EFO itself
Fig. 3
An OBAN association links an entity such as a disease to another such as an associated phenotype and retains the provenance information (e.g., manual curation, published findings, etc.). Entities marked with * are required and others are added on a per-association basis, for instance the PubMed triple in this figure
Fig. 4
An example of connecting a phenotype (malabsorption) with a disease (ileocolitis) using OBAN. Provenance here is manual curation by a named surgeon (name omitted here)
Fig. 5
A summary of the rare-to-common associations linking diseases via anatomical system through the has_disease_location axiomatization inside EFO. The high-resolution image is downloadable at https://github.com/CTTV/ISMB2015/blob/master/figures/r2c.pdf and provided in supplementary materials
Fig. 6
Summary of the number of associations and provenances in each group of diseases in CTTV as of 28th September 2015
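
The association pattern in Figs. 3 and 4 (a subject, an object, and provenance for the link) can be sketched as a small Python data structure. This is only an illustration: the class and field names (Association, Provenance, evidence, source, pubmed_id) are hypothetical and are not OBAN vocabulary terms, and treating the subject, object, and provenance as the required parts is an assumption, since the figure itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Where an association comes from (field names are illustrative)."""
    evidence: str                     # e.g. "manual curation"
    source: Optional[str] = None      # optional: curator or database
    pubmed_id: Optional[str] = None   # optional: added per association


@dataclass(frozen=True)
class Association:
    """A subject-object link that always carries its provenance."""
    subject: str            # e.g. a disease
    object_: str            # e.g. an associated phenotype
    provenance: Provenance


# The Fig. 4 example: ileocolitis linked to malabsorption by manual curation.
example = Association(
    subject="ileocolitis",
    object_="malabsorption",
    provenance=Provenance(evidence="manual curation"),
)
```

Keeping provenance as its own record means the same disease-phenotype pair can appear more than once with different evidence sources without changing the pair itself.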

