J Biomed Semantics. 2017 Jul 27;8(1):25.
doi: 10.1186/s13326-017-0134-0.

Towards a more molecular taxonomy of disease

Jisoo Park et al. J Biomed Semantics. 2017.

Abstract

Background: Disease taxonomies have been designed for many applications, but they tend not to fully incorporate the growing body of molecular-level knowledge about disease processes, which can hinder research. Understanding the degree to which disease relationships can be inferred from molecular data alone may show how to construct more modern taxonomies that integrate both physiological and molecular information.

Results: We introduce a new technique, which we call Parent Promotion, to infer hierarchical relationships between disease terms from disease-gene data. We compare this technique with both an established ontology inference method (CliXO) and a minimum weight spanning tree approach. Because no gold-standard molecular disease taxonomy is available, we compare our inferred hierarchies both to the Medical Subject Headings (MeSH) category C forest of diseases and to subnetworks of the Disease Ontology (DO). This comparison sheds light on the inference algorithms, the choice of evaluation metrics, and the existing molecular content of various subnetworks of MeSH and the DO. Our results suggest that the Parent Promotion method performs well in most cases. Performance across MeSH trees is also correlated across inference methods. In particular, inferred relationships are more consistent with those in smaller MeSH disease trees than with those in larger ones, but there are some notable exceptions that may correlate with higher molecular content in MeSH.
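One of the baselines above is a minimum weight spanning tree (MWST) built from disease-gene data. A minimal sketch of such a baseline follows, assuming each disease is represented by its set of associated genes and that dissimilarity is measured by Jaccard distance; the abstract does not state which similarity measure the paper uses, and `jaccard_distance` and `minimum_spanning_tree` are hypothetical names. The sketch uses Prim's algorithm on the complete graph of diseases and returns an undirected tree; turning it into a hierarchy would also require choosing a root, which is not attempted here.

```python
from __future__ import annotations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty gene sets are treated as identical (assumption)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def minimum_spanning_tree(genes: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Prim's algorithm over all disease pairs; returns (u, v, distance) edges."""
    names = sorted(genes)
    if not names:
        return []
    start = names[0]
    # best[v] = (distance, tree node) of the cheapest edge linking v to the tree so far
    best = {v: (jaccard_distance(genes[start], genes[v]), start) for v in names[1:]}
    edges: list[tuple[str, str, float]] = []
    while best:
        # Pick the closest disease not yet in the tree (ties broken by name).
        v = min(best, key=lambda n: (best[n][0], n))
        d, u = best.pop(v)
        edges.append((u, v, d))
        # Adding v may give the remaining diseases a cheaper connection.
        for w in best:
            dw = jaccard_distance(genes[v], genes[w])
            if dw < best[w][0]:
                best[w] = (dw, v)
    return edges


# Usage with made-up gene sets (illustrative only):
example = {"A": {"g1", "g2"}, "B": {"g1", "g2", "g3"}, "C": {"g4"}, "D": {"g3", "g4"}}
print(minimum_spanning_tree(example))
```

On this toy input the tree links A to B, B to D, and D to C, since each of those pairs shares genes while every other pair has distance 1.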

Conclusions: Our experiments offer insight into learning relationships between diseases from disease genes alone. Future work should explore the prospect of discovering disease terms from molecular data and how best to integrate molecular data with anatomical and clinical knowledge. This study nonetheless suggests that disease gene information has the potential to form an important part of the foundation for future representations of the disease landscape.

Keywords: Disease Ontology; Disease Ontology inference; Disease gene association; Disease tree inference; Hierarchical clustering; Medical Subject Headings tree; Pairwise disease similarity; Parent Promotion.


Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors read and approved the final version of the manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Topological difference between MeSH and the corresponding ontology inferred by CliXO. (a) A MeSH subtree containing prematurity complications. (b) The corresponding disease ontology inferred using CliXO and ontology alignment. Drawn in Cytoscape v. 3.3.0 [30]
Fig. 2
How the Parent Promotion method transforms a dendrogram created by hierarchical clustering. (a) Dendrogram for diseases of infants born preterm. Hierarchical clustering builds a tree whose internal nodes are hard to interpret. (b) Parent Promotion finds the most general disease term in each cluster and promotes it to an internal node, which becomes the parent of all other nodes in the same cluster. Disease term 3 has the most citations and is repeatedly selected for promotion until it becomes the root. Disease term 6 has more citations than term 5 and is promoted as the parent of 5; it later becomes a child of 3 because it has fewer citations than 3. (c) Final tree built by Parent Promotion.
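The promotion step in Fig. 2 can be sketched bottom-up over a binary dendrogram. This is a minimal reading of the caption, not the authors' implementation: it assumes the citation count is the proxy for how general a term is, that the dendrogram is given as nested pairs of term names, and that ties go to the left subtree. At each merge, the more-cited representative of the two clusters is promoted and becomes the parent of the other cluster's representative, so a term such as 6 can first be promoted over 5 and later become a child of 3. The name `parent_promotion` is hypothetical.

```python
from __future__ import annotations

from typing import Union

# A dendrogram is either a leaf (a disease term) or a pair of sub-dendrograms.
Dendrogram = Union[str, tuple]


def parent_promotion(tree: Dendrogram, citations: dict[str, int]) -> tuple[str, dict[str, str]]:
    """Return the root term and a child -> parent map for the promoted tree."""
    parent: dict[str, str] = {}

    def promote(node: Dendrogram) -> str:
        if isinstance(node, str):
            return node  # a leaf is the representative of its own cluster
        left, right = node
        a, b = promote(left), promote(right)
        # The more-cited representative is promoted to this internal node and
        # becomes the parent of the other cluster's representative.
        winner, loser = (a, b) if citations[a] >= citations[b] else (b, a)
        parent[loser] = winner
        return winner

    return promote(tree), parent


# Usage with made-up citation counts (illustrative only):
root, parents = parent_promotion((("3", "4"), ("5", "6")), {"3": 10, "4": 2, "5": 1, "6": 4})
print(root, parents)  # 3 {'4': '3', '5': '6', '6': '3'}
```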
Fig. 3
Parent Promotion tree using DO data. Subtree of the disease tree built by Parent Promotion on DO “musculoskeletal system disease” data that exactly matches nodes and edges in the DO.
Fig. 4
A MeSH tree rooted at “Respiration Disorder” and the corresponding inferred disease trees. (a) The MeSH tree containing “Respiration Disorder” and its descendants. (b) The disease tree inferred by Parent Promotion from the data of the tree in (a). (c) The disease tree inferred by the minimum weight spanning tree (MWST) method from the same data. MWST builds a taller, slimmer tree, so most diseases have more ancestors in (c) than in (a) or (b). This gives MWST good performance with respect to Ancestor Recall (AR).
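The caption attributes MWST's strong Ancestor Recall to its taller trees. The exact AR formula is not given here, so the sketch below assumes one natural reading: the fraction of ancestor-descendant pairs in the reference tree that are also ancestor-descendant pairs in the inferred tree. The functions `ancestor_pairs` and `ancestor_recall` are hypothetical, and trees are given as child -> parent maps assumed to be acyclic.

```python
from __future__ import annotations


def ancestor_pairs(parent: dict[str, str]) -> set[tuple[str, str]]:
    """All (ancestor, descendant) pairs implied by an acyclic child -> parent map."""
    pairs: set[tuple[str, str]] = set()
    for node in parent:
        p = parent.get(node)
        while p is not None:  # walk up until reaching a root (a node with no parent)
            pairs.add((p, node))
            p = parent.get(p)
    return pairs


def ancestor_recall(reference: dict[str, str], inferred: dict[str, str]) -> float:
    """Hypothetical AR: share of reference ancestor pairs recovered by the inferred tree."""
    ref = ancestor_pairs(reference)
    if not ref:
        return 1.0  # nothing to recover (assumption for the degenerate case)
    return len(ref & ancestor_pairs(inferred)) / len(ref)


# A flat reference tree versus a tall chain over the same terms (illustrative only):
reference = {"A": "R", "B": "R", "C": "R"}
chain = {"A": "R", "B": "A", "C": "B"}
print(ancestor_recall(reference, chain))  # 1.0
```

Under this assumed definition the chain recovers all three reference pairs, but it also asserts three pairs the reference lacks, which is why a recall-only measure favors tall, slim trees and is usually read alongside a precision-style counterpart.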

References

    1. Park J, Wick HC, Kee DE, Noto K, Maron JL, Slonim DK. Finding novel molecular connections between developmental processes and disease. PLoS Comput Biol. 2014;10(5):1003578. doi: 10.1371/journal.pcbi.1003578.
    2. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26(9):1205–10. doi: 10.1093/bioinformatics/btq126.
    3. Kohler S, Bauer S, Horn D, Robinson PN. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008;82(4):949–58. doi: 10.1016/j.ajhg.2008.02.013.
    4. Desmond-Hellmann S, Sawyers CL, et al. Toward precision medicine: Building a knowledge network for biomedical research and a new taxonomy of disease. Technical report, National Research Council. 2011.
    5. Kramer M, Dutkowski J, Yu M, Bafna V, Ideker T. Inferring gene ontologies from pairwise similarity data. Bioinformatics. 2014;30:34–42. doi: 10.1093/bioinformatics/btu282.