BMC Bioinformatics. 2014 Apr 12;15:105.

doi: 10.1186/1471-2105-15-105.

dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text

Rong Xu¹, Li Li, Quanqiu Wang

PMID: 24725842
PMCID: PMC3998061
DOI: 10.1186/1471-2105-15-105

Rong Xu et al. BMC Bioinformatics. 2014.

Affiliation

¹ Medical Informatics Division, Case Western Reserve University, Cleveland, OH, USA. rxx@case.edu.

Abstract

Background: Discerning the genetic contributions to complex human diseases is challenging: it demands new types of data and new computational approaches to uncovering disease etiology. Systems approaches to studying observable phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repositioning. Currently, systematic study of disease relationships on a phenome-wide scale is limited by the lack of large-scale, machine-understandable knowledge bases of disease phenotype relationships. We present a semi-supervised iterative pattern-learning approach for building a precise, large-scale disease-disease risk relationship (D1 → D2) knowledge base (dRiskKB) from a large corpus of free-text published biomedical literature.

Results: The text corpus comprised 21,354,075 MEDLINE records. First, we used one typical disease risk-specific syntactic pattern (i.e. "D1 due to D2") as a seed to automatically discover other patterns specifying similar semantic relationships among diseases. We then extracted D1 → D2 risk pairs from MEDLINE using the learned patterns. We manually evaluated the precision of the learned patterns and of the extracted pairs. Finally, we analyzed the correlations between disease-disease risk pairs and their associated genes and drugs. The newly created dRiskKB consists of 34,448 unique D1 → D2 pairs, representing the risk-specific semantic relationships among 12,981 diseases, with each disease linked to its associated genes and drugs. The identified patterns are highly precise (average precision of 0.99) in specifying the risk-specific relationships among diseases. The precision of the extracted pairs is 0.919 for those that are exactly matched and 0.988 for those that are partially matched. By comparing runs of the iterative pattern-learning approach started from different seeds, we demonstrated that our algorithm is robust to the choice of seed. We show that diseases and their risk diseases, as well as diseases with similar risk profiles, tend to share both genes and drugs.
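The iterative step described above (seed pattern, then pairs, then new patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the disease lexicon, the toy sentences, the `min_support` threshold, and the function names `tag` and `bootstrap` are all assumptions made for illustration, and a pattern is simplified here to the literal text between two disease mentions.

```python
import re
from collections import Counter

# Hypothetical toy lexicon and corpus; neither comes from the study.
DISEASES = ["obesity", "type 2 diabetes", "hypertension",
            "stroke", "anemia", "kidney disease"]
SENTENCES = [
    "anemia due to kidney disease is common",
    "anemia caused by kidney disease was observed",
    "stroke caused by hypertension was reported",
    "stroke due to hypertension is frequent",
    "type 2 diabetes caused by obesity is rising",
]

# Longest names first so multi-word diseases are matched whole.
DISEASE_RE = re.compile(
    "|".join(re.escape(d) for d in sorted(DISEASES, key=len, reverse=True))
)

def tag(sentence):
    """Return (D1, connector, D2) when a sentence mentions exactly two diseases."""
    hits = list(DISEASE_RE.finditer(sentence))
    if len(hits) != 2:
        return None
    first, second = hits
    connector = sentence[first.end():second.start()].strip()
    return first.group(), connector, second.group()

def bootstrap(sentences, seed_pattern, rounds=5, min_support=2):
    """Alternate between extracting pairs and learning new connector patterns."""
    tagged = [t for t in map(tag, sentences) if t is not None]
    patterns, pairs = {seed_pattern}, set()
    for _ in range(rounds):
        # Extract (D1, D2) pairs, in textual order, with the patterns known so far.
        pairs |= {(d1, d2) for d1, p, d2 in tagged if p in patterns}
        # Count unseen connectors that link already-extracted pairs.
        counts = Counter(
            p for d1, p, d2 in tagged if (d1, d2) in pairs and p not in patterns
        )
        learned = {p for p, c in counts.items() if c >= min_support}
        if not learned:
            break  # no new pattern found, so the pair set cannot grow further
        patterns |= learned
    return patterns, pairs

patterns, pairs = bootstrap(SENTENCES, "due to")
print(sorted(patterns))
print(sorted(pairs))
```

On this toy corpus, the seed "due to" yields two pairs, those pairs surface "caused by" as a second pattern, and the second pattern then recovers the pair linking type 2 diabetes and obesity, which the seed alone misses. A real system would also need the manual precision check of patterns and pairs that the abstract describes.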

Conclusions: When combined with existing phenotypic, genetic, and genomic datasets, dRiskKB can inform both the study of disease etiology and drug repositioning.

Figures

**Figure 1**
The semi-supervised pattern-learning approach for extracting disease-disease risk pairs from MEDLINE.

**Figure 3**
Correlations between disease-disease pairs with shared risk or effect diseases and their associated genes (OMIM).

**Figure 4**
Correlations between disease-disease pairs with shared risk or effect diseases and their associated genes (GWAS).

**Figure 5**
Correlations between disease-disease pairs with shared risk or effect diseases and their associated drugs.

**Figure 6**
Weighted risk graph directly related to obesity.

**Figure 7**
Weighted risk graph directly related to type 2 diabetes (T2D).
