BMC Bioinformatics. 2014 Apr 12:15:105.
doi: 10.1186/1471-2105-15-105.

dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text

Rong Xu et al. BMC Bioinformatics. 2014.

Abstract

Background: Discerning the genetic contributions to complex human diseases is a challenging task that demands new types of data and new computational approaches to uncovering disease etiology. Systems approaches to studying observable phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repositioning. Currently, systematic study of disease relationships on a phenome-wide scale is limited by the lack of large-scale, machine-understandable knowledge bases of disease phenotype relationships. Our study introduces a semi-supervised iterative pattern-learning approach and uses it to build a precise, large-scale disease-disease risk relationship (D1 → D2) knowledge base (dRiskKB) from a large corpus of free-text published biomedical literature.

Results: 21,354,075 MEDLINE records comprised the text corpus under study. First, we used one typical disease risk-specific syntactic pattern (i.e. "D1 due to D2") as a seed to automatically discover other patterns specifying similar semantic relationships among diseases. We then extracted D1 → D2 risk pairs from MEDLINE using the learned patterns. We manually evaluated the precisions of the learned patterns and extracted pairs. Finally, we analyzed the correlations between disease-disease risk pairs and their associated genes and drugs. The newly created dRiskKB consists of a total of 34,448 unique D1 → D2 pairs, representing the risk-specific semantic relationships among 12,981 diseases with each disease linked to its associated genes and drugs. The identified patterns are highly precise (average precision of 0.99) in specifying the risk-specific relationships among diseases. The precisions of extracted pairs are 0.919 for those that are exactly matched and 0.988 for those that are partially matched. By comparing the iterative pattern approach starting from different seeds, we demonstrated that our algorithm is robust in terms of seed choice. We show that diseases and their risk diseases as well as diseases with similar risk profiles tend to share both genes and drugs.
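The seed-driven loop described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `learn_patterns`, the regular expression, the `min_count` threshold, and the toy sentences are all assumptions made for illustration, and pairs are stored in mention order without resolving the D1 → D2 risk direction. The sketch alternates between extracting disease pairs with the known patterns and promoting new connecting phrases that link already-extracted pairs.

```python
import re
from collections import Counter


def learn_patterns(sentences, diseases, seed="due to", rounds=3, min_count=2):
    """Bootstrap connecting phrases between disease mentions from one seed.

    `diseases` must contain lowercase names. All names and thresholds here
    are hypothetical; this is a sketch, not the published algorithm.
    """
    # Alternation of disease names, longest first so longer names win.
    names = "|".join(re.escape(d) for d in sorted(diseases, key=len, reverse=True))
    # "<disease> <connecting phrase> <disease>" with a lazy connecting phrase.
    triple = re.compile(rf"\b({names})\b\s+([a-z ]+?)\s+\b({names})\b")

    patterns, pairs = {seed}, set()
    for _ in range(rounds):
        # Step 1: extract pairs joined by any currently known pattern.
        for s in sentences:
            for m in triple.finditer(s.lower()):
                if m.group(2) in patterns:
                    pairs.add((m.group(1), m.group(3)))
        # Step 2: count unknown phrases that join already-extracted pairs.
        counts = Counter()
        for s in sentences:
            for m in triple.finditer(s.lower()):
                if (m.group(1), m.group(3)) in pairs and m.group(2) not in patterns:
                    counts[m.group(2)] += 1
        new = {p for p, c in counts.items() if c >= min_count}
        if not new:
            break  # no new pattern reached the threshold, so stop iterating
        patterns |= new
    return patterns, pairs


# Toy corpus (invented sentences, not MEDLINE text).
sentences = [
    "Blindness due to diabetes is common.",
    "Blindness caused by diabetes was studied.",
    "Neuropathy caused by diabetes was reported.",
    "Blindness caused by glaucoma is preventable.",
]
diseases = {"blindness", "diabetes", "neuropathy", "glaucoma"}
patterns, pairs = learn_patterns(sentences, diseases, min_count=1)
print(sorted(patterns))
print(sorted(pairs))
```

On this toy corpus the seed "due to" yields one pair, that pair promotes "caused by" to a pattern, and the new pattern then retrieves the remaining pairs. A real system would also need disease name recognition against a lexicon and manual precision checks of the learned patterns, as the abstract describes.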

Conclusions: This unique dRiskKB, when combined with existing phenotypic, genetic, and genomic datasets, can have profound implications for a deeper understanding of disease etiology and for drug repositioning.

Figures

Figure 1. The semi-supervised pattern-learning approach for extracting disease-disease risk pairs from MEDLINE.
Figure 2. Pattern precisions.
Figure 3. Correlations between disease-disease pairs with shared risk or effect diseases and their associated genes (OMIM).
Figure 4. Correlations between disease-disease pairs with shared risk or effect diseases and their associated genes (GWAS).
Figure 5. Correlations between disease-disease pairs with shared risk or effect diseases and their associated drugs.
Figure 6. Weighted risk graph directly related to obesity.
Figure 7. Weighted risk graph directly related to type 2 diabetes (T2D).
