Sci Rep. 2016 Aug 31:6:32404.
doi: 10.1038/srep32404.

Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

Djordje Gligorijevic et al. Sci Rep. 2016.

Abstract

Data-driven phenotype analyses of Electronic Health Record (EHR) data have recently benefited many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data are used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade; it yields significant improvements in disease phenotyping over current computational approaches. In addition, the proposed methodology is extended to discover novel disease-gene associations by incorporating domain knowledge from genome-wide association studies. The approach is evaluated against a held-out set, where it again produces compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations have not been studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing the need for costly lab studies.
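
The comorbidity signal described in the abstract (diseases co-occurring in the same patient) can be illustrated with a small counting sketch. This is not the embedding model from the paper, only an assumed preprocessing step that tallies how often two diagnoses appear in the same record. The function name `comorbidity_counts` and the toy records are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def comorbidity_counts(records):
    """Count how often each unordered pair of diagnoses shares a record."""
    counts = Counter()
    for diagnoses in records:
        # Deduplicate and sort so each pair is counted once per record
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(diagnoses)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy records; labels are placeholders, not study data.
records = [
    ["AMI", "hypertension", "diabetes"],
    ["AMI", "hypertension"],
    ["diabetes", "hypertension"],
]
pair_counts = comorbidity_counts(records)
print(pair_counts[("AMI", "hypertension")])  # 2
```

Pair counts like these are one plausible input to a model that learns disease vectors; how the paper actually turns comorbidities into embeddings is not specified in this abstract.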

Figures

Figure 1. Graphical summary of the approach proposed in this study. Heterogeneous data obtained from large-scale discharge records and hand-curated disease-gene associations are used to jointly learn meaningful vector representations of disease and gene concepts in a latent vector space, where interactions of diseases and genes are retrieved and discovered.
Figure 2. Graphical representations of the D2D and DAG2D models, illustrated by projecting Acute Myocardial Infarction (AMI) diagnoses and AMI-related genes to AMI-associated diagnoses.
Figure 3. Precision@K for the D2D model with different dimensions D of the embedding space.
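
For readers unfamiliar with the metric named in Figure 3, Precision@K in its standard form is the fraction of the top K ranked items that are relevant. The sketch below uses that textbook definition. The function name `precision_at_k` and the example identifiers are hypothetical, and the paper's exact evaluation protocol is not given in this record.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Standard definition: divide by k even if fewer than k items were ranked.
    return hits / k


# Hypothetical ranking of candidates and a held-out set of true associations.
ranked = ["g1", "g2", "g3", "g4"]
relevant = {"g1", "g3"}
print(precision_at_k(ranked, relevant, 2))  # 0.5
```

Dividing by K rather than by the number of returned items penalizes rankings shorter than K, which is the usual convention but remains an assumption here.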
