PLoS One. 2011;6(7):e22670.
doi: 10.1371/journal.pone.0022670. Epub 2011 Jul 29.

Exploring and exploiting disease interactions from multi-relational gene and phenotype networks

Darcy A Davis et al. PLoS One. 2011.

Abstract

The availability of electronic health care records is unlocking the potential for novel studies on understanding and modeling disease co-morbidities based on both phenotypic and genetic data. Moreover, the emergence of increasingly reliable phenotypic data can aid further studies investigating the potential genetic links among diseases. The goal is to create a feedback loop where computational tools guide and facilitate research, leading to improved biological knowledge and clinical standards, which in turn should generate better data. We build and analyze disease interaction networks based on data collected from previous genetic association studies and patient medical histories, spanning over 12 years, acquired from a regional hospital. By exploring both individual and combined interactions between these two levels of disease data, we provide novel insight into the interplay between genetics and clinical realities. Our results show a marked difference between the well-defined structure of genetic relationships and the chaotic co-morbidity network, but also highlight clear interdependencies. We demonstrate the power of these dependencies by proposing a novel multi-relational link prediction method, showing that disease co-morbidity can enhance our currently limited knowledge of genetic association. Furthermore, our methods for integrated networks of diverse data are widely applicable and can provide novel advances for many problems in systems biology and personalized medicine.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Global network properties.
(A) Degree distributions and (B) clustering spectra of the phenotypic (PDN) and genetic (GDN) disease networks. The PDN has a higher average degree and clustering coefficient due to its very high edge density. Interestingly, the degree distribution of the GDN is generally decreasing while that of the PDN is more uniform, indicating that many diseases are co-morbid with a large number of other diseases, often with few or no underlying shared genes.
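The two quantities plotted in this figure, node degree and clustering coefficient, can be illustrated with a minimal sketch. It assumes an undirected network stored as a dictionary of neighbor sets; the tiny graph and the function names are hypothetical and are not taken from the paper's data or code.

```python
from itertools import combinations


def degree(adj, node):
    """Number of neighbors of a node in an undirected graph."""
    return len(adj[node])


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs exist, so clustering is taken as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


# Hypothetical toy network: a triangle a-b-c with a pendant node d on a.
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

deg_a = degree(toy_adj, "a")
clust_a = local_clustering(toy_adj, "a")
clust_b = local_clustering(toy_adj, "b")
```

A clustering spectrum, in the usual sense, averages the local clustering coefficient over all nodes of a given degree.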
Figure 2. The Phenotypic and Genetic Disease Networks.
(A) The phenotypic disease network (PDN) is constructed based on clinical history of 700,00 patients. Each node represents a unique disease, and two nodes are connected if the diseases are co-morbid significantly more often than expected at random given population prevalence. (B) The genetic disease network (GDN) is constructed on the same disease nodes, but edges instead indicate that the disease pair shares a significant number of gene associations. In both networks, black edges indicate hierarchically related diseases (is-a relationships). For each network, the accompanying table displays the most relevant Disease Ontology (DO) codes associated with each cluster. Purity is the percentage of member nodes that are accurately described by the DO term, and completeness is the percentage of descendants of the DO term that belong to the cluster. For a detailed definition, see Materials and Methods. The PDN and GDN are clearly structurally different. Nonetheless, both networks form some easily defined clusters, and both also have some dense regions containing diverse DO terms.
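The purity and completeness measures described in this caption can be sketched as simple set ratios. This is an illustrative reading only: it assumes that a node is "accurately described" by a DO term when it is one of that term's descendants, and the disease names below are hypothetical. The authoritative definition is the one in the paper's Materials and Methods.

```python
def purity_and_completeness(cluster, term_descendants):
    """Return (purity, completeness) of a cluster with respect to one DO term.

    purity       = share of cluster members that fall under the term
    completeness = share of the term's descendants that are in the cluster
    """
    cluster = set(cluster)
    term_descendants = set(term_descendants)
    shared = cluster & term_descendants
    purity = len(shared) / len(cluster) if cluster else 0.0
    completeness = len(shared) / len(term_descendants) if term_descendants else 0.0
    return purity, completeness


# Hypothetical cluster and hypothetical descendants of a "lung disease" term.
toy_cluster = {"asthma", "bronchitis", "emphysema", "gout"}
toy_term = {"asthma", "bronchitis", "emphysema", "pneumonia", "pleurisy"}

purity, completeness = purity_and_completeness(toy_cluster, toy_term)
```

In this toy case three of the four cluster members fall under the term (purity 0.75), and three of the term's five descendants are in the cluster (completeness 0.6).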
Figure 3. The Multi-Relational Disease Network.
This network is created by overlaying the phenotypic (PDN) and genetic (GDN) networks, which contain the same disease nodes. Blue edges indicate phenotypic links, red edges are genetic, green edges are both genetic and phenotypic, and black edges are is-a relationships. The two-tone nodes indicate original cluster membership in the GDN (inner circle) and PDN (outer circle). Regions where multiple nodes share the same color pattern correspond to groups of diseases which cluster together in both the PDN and the GDN. These overlaps are common and in some cases quite large, such as the teal-and-green cluster containing the heart diseases. Still, none of the overlaps fully contains a PDN or GDN cluster. The overlapping regions are listed in the accompanying table, along with the most relevant Disease Ontology codes associated with the cluster.
Figure 4. Genetic vs. phenotypic mutual information.
Each data point represents a disease pair which is linked in both the PDN and the GDN. The plot illustrates the correlation between the mutual information edge weights in the two networks. There is some upward trend, but the effect is far from linear. In aggregate, the values have a Pearson correlation of 0.473, a weak-to-moderate positive correlation.
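For readers unfamiliar with the statistic, the Pearson correlation between two lists of edge weights can be computed as in the following minimal sketch. The weights are invented for illustration and do not reproduce the 0.473 reported here; the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up edge weights for disease pairs linked in both networks.
genetic_weights = [1.0, 2.0, 3.0, 4.0]
phenotypic_weights = [2.0, 4.0, 6.0, 8.0]   # perfectly linear in the first list
reversed_weights = [8.0, 6.0, 4.0, 2.0]     # perfectly anti-correlated

r_linear = pearson(genetic_weights, phenotypic_weights)
r_reversed = pearson(genetic_weights, reversed_weights)
```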
Figure 5. Finding edge probabilities given partial structures.
This toy example demonstrates how to calculate the probability of a specific edge type closing an open triad pattern, based on the triad counts for the full network. This calculation corresponds to Equation 3. The numbers in this example do not represent the real network. The table of actual edge probabilities for the MRDN can be found in Table S1.
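One plausible reading of this calculation is a simple ratio of triad counts, sketched below. Equation 3 is not reproduced on this page, so the formula here is an assumption: the probability that an edge of a given type closes an open pattern is taken as the number of triads with that pattern closed by that edge type, divided by all triads with that pattern (closed by any type or left open). The counts, keys, and function name are hypothetical.

```python
def closing_probability(triad_counts, pattern, edge_type):
    """Estimate P(edge_type closes `pattern`) from network-wide triad counts.

    triad_counts maps (pattern, closing_type) -> count, where closing_type is
    None for triads whose third side is absent (the triad stays open).
    """
    total = sum(n for (p, _), n in triad_counts.items() if p == pattern)
    if total == 0:
        return 0.0  # pattern never observed; no evidence either way
    return triad_counts.get((pattern, edge_type), 0) / total


# Made-up counts for triads whose two known sides are both phenotypic edges.
pp = ("phenotypic", "phenotypic")
toy_counts = {
    (pp, "genetic"): 20,      # third side is a genetic edge
    (pp, "phenotypic"): 50,   # third side is a phenotypic edge
    (pp, None): 130,          # third side is missing
}

p_genetic = closing_probability(toy_counts, pp, "genetic")
```

With these toy counts, 20 of 200 matching triads are closed by a genetic edge, giving an estimated probability of 0.1.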
Figure 6. Link prediction performance.
(A) Receiver operating characteristic (ROC) curves and (B) precision-recall curves for the multi-relational link predictor (MRLP) and three traditional neighborhood-based link prediction methods: common neighbors, Jaccard coefficient, and the Adamic/Adar measure. MRLP is the best method with respect to area under the ROC curve (AUROC). The precision-recall curve, which is less biased, shows that MRLP is most accurate for the highest-ranked predictions but is not always optimal at lower prediction thresholds.
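The three baseline scores named in this caption have standard textbook definitions, sketched below for an undirected graph stored as a dictionary of neighbor sets. This is not the authors' implementation, and the toy graph is hypothetical; the MRLP itself is not shown because its details are not given on this page.

```python
import math


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over shared neighbors; rare neighbors count more."""
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1  # guard against log(1) = 0
    )


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
toy_graph = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}

cn_ab = common_neighbors(toy_graph, "a", "b")
jc_ab = jaccard(toy_graph, "a", "b")
aa_ab = adamic_adar(toy_graph, "a", "b")
```

Each function scores a candidate pair of unlinked nodes; ranking all candidate pairs by score yields the predictions that ROC and precision-recall curves evaluate.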
Figure 7. Link predictor performance by individual disease.
Comparison of link predictor performance, measured by area under the receiver operating characteristic curve (AUROC), for each unique disease. The experiments were hold-one-out: all genetic associations of the testing disease were removed. The x-axis shows the performance of Adamic/Adar on the phenotypic data only, and the y-axis shows the performance of the MRLP on the multi-relational network. Each point which falls above the diagonal indicates that multi-relational evidence improved link prediction performance for the corresponding disease.
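AUROC for a single held-out disease can be computed directly from its rank interpretation: the probability that a randomly chosen true link is scored above a randomly chosen non-link, with ties counted as half. The sketch below uses that pairwise form; the scores and labels are invented, and the variable names for the two methods are placeholders rather than outputs of the paper's predictors.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up candidate links for one held-out disease: 1 = true genetic link.
true_links = [1, 0, 1, 0]
baseline_scores = [0.9, 0.8, 0.4, 0.3]          # placeholder for a baseline method
multirelational_scores = [0.9, 0.3, 0.8, 0.4]   # placeholder for a second method

auc_baseline = auroc(baseline_scores, true_links)
auc_multi = auroc(multirelational_scores, true_links)
above_diagonal = auc_multi > auc_baseline
```

In a plot like this one, `auc_baseline` would be the x coordinate and `auc_multi` the y coordinate of the disease's point, so `above_diagonal` marks a disease for which the second method ranks true links better.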
