PLoS One. 2014 Oct 31;9(10):e110936.

doi: 10.1371/journal.pone.0110936. eCollection 2014.

Relating diseases by integrating gene associations and information flow through protein interaction network

Mehdi Bagheri Hamaneh¹, Yi-Kuo Yu¹

PMID: 25360770
PMCID: PMC4216010
DOI: 10.1371/journal.pone.0110936

Mehdi Bagheri Hamaneh et al. PLoS One. 2014.

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Abstract

Identifying similar diseases could provide a deeper understanding of their underlying causes and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underlying molecular interactions and biological pathways. We have therefore devised a network-based measure that partially fulfills this goal. Our method assigns weights to all proteins (and consequently to their encoding genes) by using information flow from a disease to the protein interaction network and back. The similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations: the weights assigned to the genes are used to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability that two diseases belong to the same disease family and, more importantly, that they share biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases, and find the results of the two methods to be complementary. We further show that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives of biological pathways than the diseases themselves. This supports the view that our similarity measure is a good indicator of the relatedness of the biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
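The similarity definition in the abstract (the cosine of the angle between two disease weight vectors) can be illustrated with a minimal sketch. The function name `cosine_similarity` and the toy weight values are assumptions made for illustration only; they are not the authors' code or data, and the sketch does not compute the information-flow weights themselves.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical weights over the same four proteins (illustrative values only).
disease_a = [0.5, 0.5, 0.0, 0.0]
disease_b = [0.0, 0.0, 0.5, 0.5]  # weight on disjoint proteins
disease_c = [1.0, 1.0, 0.0, 0.0]  # same direction as disease_a, different scale

sim_ab = cosine_similarity(disease_a, disease_b)
sim_ac = cosine_similarity(disease_a, disease_c)
```

Because the cosine depends only on the direction of each vector, `disease_a` and `disease_c` come out as maximally similar even though their weights differ in overall scale, while `disease_a` and `disease_b`, which put weight on disjoint proteins, have zero similarity.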

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The relation between the results of enrichment analysis and the average correlation.**
The percentage of diseases for which GO/KEGG terms were identified by SaddleSum, as a function of average correlation. To facilitate the calculation, we sorted all average correlations in ascending order and placed them into bins, each containing the same number of diseases. The percentage is then measured by the number of diseases with GO/KEGG term hit(s) per bin. For very low average correlations, the percentage is significantly lower.
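The binning procedure described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `hit_percentage_per_bin` is hypothetical, the bin size is left as a parameter because its value is not given here, and using the mean average correlation of each bin as the x-value is an assumption.

```python
from typing import List, Sequence, Tuple


def hit_percentage_per_bin(
    avg_correlations: Sequence[float],
    has_term_hit: Sequence[bool],
    bin_size: int,
) -> List[Tuple[float, float]]:
    """Return (mean average correlation, percent of diseases with a hit) per bin."""
    if len(avg_correlations) != len(has_term_hit):
        raise ValueError("inputs must have one entry per disease")
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # Sort diseases by average correlation, ascending.
    ranked = sorted(zip(avg_correlations, has_term_hit), key=lambda pair: pair[0])
    result = []
    # Consecutive chunks of bin_size diseases; the last bin may be smaller.
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_corr = sum(c for c, _ in chunk) / len(chunk)
        percent = 100.0 * sum(1 for _, hit in chunk if hit) / len(chunk)
        result.append((mean_corr, percent))
    return result


# Toy input (illustrative values only): four diseases, two per bin.
bins = hit_percentage_per_bin(
    avg_correlations=[0.4, 0.1, 0.3, 0.2],
    has_term_hit=[True, False, True, False],
    bin_size=2,
)
```

In this toy input the two diseases with the lowest average correlations have no term hits, so the first bin reports 0% and the second 100%.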

**Figure 2. The probabilities of having common term associations or being siblings.**
(A) The probabilities of finding a pair of diseases with (1) common GO/KEGG terms (red), (2) the same parents and common associations (blue), and (3) the same parents without shared biological terms (green) are shown. Here only pairs with a defined term similarity are considered. (B) For pairs with undefined term similarity (pairs with at least one member not associated with any biological terms), the distribution of siblings is plotted as a function of correlation. (C) and (D) show quantities similar to those in (A) and (B), respectively, when the biological term associations are retrieved directly from the KEGG DISEASE database.

**Figure 3. Comparison with MimMiner.**
(A) The inset shows the weighted number of disease pairs with shared KEGG pathways that were ranked higher than a given rank by MimMiner (in red) or by our method (in blue). Also shown in the inset (in green) is the weighted number of pairs with common term associations missed (ranked lower) by MimMiner, but identified (ranked higher) by our model. In the main panel, the same quantities corresponding to the proposed method are plotted after exclusion of obvious candidates for being related. The closeness between the blue and green curves indicates that the non-apparent candidates found by our method are largely missed by MimMiner. (B) The inverse of the average normalized rank versus the term similarity cutoff. At large similarity cutoff, the higher the average normalized rank (that is, the smaller its numerical value and thus the larger its inverse), the better the agreement between the quality scores (cosine similarity or the MimMiner score) and the KEGG annotation.

**Figure 4. The effect of clustering on the minimum term size.**
The minimum term size distribution of (A) GO and (B) KEGG terms reported by SaddleSum enrichment analyses when using disease weight vectors directly (red curves) and when using cluster center vectors (blue curves). Not only are the most informative (smallest) terms preserved during clustering, but the clustering procedure also seems to shift the minimum term size distribution towards the small end, indicating that grouping weight vectors under the proposed clustering procedure is likely to yield even more specific terms.

**Figure 5. Two example clusters.**
The clusters that include Parkinson's disease (OMIM:168600) and Retinitis pigmentosa 7 (MESH:C564284) are shown in panels (A) and (B), respectively. In each case, only diseases with membership probabilities larger than 5% are shown. The size of each node (circle) is proportional to the probability of membership of that node in the cluster. For a disease pair, the thickness of the line linking the diseases is determined by the correlation between the two diseases and the minimum correlation between all diseases shown in each cluster. The names and IDs of the members of each cluster are also given. Diseases whose names are written in the same color (other than black) have exactly the same gene associations and so are equivalent in our study. Equivalent diseases are represented by one node in the figure. For example, the node identified by C566637 in panel (B) represents the four diseases whose names are in green, i.e., C535804, C566637, C565827, and C562479.

Cited by

InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.
Cheng L, Jiang Y, Ju H, Sun J, Peng J, Zhou M, Hu Y. BMC Genomics. 2018 Jan 19;19(Suppl 1):919. doi: 10.1186/s12864-017-4338-6. PMID: 29363423. Free PMC article.
Biomechanisms of Comorbidity: Reviewing Integrative Analyses of Multi-omics Datasets and Electronic Health Records.
Pouladi N, Achour I, Li H, Berghout J, Kenost C, Gonzalez-Garay ML, Lussier YA. Yearb Med Inform. 2016 Nov 10;(1):194-206. doi: 10.15265/IY-2016-040. PMID: 27830251. Free PMC article. Review.
DeCoaD: determining correlations among diseases using protein interaction networks.
Hamaneh MB, Yu YK. BMC Res Notes. 2015 Jun 6;8:226. doi: 10.1186/s13104-015-1211-z. PMID: 26047952. Free PMC article.
IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity.
Cheng L, Shi H, Wang Z, Hu Y, Yang H, Zhou C, Sun J, Zhou M. Oncotarget. 2016 Jul 26;7(30):47864-47874. doi: 10.18632/oncotarget.10012. PMID: 27323856. Free PMC article.
Mechanism-based disease similarity.
Hamaneh MB, Yu YK. J Rare Dis Res Treat. 2016;1(3):1-4. doi: 10.29245/2572-9411/2016/3.1044. Epub 2016 Oct 18. PMID: 30854526. Free PMC article.

Grants and funding

Intramural NIH HHS/United States
