PLoS Comput Biol. 2010 Jan 15;6(1):e1000641.

doi: 10.1371/journal.pcbi.1000641.

Associating genes and protein complexes with disease via network propagation

Oron Vanunu¹, Oded Magger, Eytan Ruppin, Tomer Shlomi, Roded Sharan

PMID: 20090828
PMCID: PMC2797085
DOI: 10.1371/journal.pcbi.1000641

Oron Vanunu et al. PLoS Comput Biol. 2010.

Affiliation

¹ School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Abstract

A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method ranks the true causal gene first for 34% of the diseases in a cross-validation setting, and infers 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature and suggest several novel causal genes and protein complexes for further investigation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Illustration of the PRINCE algorithm.**
A query disease, denoted Q, has varying degrees of phenotypic similarity with other diseases, denoted d1–d5 (marked with maroon lines, where thicker lines represent higher similarity). Known causal genes for these similar diseases are connected by dashed blue lines and used as the prior information. p1–p11 comprise the protein set of a protein-protein interaction network, where interactions are marked with black lines and thicker lines denote edges with higher confidence. A scoring function that is smooth over the network is computed using an iterative network propagation method. At every iteration of the algorithm, each protein pumps flow to its neighbors and receives flow from them. Protein colors correspond to the flow they receive in a specific iteration: the darker the color, the higher the flow. (A) The flow after the first iteration, representing the prior information. Only proteins p2, p4 and p9, which are directly associated with similar diseases, have a positive incoming flow. (B) After several iterations, the amount of flow to each node converges, and the resulting flow, used to score the proteins, appears to be smooth over the network. p5 emerges as the best causal gene candidate for disease Q, as it interacts with both p2 and p4.
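The iterative propagation described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the update rule `F <- alpha * W_norm @ F + (1 - alpha) * prior`, the symmetric degree normalization, the value `alpha = 0.9`, the function name `propagate`, and the six-protein toy network are all choices made here for illustration. The toy graph is hypothetical and is not the network drawn in Figure 1; it only reproduces the situation where p5 interacts with two proteins (p2 and p4) that carry prior information.

```python
import numpy as np

def propagate(W, prior, alpha=0.9, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W_norm @ F + (1 - alpha) * prior until it converges.

    W     : symmetric, non-negative edge-weight matrix (assumed layout)
    prior : per-protein prior score, e.g. 1.0 for known causal genes
    alpha : illustrative balance between network smoothness and the prior
    """
    W = np.asarray(W, dtype=float)
    prior = np.asarray(prior, dtype=float)
    degree = W.sum(axis=1)
    # Symmetric normalization w'(u, v) = w(u, v) / sqrt(d(u) * d(v));
    # isolated nodes keep an all-zero row and column.
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])
    W_norm = W * inv_sqrt[:, None] * inv_sqrt[None, :]

    F = prior.copy()  # first iteration: only the prior information
    for _ in range(max_iter):
        F_next = alpha * (W_norm @ F) + (1.0 - alpha) * prior
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F

# Hypothetical toy network: p5 is adjacent to both prior proteins p2 and p4.
names = ["p1", "p2", "p3", "p4", "p5", "p6"]
edges = [(0, 1), (1, 4), (3, 4), (2, 3), (4, 5)]
W = np.zeros((len(names), len(names)))
for u, v in edges:
    W[u, v] = W[v, u] = 1.0

prior = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.0])  # p2 and p4 are known genes
scores = propagate(W, prior)

# Rank only proteins that are not already known causal genes.
candidates = [i for i in range(len(names)) if prior[i] == 0]
best = max(candidates, key=lambda i: scores[i])
print(names[best])
```

In this toy graph the top-scoring protein without prior information is p5, mirroring the caption's point that a protein connected to several prior genes accumulates the most flow. Because `alpha < 1` and the normalized matrix has spectral radius at most 1, the iteration is a contraction and converges regardless of the starting vector.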

**Figure 2. A comparison of prioritization algorithms.**
Performance comparison for PRINCE, Random Walk and CIPHER in a leave-one-out cross-validation test over 1,369 diseases with a known causal gene. The figure shows recall versus precision when considering the top $k$ proteins for various values of $k$.
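One way to compute a recall-versus-precision curve from a leave-one-out test is sketched below. It is an illustration under assumptions rather than the paper's evaluation code: it assumes each trial hides exactly one true causal gene, that the input is the rank that gene received in its trial, and that precision counts hits over all top-ranked proteins across trials. The cutoff name `k`, the function name `precision_recall_at_k`, and the example ranks are hypothetical.

```python
def precision_recall_at_k(true_gene_ranks, k):
    """Precision and recall when the top k proteins are predicted per trial.

    true_gene_ranks : 1-based rank of the hidden causal gene in each trial
    k               : number of top-ranked proteins counted as predictions
    """
    n_trials = len(true_gene_ranks)
    hits = sum(1 for rank in true_gene_ranks if rank <= k)
    recall = hits / n_trials            # fraction of hidden genes recovered
    precision = hits / (k * n_trials)   # hits among all k * n predictions
    return precision, recall

# Hypothetical ranks from four leave-one-out trials.
ranks = [1, 3, 1, 10]
curve = [(k, *precision_recall_at_k(ranks, k)) for k in (1, 3, 10)]
for k, precision, recall in curve:
    print(k, precision, recall)
```

Sweeping the cutoff traces out the curve: a small cutoff gives few predictions and high precision, while a large cutoff recovers more hidden genes at the cost of precision.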

**Figure 3. Case studies of inferred complexes.**
Examples of inferred protein complexes and their associated diseases. Circular nodes represent proteins and their connecting edges represent protein-protein interactions. Diseases are denoted by square nodes, connected by phenotypic similarity edges. Green dashed edges represent known gene-disease associations; red edges connect a disease to a gene that lies within its associated genomic interval. The complexes were generated for the query diseases (A) Ataxia-Telangiectasia, (B) Hereditary Prostate Cancer type 8 and (C) MOPD-I.
