PLoS Comput Biol. 2010 Jan 15;6(1):e1000641.
doi: 10.1371/journal.pcbi.1000641.

Associating genes and protein complexes with disease via network propagation

Oron Vanunu et al. PLoS Comput Biol.

Abstract

A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method ranks the true causal gene first for 34% of the diseases in a cross-validation setting, and infers 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature and suggest several novel causal genes and protein complexes for further investigation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Illustration of the PRINCE algorithm.
A query disease, denoted Q, has varying degrees of phenotypic similarity with other diseases, denoted d1–d5 (marked with maroon lines, where thicker lines represent higher similarity). Known causal genes for these similar diseases are connected by dashed blue lines and used as the prior information. p1–p11 comprise the protein set of a protein-protein interaction network, where interactions are marked with black lines and thicker lines denote edges with higher confidence. A scoring function that is smooth over the network is computed using an iterative network propagation method. At every iteration of the algorithm, each protein pumps flow to its neighbors and receives flow from them. Protein colors correspond to the flow they receive in a specific iteration: the darker the color, the higher the flow. (A): The flow after the first iteration, representing the prior information. Only proteins p2, p4 and p9, which are directly associated with similar diseases, have a positive incoming flow. (B): After several iterations, the amount of flow to each node converges, and the resulting flow, used to score the proteins, appears to be smooth over the network. p5 emerges as the best causal gene candidate for disease Q, as it interacts with both p2 and p4.
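The iterative propagation described in this caption can be illustrated with a minimal sketch. It assumes a symmetric, degree-normalized weight matrix and a damping parameter `alpha` that balances network flow against the prior; the function name `propagate`, the parameter values, and the six-protein toy topology are hypothetical choices for illustration and are not taken from the caption. Only the prior set (p2, p4, p9) and the fact that p5 interacts with both p2 and p4 mirror the figure.

```python
import numpy as np

def propagate(W, prior, alpha=0.9, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W_norm @ F + (1 - alpha) * prior until F stabilizes."""
    W = np.asarray(W, dtype=float)
    prior = np.asarray(prior, dtype=float)
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0            # isolated nodes: avoid division by zero
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization: each edge weight is divided by sqrt(deg_u * deg_v).
    W_norm = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    F = prior.copy()                     # first iteration: the prior information only
    for _ in range(max_iter):
        F_next = alpha * (W_norm @ F) + (1 - alpha) * prior
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F

# Hypothetical toy network (unit edge weights), not the topology of Figure 1.
names = ["p2", "p4", "p5", "p9", "p10", "p11"]
edges = [("p2", "p5"), ("p4", "p5"), ("p5", "p10"), ("p9", "p10"), ("p10", "p11")]
idx = {name: i for i, name in enumerate(names)}
W = np.zeros((len(names), len(names)))
for u, v in edges:
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = 1.0

prior_genes = {"p2", "p4", "p9"}         # proteins directly tied to similar diseases
prior = np.array([1.0 if n in prior_genes else 0.0 for n in names])

scores = propagate(W, prior)
candidates = [n for n in names if n not in prior_genes]
best = max(candidates, key=lambda n: scores[idx[n]])
print(best)  # p5
```

In this toy setting p5 scores highest among the proteins without prior evidence because it receives flow from two prior proteins, whereas p10 receives flow from only one.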
Figure 2. A comparison of prioritization algorithms.
Performance comparison for PRINCE, Random Walk and CIPHER in a leave-one-out cross-validation test over 1,369 diseases with a known causal gene. The figure shows recall versus precision when considering the top $k$ proteins for various values of $k$.
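As a rough illustration of how a precision-recall curve over top-ranked proteins can be computed in a leave-one-out setting, the sketch below assumes each trial hides exactly one known causal gene and records the rank it receives. The function name `precision_recall_at_k`, the cutoff name `k`, and the sample ranks are hypothetical; the caption does not give the exact definitions used in the study, so this is one common convention rather than the authors' procedure.

```python
def precision_recall_at_k(true_gene_ranks, k):
    """Precision and recall when the top-k proteins of every trial are predicted.

    true_gene_ranks: 1-based rank of the held-out causal gene in each trial
    (assumption: exactly one held-out gene per trial).
    """
    n_trials = len(true_gene_ranks)
    hits = sum(1 for rank in true_gene_ranks if rank <= k)
    recall = hits / n_trials             # fraction of held-out genes recovered
    precision = hits / (k * n_trials)    # true hits among all k * n_trials predictions
    return precision, recall

# Made-up ranks for four hypothetical trials.
ranks = [1, 3, 10, 2]
curve = {k: precision_recall_at_k(ranks, k) for k in (1, 3, 10)}
for k, (precision, recall) in curve.items():
    print(k, precision, recall)
```

Sweeping `k` traces the curve: small cutoffs favor precision, large cutoffs favor recall.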
Figure 3. Case studies of inferred complexes.
Examples of inferred protein complexes and their associated diseases. Circular nodes represent proteins and their connecting edges represent protein-protein interactions. Diseases are denoted by square nodes, connected by phenotypic similarity edges. Green dashed edges represent known gene-disease associations; red edges connect a disease to a gene that lies within its associated genomic interval. The complexes were generated for the query diseases (A) Ataxia-Telangiectasia, (B) Hereditary Prostate Cancer type 8 and (C) MOPD-I.
