Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2010 Jan 29;6(1):e1000659.
doi: 10.1371/journal.pcbi.1000659.

Protein interaction networks--more than mere modules

Affiliations

Protein interaction networks--more than mere modules

Stefan Pinkert et al. PLoS Comput Biol. .

Abstract

It is widely believed that the modular organization of cellular function is reflected in a modular structure of molecular networks. A common view is that a "module" in a network is a cohesively linked group of nodes, densely connected internally and sparsely interacting with the rest of the network. Many algorithms try to identify functional modules in protein-interaction networks (PIN) by searching for such cohesive groups of proteins. Here, we present an alternative approach independent of any prior definition of what actually constitutes a "module". In a self-consistent manner, proteins are grouped into "functional roles" if they interact in similar ways with other proteins according to their functional roles. Such grouping may well result in cohesive modules again, but only if the network structure actually supports this. We applied our method to the PIN from the Human Protein Reference Database (HPRD) and found that a representation of the network in terms of cohesive modules, at least on a global scale, does not optimally represent the network's structure because it focuses on finding independent groups of proteins. In contrast, a decomposition into functional roles is able to depict the structure much better as it also takes into account the interdependencies between roles and even allows groupings based on the absence of interactions between proteins in the same functional role. This, for example, is the case for transmembrane proteins, which could never be recognized as a cohesive group of nodes in a PIN. When mapping experimental methods onto the groups, we identified profound differences in the coverage suggesting that our method is able to capture experimental bias in the data, too. For example yeast-two-hybrid data were highly overrepresented in one particular group. Thus, there is more structure in protein-interaction networks than cohesive modules alone and we believe this finding can significantly improve automated function prediction algorithms.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. An example network and possible image graphs.
A A simple example network of nodes of 4 different types identified by their structural position. Nodes of types formula image and formula image are densely connected among themselves. The nodes of type formula image have connections to both nodes of types formula image and formula image, but not among themselves, i.e. they mediate between types formula image and formula image. The nodes of type formula image only have connections to nodes of type formula image, but not among each other, i.e. they form a periphery to type formula image nodes. B and C Two possible image graphs for the functional understanding of this network show the connections among groups of nodes. A typical network clustering will aggregate nodes into clusters densely connected internally but only sparsely connected to the rest, as depicted in the left image graph. This will result in grouping nodes of types formula image and formula image together and nodes of type formula image and formula image together. Because of aggregating nodes into cohesive groups, any such algorithm will never recognize nodes of type formula image and formula image as different and hence miss essential part of the network's structure. On the opposite, the right image graph correctly captures the network structure of the 4 different types as the 4 different nodes in the image graph. D and E The adjacency matrices of our example network with rows and columns ordered according to the two decompositions shown above. A black square in position formula image indicates the existence of a link connecting node formula image with node formula image. Rows and columns are ordered such that nodes in the same group are adjacent. The internal order of the nodes in the groups is random. Each block in the matrix corresponds to a possible edge in the image graph. The left matrix shows the adjacency matrix for the output of a typical clustering algorithm which groups nodes of type formula image and formula image, as well as formula image and formula image together. Clearly, we see dense blocks along the diagonal and sparse blocks on the off-diagonal of the matrix as expected. The right matrix depicts the adjacency matrix with rows and columns according to the actual types of the nodes. All empty blocks in this matrix correspond to a missing edge in the image graph and all populated blocks are represented by an edge in the image graph. We see that for this network, the image graph perfectly captures the structure of the network.
Figure 2
Figure 2. Benchmark on networks with known role structure.
We compare our method with a mixture model approach by Newman and Leicht (NL) which employs a maximum likelihood approach , a non-negative matrix factorization (NMF) minimizing the Kullback-Leibler-divergence between data and estimated factors , and an approach based on minimum description length by Rosvall and Bergstrom (InfoMod) . The adjacency matrices show typical realizations of the test networks with rows and columns ordered according to the designed classes. Accuracy is measured in terms of normalized mutual information (NMI) between the designed assignment of nodes into classes and the classification inferred by the algorithms. Clearly, our approach outperforms the alternatives, in particular for high noise levels.
Figure 3
Figure 3. Fit scores and generalization error.
A Comparison of highest fit scores formula image (4) and formula image (5) for the full HPRD dataset with 32,331 interactions. Aggregating nodes into cohesive groups (diagonal image graphs) cannot improve the score beyond a certain limit, while non-diagonal image graphs are able to capture more and more structure as the image graph gets larger and larger. For comparison, the analysis was repeated on a randomized (RND) version of the original network. Standard deviation is smaller than symbol size. The fit scores we obtain on the real data show that the structure we find is far from random. B After removing a test-set of links from the network, we optimized the assignment of nodes into classes according to (2) using only the remaining links and keeping the image graphs fixed to those found in the runs that lead to figure A. With the assignment of nodes into classes for this training set of links, we computed the score on the test set of links. The figure shows average and standard deviation over 10 repetitions of this experiment. C p-values of Student's t-test for a statistically significant difference in the means of the test scores of panel B. For higher numbers of classes and thus larger differences in the fit scores of diagonal and non-diagonal image graphs, all differences become significant at the formula image level.
Figure 4
Figure 4. Number of classes not enriched in GO-terms with high significance.
A Number of classes not significantly enriched below the formula image level after Bonferroni-correction in any of the GO categories biological process, molecular function or cellular compartment. B Number of classes not significantly enriched below the formula image level after Bonferroni correction in at least one the GO categories biological process (BP), molecular function (MF) or cellular compartment (CC). Note the different scales. Clearly, the non-diagonal models consistently produce a lower number of classes which are not enriched in functional annotation. This can be seen as an indication that the non-diagonal models not only represent the network better, but the inferred groups also correspond better to known biology.
Figure 5
Figure 5. Comparison of block assignment.
For formula image classes, we show the adjacency matrix of the HPRD protein interaction network with rows and column ordered to show non-diagonal (A) and diagonal (B) block structure plus the corresponding image graphs for diagonal block models and non-diagonal block models. Note how the non-diagonal models allow to capture overlap between cohesive blocks but also to detect groups of nodes which are non-cohesive but have similar connection patterns to other classes of proteins. The color of the links codes the experiment type: Y2H: grey, in-vitro: blue, in-vitro+Y2H: turquoise, in-vivo: green, in-vivo+Y2H:orange, in-vivo+in-vitro: red, in-vivo+in-vitro+Y2H:black. The dots representing the matrix entries have been enlarged for better visibility.
Figure 6
Figure 6. Block assignment in a functional role model with 100 classes.
Adjacency matrix with rows and columns ordered according to assignment of proteins into classes. Color code and size of dots representing matrix entries as in figure 5. Only classes containing more than 100 proteins are labeled for better readability. Two details from the corresponding image graph exemplifying the kinds of structures found by the algorithm.
Figure 7
Figure 7. Comparison of block assignment.
The same assignment of nodes into formula image classes as used in figure 5 but for 3 different types of interactions, separately. A Interactions reported only for yeast-2-hybrid experiments (grey). B Interactions reported only in in-vitro experiments (blue). C Interactions reported only in in-vivo experiments (green). While in-vitro and in-vivo data is highly correlated, the interactions found in Y2H experiments are enriched in class 8.
Figure 8
Figure 8. Fit-score as a function of link weight.
Averaged over formula image to formula image, formula image and formula image, we show the fit scores formula image and formula image for each link type individually. Scores are normalized to the fit score obtained on the full data set from which also the assignment of nodes into classes and the image graphs are taken. A Actual HPRD data. Standard deviations are smaller than symbol sizes. B Randomized version of HPRD. As expected, we find the score increasing with weight. In the real data, increase of score with weight is slower, indicating a high correlation between the scores obtained for links representing different experimental techniques. As an exception, interactions with weight one, i.e. representing to Y2H-data only, show a significantly lower score than expected. See text for details.

Similar articles

Cited by

References

    1. Barabási AL, Oltvai Z. Network biology: Understanding the cells's functional organization. Nature Reviews Genetics. 2004;5:101–113. - PubMed
    1. Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Molecular Systems Biology. 2007;3:88. - PMC - PubMed
    1. Oliver S. Guilt-by-association goes global. Nature. 2000;403:601–603. - PubMed
    1. Spirin V, Mirny L. Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA. 2003;100:12123–12128. - PMC - PubMed
    1. Cui G, Chen Y, Huang D, Han K. An algorithm for finding functional modules and protein complexes in protein-protein interaction networks. J Biomed Biotechnol. 2008;2008:860270. - PMC - PubMed

Publication types