Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014 Jan 1;30(1):81-93.
doi: 10.1093/bioinformatics/btt569. Epub 2013 Oct 1.

Functional module identification in protein interaction networks by interaction patterns

Affiliations

Functional module identification in protein interaction networks by interaction patterns

Yijie Wang et al. Bioinformatics. .

Abstract

Motivation: Identifying functional modules in protein-protein interaction (PPI) networks may shed light on cellular functional organization and thereafter underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of 'higher than expected connectivity', those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem but the lack of global optimum guarantee and the prohibitive computational complexity have been the bottleneck of their applications in real-world large-scale PPI networks.

Results: In this article, we propose a novel optimization formulation LCP(2) (low two-hop conductance sets) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for LCP(2) by random walk. A spectral approximate algorithm SLCP(2) is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP(2) to a new algorithm (greedy algorithm for LCP(2)) GLCP(2) to identify overlapping functional modules. We compare SLCP(2) and GLCP(2) with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level Gene Ontology term prediction and especially sparse module detection, has demonstrated that our algorithms based on searching for LCP(2) outperform all other compared algorithms.

Availability and implementation: All data and code are available at http://www.cse.usf.edu/~xqian/fmi/slcp2hop/.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Different module identification results obtained by using P and P2. The first column displays three basic motifs (star motif, clique motif and biclique motif) [used by Royer et al. (2008)] and the dashed lines show the natural partitions. The second column gives the P of three basic motifs and the dashed lines denote the module dividing lines obtained by LCP. The third column gives the minimum objective function values by (2). The fourth column gives the P2 of three basic motifs and the dashed lines indicate the identified modules by LCP2. The fifth column shows the minimum objective function values based on (3). The last column illustrates the second largest eigenvector of formula image used in Algorithm 1
Fig. 2.
Fig. 2.
Performance comparison on synthetic networks: (A) the adjacency matrix of the original network; (B) one example of the randomly shuffled network (obtained by shuffling half of the original edges); (C) GNMI comparison among all algorithms; (D) t-test results
Fig. 3.
Fig. 3.
Statistical performance in detecting each module: (A) Acc scores comparison among all algorithms; (B) t-test results based on distributions of Acc. For low bars in (B), we put the formula image values on top of the bars
Fig. 4.
Fig. 4.
The top bar figure shows the comparison results based on the F measure on four PPI networks. The bottom figure displays the comparison of the percentages of matched GO terms in the complete set of selected high-level GO terms. For the HsaBioGRID PPI network, GS and PG fail to execute due to the memory limitation
Fig. 5.
Fig. 5.
The pro-survival and cytochrome c release modules in HsaBioGRID PPI network detected by all the algorithms (GS and PG fail to execute because of running out of memory). The pro-survial proteins are in rectangle shapes and the cytochrome c release proteins are in circle shapes. Diamond shapes denote the proteins that belong to neither the pro-survial proteins nor the cytochrome c release proteins. Shaded areas represent the modules detected by the corresponding algorithms
Fig. 6.
Fig. 6.
The FGF/FGFR signaling modules in HsaHPRD PPI network detected by all algorithms. FGF proteins are in the circle shapes and FGFR proteins are in the rectangle shapes. Diamond shapes indicate proteins of neither FGF proteins nor FGFR proteins. Shaded areas represent the modules detected by the algorithms

References

    1. Ahn YY, et al. Link communities reveal multiscale complexity in networks. Nature. 2010;466:761–764. - PubMed
    1. Ashburner M, et al. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nat. Genet. 2000;25:25–29. - PMC - PubMed
    1. Bisgin H, Dalfes H. Technical report. Turkey: Informatics Institute, Istanbul Technical University; 2008. Parallel clustering algorithms with application to climatology.
    1. Breitkreutz B, et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008;36:D637–D640. - PMC - PubMed
    1. Brunet J, et al. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA. 2004;101:4164–4169. - PMC - PubMed

Publication types