Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2006 Apr 14:7:207.
doi: 10.1186/1471-2105-7-207.

Development and implementation of an algorithm for detection of protein complexes in large interaction networks

Affiliations

Development and implementation of an algorithm for detection of protein complexes in large interaction networks

Md Altaf-Ul-Amin et al. BMC Bioinformatics. .

Abstract

Background: After complete sequencing of a number of genomes the focus has now turned to proteomics. Advanced proteomics technologies such as two-hybrid assay, mass spectrometry etc. are producing huge data sets of protein-protein interactions which can be portrayed as networks, and one of the burning issues is to find protein complexes in such networks. The enormous size of protein-protein interaction (PPI) networks warrants development of efficient computational methods for extraction of significant complexes.

Results: This paper presents an algorithm for detection of protein complexes in large interaction networks. In a PPI network, a node represents a protein and an edge represents an interaction. The input to the algorithm is the associated matrix of an interaction network and the outputs are protein complexes. The complexes are determined by way of finding clusters, i. e. the densely connected regions in the network. We also show and analyze some protein complexes generated by the proposed algorithm from typical PPI networks of Escherichia coli and Saccharomyces cerevisiae. A comparison between a PPI and a random network is also performed in the context of the proposed algorithm.

Conclusion: The proposed algorithm makes it possible to detect clusters of proteins in PPI networks which mostly represent molecular biological functional units. Therefore, protein complexes determined solely based on interaction data can help us to predict the functions of proteins, and they are also useful to understand and explain certain biological processes.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Concepts of the proposed method. (a) and (b) Two typical graphs of the same size and density. (c) A typical cluster and its neighbors. (d) Flow-chart of the proposed algorithm.
Figure 2
Figure 2
A protein-protein interaction network of E. coli showing high-density complexes of size ≥ 3 detected by the proposed algorithm. Functions of the proteins of 10 largest complexes are as follows: complex 1, components of RNA polymerase (RpoA, RpoB, RpoC, Rsd, RpoZ RpoD, RpoN, FliA); complex 2, components of ATP synthetase (AtpA, AtpB, AtpE, AtpF, AtpG, AtpH, AtpL); complex 3, components of DNA polymerase (DnaX, HolA, HolB, HolD, and HolC); complex 4, proteins involved in cell division (FtsQ, FtsI, FtsW, FtsN, FtsK and FtsL); complex 5, components of ribonucleoside-diphosphate reductase (NrdA, NrdB, NrdE, NrdF); complex 6, chaperons (DnaK, GrpE, DnaI) and a heat-shock sigma factor (RpoH), complex 7, component of protein translocase (SecA, SecE, SecY, SecG); complex 8, DNA mismatch repair proteins (MutL, MutS, MutH, UvrD); complex 9, components of fumarate reductase (FrdA, FrdB, FrdC, FrdD); and complex 10, proteins associated with molybdopterin biosynthesis (MoeA, MoeB, MobB, Mog). 11 to 22 are 3-protein complexes, which mostly consist of similar function proteins.
Figure 3
Figure 3
Comparison of yeast PPI network with a random network in the context of (a) degree distribution, (b) total number of complexes, (c) distribution of the complexes generated using din = 0.7 with respect to size, (d) size of the biggest complex, (e) the number of complexes of size ≥ 3 and (f) the average size of the complexes of size ≥ 3. We generated 10 overlapping (Ov) and 10 non-overlapping (NO) sets of complexes by using 10 different values of din for both the networks. We then calculated the aforementioned quantities for each of the generated sets and plotted them.
Figure 4
Figure 4
The effect of cpin on clustering. Twenty sets of non-overlapping clusters are generated from the yeast PPI using cpin = 0.1, 0.3, 0.5, 0.7, 0.9 and for each of these values of cpin using din = 0.1, 0.6, 0.7, 0.9. We then calculated and plotted (a) the total number of clusters, (b) the number of clusters of size ≥ 3, (c) the average size of the clusters of size ≥ 3, and (d) the size of the biggest cluster with respect to cpin.
Figure 5
Figure 5
Comparison of predicted complexes with known complexes. (a) Distribution of known complexes with respect to size. (b) Number of matched known complexes with respect to the minimum overlapping score for all 20 sets. The triangular dots (connected by dotted line) show the average and the solid square dots (connected by thick solid line) show the union of the matched known complexes. (c) Number of matched known complexes with respect to number of predicted complexes for 20 sets. Each point is labeled by corresponding mode of clustering, i. e. non-overlapping (NO) or overlapping (Ov) and din value.
Figure 6
Figure 6
Relative amount of interactions involving similar function protein pairs in 20 sets of yeast complexes against corresponding din values.
Figure 7
Figure 7
Complexes that are of size ≥ 6 of the set generated using din = 0.7, cpin = 0.50 and non-overlapping mode. (a) ID, size N, density d, p-value (with Bonferroni correction), a corresponding histogram and the names of the constituent proteins of each complex. The histogram of a complex shows the distribution of its member proteins with respect to 15 functional classes: (1) Cell cycle and DNA processing, (2) Protein with binding function or cofactor requirement (structural or catalytic), (3) Protein fate (folding, modification, destination), (4) Biogenesis of cellular components, (5) Cellular transport, transport facilitation and transport routes, (6) Metabolism, (7) Interaction with the cellular environment, (8) Transcription, (9) Energy, (10) Cell rescue, defense and virulence, (11) Cell type differentiation, (12) Cellular communication/signal transduction mechanism, (13) Protein activity regulation, (14) Protein synthesis, and (15) Transposable elements, viral and plasmid proteins. A protein may belong to more than one functional class. The scaling of a histogram is according to the size of the corresponding complex. The highest column/columns of a histogram are colored as red. (b) The network of the complex 19 of Fig 7(a).

References

    1. Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:2. doi: 10.1186/1471-2105-4-2. - DOI - PMC - PubMed
    1. Schwikowski B, Utez P, Fields S. A Network of protein-protein interactions in Yeast. Nature Biotechnology. 2000;18:1257–1261. doi: 10.1038/82360. - DOI - PubMed
    1. Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Höfert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147. doi: 10.1038/415141a. - DOI - PubMed
    1. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183. doi: 10.1038/415180a. - DOI - PubMed
    1. Seidman SB. Network structure and minimum degree. Social Networks. 1983;5:269–287. doi: 10.1016/0378-8733(83)90028-X. - DOI

MeSH terms

LinkOut - more resources