Network-based prediction of protein function

Roded Sharan¹, Igor Ulitsky, Ron Shamir

Affiliations

PMID: 17353930
PMCID: PMC1847944
DOI: 10.1038/msb4100129

Review

Network-based prediction of protein function

Roded Sharan et al. Mol Syst Biol. 2007.

. 2007:3:88.

doi: 10.1038/msb4100129. Epub 2007 Mar 13.

Authors

Roded Sharan¹, Igor Ulitsky, Ron Shamir

Affiliation

¹ School of Computer Science, Tel Aviv University, Tel Aviv, Israel.

PMID: 17353930
PMCID: PMC1847944
DOI: 10.1038/msb4100129

Abstract

Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community.

PubMed Disclaimer

Figures

**Figure 1**
Extent of annotation of proteins in model species. For each species, the charts give the fractions and numbers of annotated and unannotated proteins, according to the three ontologies of the GO annotation. The numbers are based on the Entrez Gene and the WormBase databases as of September 2006.

**Figure 2**
Direct versus module-assisted approaches for functional annotation. The scheme shows a network in which the functions of some proteins are known (top), where each function is indicated by a different color. Unannotated proteins are in white. In the direct methods (left), these proteins are assigned a color that is unusually prevalent among their neighbors. The direction of the edges indicates the influence of the annotated proteins on the unannotated ones. In the module-assisted methods (right), modules are first identified based on their density. Then, within each module, unannotated proteins are assigned a function that is unusually prevalent in the module. In both methods, proteins may be assigned with several functions.

**Figure 3**
Correlation between protein functional distance and network distance. X-axis: distance in the network. Y-axis: average functional similarity of protein pairs that lie at the specified distance. The functional similarity of two proteins is measured using the semantic similarity of their GO categories (Lord *et al*, 2003).

**Figure 4**
Integration of multiple data sources using the SAMBA framework. In the SAMBA framework (Tanay *et al*, 2004), different gene characteristics are represented by properties (A). Quantitative characteristics, such as gene expression levels, are discretized first. The genes and the properties are represented by nodes in a bipartite graph (B), where edges connect genes with the properties they have. The SAMBA algorithm seeks modules consisting of a subset of genes and a subset of properties, such that these subsets are densely connected in the graph.

**Figure 5**
Performance comparison of a direct method versus a module-assisted one. Two receiver operating characteristic (ROC) curves comparing the accuracy of a neighborhood-counting method (Schwikowski *et al*, 2000) and of the MCODE method (Bader and Hogue, 2003) in predicting GO Biological Process annotations using a PPI network obtained from BioGRID (Stark *et al*, 2006). A ROC curve is commonly used to assess prediction performance by plotting the true positive rate versus the false positive rate when varying the prediction threshold. In the neighborhood-counting variant used here, a protein is assigned with a function if the hypergeometric enrichment P-value for the function in the protein's direct neighborhood is below a certain threshold. MCODE clusters were obtained using the Cytoscape plug-in with the ‘node score cutoff' parameter set to 0.05 and the other parameters at their default values. Using MCODE, we predict a function for a protein if that function's P-value in the protein's module is below a certain threshold. Each ROC curve was obtained by varying the threshold. Only proteins assigned to at least one MCODE cluster were used in the analysis for both methods.

See this image and copyright information in PMC

References

1. Adamcsek B, Palla G, Farkas IJ, Derenyi I, Vicsek T (2006) CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics 22: 1021–1023 - PubMed
1. Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422: 198–207 - PubMed
1. Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S (2006) Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics 7: 207. - PMC - PubMed
1. Arnau V, Mars S, Marin I (2005) Iterative cluster analysis of protein interaction data. Bioinformatics 21: 364–378 - PubMed
1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29 - PMC - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Network-based prediction of protein function

Affiliation

Network-based prediction of protein function

Authors

Affiliation

Abstract

Figures

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Other Literature Sources