Review. Mol Syst Biol. 2007:3:88.
doi: 10.1038/msb4100129. Epub 2007 Mar 13.

Network-based prediction of protein function

Roded Sharan et al. Mol Syst Biol. 2007.

Abstract

Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community.

Figures

Figure 1
Extent of annotation of proteins in model species. For each species, the charts give the fractions and numbers of annotated and unannotated proteins, according to the three ontologies of the GO annotation. The numbers are based on the Entrez Gene and the WormBase databases as of September 2006.
Figure 2
Direct versus module-assisted approaches for functional annotation. The scheme shows a network in which the functions of some proteins are known (top), where each function is indicated by a different color. Unannotated proteins are in white. In the direct methods (left), these proteins are assigned a color that is unusually prevalent among their neighbors. The direction of the edges indicates the influence of the annotated proteins on the unannotated ones. In the module-assisted methods (right), modules are first identified based on their density. Then, within each module, unannotated proteins are assigned a function that is unusually prevalent in the module. In both methods, a protein may be assigned several functions.
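The contrast described in this caption can be illustrated with a small sketch. It is not the procedure of any specific method in the review: the network, the protein names, the color labels and the precomputed modules are all hypothetical, and "unusually prevalent" is simplified to "most frequent" (a plain majority vote) instead of a statistical enrichment test.

```python
from collections import Counter

# Toy undirected network (hypothetical protein names) as an adjacency map.
network = {
    "A": {"B", "C", "U1"},
    "B": {"A", "C", "U1"},
    "C": {"A", "B", "U1"},
    "U1": {"A", "B", "C", "D"},
    "D": {"U1", "E", "U2"},
    "E": {"D", "U2"},
    "U2": {"D", "E"},
}
# Known functions ("colors"); U1 and U2 are unannotated.
known = {"A": "red", "B": "red", "C": "red", "D": "blue", "E": "blue"}


def most_common_function(proteins, known):
    """Return the most frequent known function among the given proteins, or None."""
    counts = Counter(known[p] for p in proteins if p in known)
    return counts.most_common(1)[0][0] if counts else None


def direct_annotation(network, known):
    """Direct approach: label each unannotated protein from its direct neighbors."""
    return {
        protein: most_common_function(neighbors, known)
        for protein, neighbors in network.items()
        if protein not in known
    }


def module_annotation(modules, known):
    """Module-assisted approach: label unannotated proteins from their module."""
    predicted = {}
    for module in modules:
        label = most_common_function(module, known)
        for protein in module:
            if protein not in known:
                predicted[protein] = label
    return predicted


# Assumption: the modules were already found by some clustering step.
modules = [{"A", "B", "C", "U1"}, {"D", "E", "U2"}]

direct = direct_annotation(network, known)
by_module = module_annotation(modules, known)
```

On this toy network both approaches agree, labeling U1 red and U2 blue. They can diverge when a protein's neighbors span several modules. This sketch also assigns one function per protein, whereas the caption notes that a protein may receive several.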
Figure 3
Correlation between protein functional distance and network distance. X-axis: distance in the network. Y-axis: average functional similarity of protein pairs that lie at the specified distance. The functional similarity of two proteins is measured using the semantic similarity of their GO categories (Lord et al, 2003).
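The kind of curve described here can be computed along the following lines. This is a sketch under stated assumptions: network distance is the shortest-path length found by breadth-first search, and the GO semantic similarity of Lord et al (2003) is replaced by a much simpler stand-in, the Jaccard overlap of two proteins' annotation term sets. The network and term names are hypothetical.

```python
from collections import defaultdict, deque


def bfs_distances(network, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def jaccard(a, b):
    """Stand-in similarity: shared terms divided by all terms of the two proteins."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_by_distance(network, terms):
    """Average functional similarity of annotated protein pairs at each distance."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    proteins = sorted(terms)
    for i, p in enumerate(proteins):
        dist = bfs_distances(network, p)
        for q in proteins[i + 1:]:
            if q in dist:  # skip pairs in different connected components
                sums[dist[q]] += jaccard(terms[p], terms[q])
                counts[dist[q]] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Toy path network P1 - P2 - P3 - P4 with hypothetical annotation terms.
network = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"}, "P4": {"P3"}}
terms = {
    "P1": {"t1", "t2"},
    "P2": {"t1", "t2"},
    "P3": {"t2", "t3"},
    "P4": {"t3"},
}

profile = similarity_by_distance(network, terms)
```

In this toy example the average similarity falls as distance grows, which is the shape of relationship the figure plots.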
Figure 4
Integration of multiple data sources using the SAMBA framework. In the SAMBA framework (Tanay et al, 2004), different gene characteristics are represented by properties (A). Quantitative characteristics, such as gene expression levels, are discretized first. The genes and the properties are represented by nodes in a bipartite graph (B), where edges connect genes with the properties they have. The SAMBA algorithm seeks modules consisting of a subset of genes and a subset of properties, such that these subsets are densely connected in the graph.
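The bipartite representation in this caption can be sketched as follows. This shows only the data structure and a naive density measure. It is not the SAMBA algorithm, which searches for modules using its own scoring scheme. The gene names, property names and discretization cutoffs are hypothetical.

```python
def discretize(expression, high=1.0, low=-1.0):
    """Turn quantitative expression values into discrete up/down properties.

    The cutoffs are illustrative assumptions, not values from the source.
    """
    properties = set()
    for condition, value in expression.items():
        if value >= high:
            properties.add(f"{condition}:up")
        elif value <= low:
            properties.add(f"{condition}:down")
    return properties


# Hypothetical genes with expression levels and already-discrete properties.
genes = {
    "g1": {"expr": {"heat": 2.1, "cold": -0.2}, "props": {"binds:TF1"}},
    "g2": {"expr": {"heat": 1.5, "cold": 0.3}, "props": {"binds:TF1"}},
    "g3": {"expr": {"heat": -1.4, "cold": 1.2}, "props": set()},
}

# Bipartite graph: an edge (gene, property) means the gene has that property.
edges = {
    (gene, prop)
    for gene, data in genes.items()
    for prop in discretize(data["expr"]) | data["props"]
}


def module_density(edges, gene_subset, property_subset):
    """Fraction of possible gene-property edges that are present in the module."""
    possible = len(gene_subset) * len(property_subset)
    if possible == 0:
        return 0.0
    present = sum(
        (gene, prop) in edges for gene in gene_subset for prop in property_subset
    )
    return present / possible


dense = module_density(edges, {"g1", "g2"}, {"heat:up", "binds:TF1"})
sparser = module_density(edges, {"g1", "g2", "g3"}, {"heat:up", "binds:TF1"})
```

Here genes g1 and g2 together with the properties heat:up and binds:TF1 form a fully connected module (density 1.0), and adding g3 lowers the density to 2/3.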
Figure 5
Performance comparison of a direct method versus a module-assisted one. Two receiver operating characteristic (ROC) curves comparing the accuracy of a neighborhood-counting method (Schwikowski et al, 2000) and of the MCODE method (Bader and Hogue, 2003) in predicting GO Biological Process annotations using a PPI network obtained from BioGRID (Stark et al, 2006). A ROC curve is commonly used to assess prediction performance by plotting the true positive rate versus the false positive rate when varying the prediction threshold. In the neighborhood-counting variant used here, a protein is assigned a function if the hypergeometric enrichment P-value for the function in the protein's direct neighborhood is below a certain threshold. MCODE clusters were obtained using the Cytoscape plug-in with the ‘node score cutoff’ parameter set to 0.05 and the other parameters at their default values. Using MCODE, we predict a function for a protein if that function's P-value in the protein's module is below a certain threshold. Each ROC curve was obtained by varying the threshold. Only proteins assigned to at least one MCODE cluster were used in the analysis for both methods.
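The two ingredients named in this caption, a hypergeometric enrichment P-value and a ROC curve traced by varying the threshold, can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function names, the toy P-values and the truth labels are hypothetical, and the P-value is taken to be the upper tail P(X >= k).

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    N: proteins in the network, K: proteins annotated with the function,
    n: size of the neighborhood (or module), k: annotated proteins within it.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def roc_points(pvalues, is_true, thresholds):
    """One (false positive rate, true positive rate) point per threshold.

    A prediction is made when the P-value is below the threshold.
    Assumes is_true holds at least one True and one False label.
    """
    positives = sum(is_true)
    negatives = len(is_true) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for p, y in zip(pvalues, is_true) if p < threshold and y)
        fp = sum(1 for p, y in zip(pvalues, is_true) if p < threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Example: 10 proteins, 3 with the function, a neighborhood of 4 containing 2.
p_example = hypergeom_pvalue(k=2, n=4, K=3, N=10)

# Hypothetical P-values for four (protein, function) pairs and their true labels.
pvalues = [0.001, 0.02, 0.03, 0.4]
is_true = [True, False, True, False]
curve = roc_points(pvalues, is_true, thresholds=[0.01, 0.05, 1.0])
```

The same `roc_points` helper applies to either method in the comparison. Only the set over which enrichment is computed changes: the protein's direct neighborhood for neighborhood counting, or its cluster for a module-assisted method.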

References

    1. Adamcsek B, Palla G, Farkas IJ, Derenyi I, Vicsek T (2006) CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics 22: 1021–1023
    2. Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422: 198–207
    3. Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S (2006) Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics 7: 207
    4. Arnau V, Mars S, Marin I (2005) Iterative cluster analysis of protein interaction data. Bioinformatics 21: 364–378
    5. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29
