BMC Bioinformatics. 2008 Sep 18;9:382.

doi: 10.1186/1471-2105-9-382.

A probabilistic framework to predict protein function from interaction data integrated with semantic knowledge

Young-Rae Cho¹, Lei Shi, Murali Ramanathan, Aidong Zhang

PMID: 18801191
PMCID: PMC2570367
DOI: 10.1186/1471-2105-9-382

Young-Rae Cho et al. BMC Bioinformatics. 2008.

Affiliation

¹ Department of Computer Science and Engineering, State University of New York, Buffalo, NY, USA. ycho8@cse.buffalo.edu

Abstract

Background: The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insight into functional analysis because the function of an unknown protein can be postulated from the evidence of its interactions with known proteins. Protein-protein interaction data sets have been enriched by high-throughput experimental methods. However, functional analysis based on interaction data is limited in accuracy because the data contain experimentally generated false positives and interactions that lack functional linkage.

Results: Protein-protein interaction data can be integrated with the functional knowledge in the Gene Ontology (GO) database. We apply similarity measures to assess the functional similarity between interacting proteins. We present a probabilistic framework for predicting the functions of unknown proteins based on this functional similarity. We use leave-one-out cross-validation to compare performance. The experimental results demonstrate that our algorithm performs better than competing methods in terms of prediction accuracy. In particular, it handles the high false positive rates of current interaction data well.

Conclusion: Experimentally determined protein-protein interactions are too error-prone to uncover the functional associations among proteins on their own. The performance of function prediction for uncharacterized proteins can be enhanced by integrating the multiple data sources available.


Figures

**Figure 1**
**Distribution of interacting proteins with respect to (a) structure-based functional similarity, (b) annotation-based functional similarity and (c) functional consistency**. The interaction data from the MIPS, DIP and BioGRID databases were used. The functional categories and annotations were obtained from FunCat in MIPS. (a) The functional similarity of each interacting protein pair was measured by the maximum structure-based similarity over all pairs of functions the two proteins have in the hierarchy. More than 60% of interacting pairs have a functional similarity below 0.4. (b) The functional similarity of each interacting protein pair was also measured by the maximum annotation-based similarity over all pairs of their functions. Around 60% of interacting pairs have a functional similarity below 0.2. (c) Finally, the functional consistency of each interacting protein pair was measured by the proportion of functions the two proteins share. As in (b), more than 60% of interacting pairs have a functional consistency below 0.2.
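The protein-level similarity in panels (a) and (b) is described as a maximum over the functions of the two proteins. A minimal Python sketch of that step follows, assuming each protein carries a set of function labels and that some function-to-function similarity is supplied by the caller. The names `protein_similarity` and `toy_similarity`, the toy labels, and the choice to return 0.0 when a protein has no functions are illustrative assumptions, not taken from the paper.

```python
from itertools import product
from typing import Callable, Set


def protein_similarity(
    funcs_a: Set[str],
    funcs_b: Set[str],
    function_similarity: Callable[[str, str], float],
) -> float:
    """Maximum function-to-function similarity over all cross pairs."""
    scores = [function_similarity(f, g) for f, g in product(funcs_a, funcs_b)]
    # Assumption: a protein without any annotated function scores 0.0.
    return max(scores, default=0.0)


# Hypothetical similarity table for three toy functions.
_TOY_TABLE = {
    frozenset(("F1", "F2")): 0.3,
    frozenset(("F1", "F3")): 0.7,
    frozenset(("F2", "F3")): 0.1,
}


def toy_similarity(f: str, g: str) -> float:
    """Symmetric lookup; identical functions are maximally similar."""
    return 1.0 if f == g else _TOY_TABLE.get(frozenset((f, g)), 0.0)


if __name__ == "__main__":
    print(protein_similarity({"F1", "F2"}, {"F3"}, toy_similarity))
```

Taking the maximum means a single closely related function pair is enough to give two proteins a high score, even if their other annotations differ.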

**Figure 2**
**Functional co-occurrence rates of interacting protein pairs sorted by their similarity in a descending order**. We compared three similarity measurements by functional co-occurrence. The interaction data from DIP were used. For each interacting protein pair, we measured the structure-based similarity in Formula 13, the annotation-based similarity in Formula 14 and connectivity-based interaction reliability in [33]. For the structure-based and annotation-based similarity, we used the GO terms and annotations in Biological Process and Molecular Function categories. We then sorted the pairs by their similarity in a descending order, and calculated the average functional co-occurrence rates for every 500 pairs. That is, we inspected how many pairs among 500 pairs co-occurred in the same functional categories from FunCat in MIPS. The interacting pairs with high structure-based and annotation-based similarity have higher rates of functional co-occurrence than those with high connectivity-based reliability. Moreover, in the range of top 4000 pairs, the annotation-based similarity performs better than the structure-based similarity.
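The binning procedure in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions: a pair is counted as co-occurring when the two proteins share at least one functional category, and the function name `cooccurrence_rates`, the data layout, and the toy proteins are hypothetical. The caption uses bins of 500 pairs; the example uses bins of 2 so the result can be checked by hand.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def cooccurrence_rates(
    scored_pairs: List[Tuple[Pair, float]],
    categories: Dict[str, Set[str]],
    bin_size: int = 500,
) -> List[float]:
    """Per-bin fraction of pairs that share at least one category."""
    # Rank pairs by similarity, highest first.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    rates = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        hits = sum(
            1
            for (a, b), _score in chunk
            if categories.get(a, set()) & categories.get(b, set())
        )
        rates.append(hits / len(chunk))
    return rates


if __name__ == "__main__":
    toy_categories = {"P1": {"A"}, "P2": {"A", "B"}, "P3": {"C"}, "P4": {"B"}}
    toy_pairs = [
        (("P1", "P2"), 0.9),
        (("P3", "P4"), 0.2),
        (("P2", "P4"), 0.8),
        (("P1", "P3"), 0.1),
    ]
    # The two most similar pairs both share a category; the other two do not.
    print(cooccurrence_rates(toy_pairs, toy_categories, bin_size=2))
```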

**Figure 3**
**Functional consistency of interacting protein pairs sorted by their similarity in descending order**. We compared the same similarity measures as in Figure 2, this time by functional consistency. With the interacting pairs sorted by similarity in descending order, we calculated the average functional consistency for every 500 pairs. The functional consistency of a pair is the ratio of the number of common functions to the number of all distinct functions that the two proteins have. The general pattern of functional consistency is similar to that of functional co-occurrence in Figure 2. When we use the annotation-based similarity, the functional consistency decreases monotonically as the similarity of the interacting pairs declines.
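The functional consistency defined in this caption, common functions divided by all distinct functions, is the Jaccard index of the two function sets. A short Python sketch, with a hypothetical function name and made-up category labels, and with the assumption that two proteins without any annotation score 0.0:

```python
from typing import Set


def functional_consistency(funcs_a: Set[str], funcs_b: Set[str]) -> float:
    """Shared functions divided by all distinct functions of the pair."""
    union = funcs_a | funcs_b
    if not union:
        # Assumption: the ratio is undefined without annotations; use 0.0.
        return 0.0
    return len(funcs_a & funcs_b) / len(union)


if __name__ == "__main__":
    # Two shared labels out of four distinct labels.
    a = {"cat_1", "cat_2", "cat_3"}
    b = {"cat_2", "cat_3", "cat_4"}
    print(functional_consistency(a, b))
```

For the toy sets above the ratio is 2 / 4 = 0.5.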

**Figure 4**
**(a) Functional co-occurrence and (b) functional consistency of interacting protein pairs with respect to their functional similarity**. We investigated the functional co-occurrence and functional consistency of the interacting protein pairs from DIP with respect to their functional similarity in the range between 0 and 1. The similarity was measured by Formulas 13 and 14. As the similarity under either measure increases, the functional co-occurrence and consistency increase monotonically. However, the annotation-based similarity performs better than the structure-based similarity because functional co-occurrence and consistency vary little for structure-based similarity values up to 0.7. This indicates that the annotation-based method correctly quantifies the functional similarity of interacting proteins.

**Figure 5**
**Precision and recall plots by cross-validation for protein function prediction**. The performance of our function prediction algorithm was assessed by the leave-one-out cross-validation using the proteins that appear in the interaction data from DIP and are annotated on the functional categories in MIPS. As a higher threshold of prediction confidence is used, precision increases whereas recall decreases.
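The threshold trade-off described here can be illustrated with a small Python sketch. It assumes each protein has per-function confidence scores (for example, produced while that protein's own annotation is held out) and pools counts over all proteins (micro-averaging). The averaging scheme, the name `precision_recall`, the convention of reporting a precision of 0.0 when nothing is predicted, and the toy scores are all assumptions for illustration, not the paper's evaluation code.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    threshold: float,
) -> Tuple[float, float]:
    """Micro-averaged precision and recall at one confidence threshold."""
    true_pos = n_predicted = n_known = 0
    for protein, known in truth.items():
        scores = predictions.get(protein, {})
        # Keep only functions predicted with sufficient confidence.
        chosen = {f for f, conf in scores.items() if conf >= threshold}
        true_pos += len(chosen & known)
        n_predicted += len(chosen)
        n_known += len(known)
    # Assumption: report 0.0 when a ratio has an empty denominator.
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_known if n_known else 0.0
    return precision, recall


if __name__ == "__main__":
    toy_truth = {"P1": {"A", "B"}, "P2": {"C"}}
    toy_predictions = {
        "P1": {"A": 0.9, "B": 0.4, "C": 0.3},
        "P2": {"C": 0.6, "A": 0.5},
    }
    for t in (0.3, 0.8):
        print(t, precision_recall(toy_predictions, toy_truth, t))
```

In the toy data, raising the threshold from 0.3 to 0.8 keeps only the most confident prediction, so precision rises while recall falls, which is the pattern the caption describes.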

**Figure 6**
**Performance comparison of three function prediction methods**. The precision-recall performance of our functional similarity-based probabilistic approach was compared to that of two competing methods: the FS weighted averaging method and the annotation pattern-based prediction method. Each method can predict a different number of functions for each protein depending on the selected threshold, so each method generated several output sets as the threshold was varied. We calculated the precision and recall of each output set. Our approach markedly outperforms the annotation pattern-based method and has higher precision than the FS weighted averaging method when the recall is greater than 0.07.

**Figure 7**
**Examples of structure-based similarity and annotation-based similarity between functions in a hierarchy**. Each circle represents a function, and each edge is a general-to-specific relationship between two functions. The depth of a function is the path length from the root to the function. The number next to each function is the number of proteins annotated to that function. The structure-based similarity between two functions is the ratio of the depth of their most specific common function to the average depth of the two functions of interest (Formula 13). The annotation-based similarity between two functions is the negative logarithm of the proportion of proteins annotated to their most specific common function (Formula 14). Some examples of the structure-based and the annotation-based similarity between two functions in the hierarchy are shown in the boxes.
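The two measures as worded in this caption can be sketched in Python on a toy hierarchy. The sketch makes several assumptions that the caption does not settle: the hierarchy is a tree stored as a child-to-parent map, the root has depth 0, the count for a function includes proteins annotated to its descendants, the logarithm is natural, and no normalization is applied (Formulas 13 and 14 in the paper may differ in these details). All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Optional

Parent = Dict[str, Optional[str]]


def path_to_root(node: str, parent: Parent) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def depth(node: str, parent: Parent) -> int:
    """Path length from the root to `node` (the root has depth 0)."""
    return len(path_to_root(node, parent)) - 1


def most_specific_common(f1: str, f2: str, parent: Parent) -> str:
    """Deepest function that is an ancestor of (or equal to) both inputs."""
    ancestors_f1 = set(path_to_root(f1, parent))
    for node in path_to_root(f2, parent):  # walks upward from f2
        if node in ancestors_f1:
            return node
    raise ValueError("functions do not share a root")


def structure_similarity(f1: str, f2: str, parent: Parent) -> float:
    """Depth of the common function over the mean depth of f1 and f2."""
    mean_depth = (depth(f1, parent) + depth(f2, parent)) / 2
    if mean_depth == 0:
        return 1.0  # assumption: the root compared with itself
    return depth(most_specific_common(f1, f2, parent), parent) / mean_depth


def annotation_similarity(
    f1: str, f2: str, parent: Parent, counts: Dict[str, int], total: int
) -> float:
    """-log(p), where p is the share of proteins on the common function."""
    common = most_specific_common(f1, f2, parent)
    return math.log(total / counts[common])  # equals -log(counts / total)


if __name__ == "__main__":
    toy_parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
    toy_counts = {"root": 100, "A": 20, "B": 30, "A1": 5, "A2": 10}
    print(structure_similarity("A1", "A2", toy_parent))
    print(annotation_similarity("A1", "A2", toy_parent, toy_counts, 100))
```

In the toy tree, the siblings `A1` and `A2` meet at `A` (depth 1, mean depth 2), giving a structure-based similarity of 0.5, while any pair that only meets at the root scores 0 under both measures.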


References

1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
2. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988;85:2444–2448. doi: 10.1073/pnas.85.8.2444.
3. Altschul SF, Gish W, Miller W, Meyers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215:403–410.
4. Friedberg I. Automated protein function prediction – the genomic challenge. Briefings in Bioinformatics. 2006;7:225–242. doi: 10.1093/bib/bbl004.
5. Valencia A. Automatic annotation of protein function. Current Opinion in Structural Biology. 2005;15:267–274. doi: 10.1016/j.sbi.2005.05.010.
