Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009 Dec 14:10:419.
doi: 10.1186/1471-2105-10-419.

Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences

Affiliations

Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences

Yungki Park. BMC Bioinformatics. .

Abstract

Background: Protein-protein interactions underlie many important biological processes. Computational prediction methods can nicely complement experimental approaches for identifying protein-protein interactions. Recently, a unique category of sequence-based prediction methods has been put forward--unique in the sense that it does not require homologous protein sequences. This enables it to be universally applicable to all protein sequences unlike many of previous sequence-based prediction methods. If effective as claimed, these new sequence-based, universally applicable prediction methods would have far-reaching utilities in many areas of biology research.

Results: Upon close survey, I realized that many of these new methods were ill-tested. In addition, newer methods were often published without performance comparison with previous ones. Thus, it is not clear how good they are and whether there are significant performance differences among them. In this study, I have implemented and thoroughly tested 4 different methods on large-scale, non-redundant data sets. It reveals several important points. First, significant performance differences are noted among different methods. Second, data sets typically used for training prediction methods appear significantly biased, limiting the general applicability of prediction methods trained with them. Third, there is still ample room for further developments. In addition, my analysis illustrates the importance of complementary performance measures coupled with right-sized data sets for meaningful benchmark tests.

Conclusions: The current study reveals the potentials and limits of the new category of sequence-based protein-protein interaction prediction methods, which in turn provides a firm ground for future endeavours in this important area of contemporary bioinformatics.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Cross-validation on the yeast and the human data. ROC and recall-precision plots for the four tested methods and the consensus method. The title of each plot is of the 'D1_D2_(ROC|RP)100' format, where D1 is the training data set, D2 is the testing data set, and '100' indicates that the size of the negative test data set is 100 times that of the positive test data set. 'ROC' indicates an ROC plot whereas 'RP' indicates a recall-precision plot. When D1 and D2 are identical as is here, 4-fold cross-validation was performed. When not identical, D1 and D2 are used for training and testing, respectively.
Figure 2
Figure 2
Cross-species benchmarking results. The title of each plot reads in the same way as in Fig. 1.
Figure 3
Figure 3
Cross-validation on the combined data. The title of each plot reads in the same way as in Fig. 1.

Similar articles

Cited by

References

    1. Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol. 2007;3:88. doi: 10.1038/msb4100129. - DOI - PMC - PubMed
    1. Levy ED, Pereira-Leal JB. Evolution and dynamics of protein interactions and networks. Curr Opin Struct Biol. 2008;18(3):349–357. doi: 10.1016/j.sbi.2008.03.003. - DOI - PubMed
    1. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403(6770):623–627. doi: 10.1038/35001009. - DOI - PubMed
    1. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Sci Acad USA. 2001;98(8):4569–4574. doi: 10.1073/pnas.061034498. - DOI - PMC - PubMed
    1. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N. High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science. 2008;322(5898):104–110. doi: 10.1126/science.1158684. - DOI - PMC - PubMed

Publication types

LinkOut - more resources