Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions

Kevin Y Yip et al. Bioinformatics. 2009 Jan 15;25(2):243-50. doi: 10.1093/bioinformatics/btn602. Epub 2008 Nov 17.
Abstract

Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions that serve as examples. While many computational techniques have been proposed for this network reconstruction task, their accuracy is consistently limited by the small number of high-confidence examples and by the uneven distribution of these examples across the potential interaction space, with some objects having many known interactions and others few.

Results: To address this issue, we propose two computational methods based on the concept of training set expansion. They work particularly well in conjunction with kernel methods, a popular class of approaches for fusing together many disparate types of features. Both methods are based on semi-supervised learning and augment the limited number of gold-standard training instances with carefully chosen, high-confidence auxiliary examples. The first method, prediction propagation, propagates highly confident predictions of one local model to another as auxiliary examples, thus using information-rich regions of the training network to help predict the information-poor regions. The second method, kernel initialization, takes the most similar and most dissimilar objects of each object in a global kernel as auxiliary examples. Using several sets of experimentally verified protein-protein interactions from yeast, we show that training set expansion gives a measurable performance gain over a number of representative, state-of-the-art network reconstruction methods, and that it can correctly identify interactions that other methods rank low because the involved proteins have few training examples.
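
To make the two expansion schemes concrete, below is a minimal Python sketch of how the auxiliary examples could be generated. It is a hypothetical reconstruction from the description in this abstract, not the authors' implementation: the +1/-1/0 adjacency coding, the use of support vector machines as local models, and the n_aux and margin parameters are all assumptions.

    # Hypothetical sketch of training set expansion. Assumes a precomputed
    # kernel matrix K (n x n) over objects and a partially observed adjacency
    # matrix A with entries +1 (known interaction), -1 (known non-interaction)
    # and 0 (unknown status). Not the authors' code.
    import numpy as np
    from sklearn.svm import SVC

    def local_scores(K, A, v):
        """Train a local SVM for object v on its labelled partners and
        score every unlabelled object as a candidate partner of v."""
        labelled = np.flatnonzero(A[v])
        if len(set(A[v, labelled])) < 2:
            return None  # cannot train a local model from a single class
        clf = SVC(kernel="precomputed")
        clf.fit(K[np.ix_(labelled, labelled)], A[v, labelled])
        unlabelled = np.flatnonzero(A[v] == 0)
        return unlabelled, clf.decision_function(K[np.ix_(unlabelled, labelled)])

    def prediction_propagation(K, A, n_aux=5, margin=1.0):
        """One round of prediction propagation: each trainable local model
        donates its most confident predictions as auxiliary examples, which
        then serve as training labels for information-poor rows."""
        A_exp = A.copy()
        for v in range(A.shape[0]):
            result = local_scores(K, A, v)
            if result is None:
                continue
            unlabelled, scores = result
            order = np.argsort(scores)
            for idx in order[-n_aux:]:            # most confident positives
                if scores[idx] > margin:
                    u = unlabelled[idx]
                    A_exp[v, u] = A_exp[u, v] = 1
            for idx in order[:n_aux]:             # most confident negatives
                if scores[idx] < -margin:
                    u = unlabelled[idx]
                    A_exp[v, u] = A_exp[u, v] = -1
        return A_exp

    def kernel_initialization(K, A, n_aux=3):
        """Kernel initialization: for each object, treat its most similar
        objects under the global kernel as auxiliary interaction partners
        and its most dissimilar objects as auxiliary non-interactions."""
        A_exp = A.copy()
        for v in range(A.shape[0]):
            order = [u for u in np.argsort(K[v]) if u != v]
            for u in order[-n_aux:]:              # most similar objects
                if A_exp[v, u] == 0:
                    A_exp[v, u] = A_exp[u, v] = 1
            for u in order[:n_aux]:               # most dissimilar objects
                if A_exp[v, u] == 0:
                    A_exp[v, u] = A_exp[u, v] = -1
        return A_exp

In a full pipeline, prediction_propagation would presumably be run for several rounds, retraining the local models on the expanded matrix each time, while kernel_initialization plays the analogous role before any local model has been trained.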

Figures

Fig. 1.
The supervised network inference problem. (a) Adjacency matrix of known interactions (black boxes), known non-interactions (white boxes) and node pairs with an unknown interaction status (gray boxes with question marks). (b) Kernel matrix, with a darker color representing a larger inner product. (c) Partially complete adjacency matrix required by the supervised direct approach methods, with complete knowledge of a submatrix. In the basic local modeling approach, the dark gray portion cannot be predicted.
Fig. 2.
Global and local modeling. (a) An interaction network with each green solid edge representing a known interaction, each red dotted edge representing a known non-interaction and the dashed edge representing a pair of objects with an unknown interaction status. (b) A global model based on a pairwise kernel. (c) A local model for object v3.
Fig. 3.
Prediction accuracy at different gold-standard set sizes. (a) Using the int kernel. (b) Using the exp-gasch kernel.
Fig. 4.
Correlation between the number of gold-standard examples and the rank difference between local+PP and the four other methods.
