Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions

Kevin Y Yip et al. Bioinformatics. 2009 Jan 15;25(2):243-50. doi: 10.1093/bioinformatics/btn602. Epub 2008 Nov 17.
Abstract

Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions that serve as examples. While many computational techniques have been proposed for this network reconstruction task, their accuracy is consistently limited by the small number of high-confidence examples and by the uneven distribution of these examples across the potential interaction space, with some objects having many known interactions and others few.

Results: To address this issue, we propose two computational methods based on the concept of training set expansion. They work particularly well in conjunction with kernel methods, a popular class of approaches for fusing together many disparate types of features. Both methods are based on semi-supervised learning and augment the limited number of gold-standard training instances with carefully chosen, high-confidence auxiliary examples. The first method, prediction propagation, propagates highly confident predictions of one local model to another as auxiliary examples, thus using information-rich regions of the training network to help predict the information-poor regions. The second method, kernel initialization, takes the most similar and most dissimilar objects of each object in a global kernel as auxiliary examples. Using several sets of experimentally verified protein-protein interactions from yeast, we show that training set expansion gives a measurable performance gain over a number of representative, state-of-the-art network reconstruction methods, and that it can correctly identify interactions that other methods rank low because the involved proteins have few training examples.
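
To make the two expansion schemes concrete, below is a minimal Python sketch of how the auxiliary examples could be generated. It is a hypothetical reconstruction from the description in this abstract, not the authors' implementation: the +1/-1/0 adjacency coding, the use of support vector machines as local models, and the n_aux and margin parameters are all assumptions.

    # Hypothetical sketch of training set expansion. Assumes a precomputed
    # kernel matrix K (n x n) over objects and a partially observed adjacency
    # matrix A with entries +1 (known interaction), -1 (known non-interaction)
    # and 0 (unknown status). Not the authors' code.
    import numpy as np
    from sklearn.svm import SVC

    def local_scores(K, A, v):
        """Train a local SVM for object v on its labelled partners and
        score every unlabelled object as a candidate partner of v."""
        labelled = np.flatnonzero(A[v])
        if len(set(A[v, labelled])) < 2:
            return None  # cannot train a local model from a single class
        clf = SVC(kernel="precomputed")
        clf.fit(K[np.ix_(labelled, labelled)], A[v, labelled])
        unlabelled = np.flatnonzero(A[v] == 0)
        return unlabelled, clf.decision_function(K[np.ix_(unlabelled, labelled)])

    def prediction_propagation(K, A, n_aux=5, margin=1.0):
        """One round of prediction propagation: each trainable local model
        donates its most confident predictions as auxiliary examples, which
        then serve as training labels for information-poor rows."""
        A_exp = A.copy()
        for v in range(A.shape[0]):
            result = local_scores(K, A, v)
            if result is None:
                continue
            unlabelled, scores = result
            order = np.argsort(scores)
            for idx in order[-n_aux:]:            # most confident positives
                if scores[idx] > margin:
                    u = unlabelled[idx]
                    A_exp[v, u] = A_exp[u, v] = 1
            for idx in order[:n_aux]:             # most confident negatives
                if scores[idx] < -margin:
                    u = unlabelled[idx]
                    A_exp[v, u] = A_exp[u, v] = -1
        return A_exp

    def kernel_initialization(K, A, n_aux=3):
        """Kernel initialization: for each object, treat its most similar
        objects under the global kernel as auxiliary interaction partners
        and its most dissimilar objects as auxiliary non-interactions."""
        A_exp = A.copy()
        for v in range(A.shape[0]):
            order = [u for u in np.argsort(K[v]) if u != v]
            for u in order[-n_aux:]:              # most similar objects
                if A_exp[v, u] == 0:
                    A_exp[v, u] = A_exp[u, v] = 1
            for u in order[:n_aux]:               # most dissimilar objects
                if A_exp[v, u] == 0:
                    A_exp[v, u] = A_exp[u, v] = -1
        return A_exp

In a full pipeline, prediction_propagation would presumably be run for several rounds, retraining the local models on the expanded matrix each time, while kernel_initialization plays the analogous role before any local model has been trained.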

Figures

Fig. 1.
The supervised network inference problem. (a) Adjacency matrix of known interactions (black boxes), known non-interactions (white boxes) and node pairs with an unknown interaction status (gray boxes with question marks). (b) Kernel matrix, with a darker color representing a larger inner product. (c) Partially complete adjacency matrix required by the supervised direct approach methods, with complete knowledge of a submatrix. In the basic local modeling approach, the dark gray portion cannot be predicted.
Fig. 2.
Global and local modeling. (a) An interaction network with each green solid edge representing a known interaction, each red dotted edge representing a known non-interaction and the dashed edge representing a pair of objects with an unknown interaction status. (b) A global model based on a pairwise kernel. (c) A local model for object v3.
Fig. 3.
Prediction accuracy at different gold-standard set sizes. (a) Using the int kernel. (b) Using the exp-gasch kernel.
Fig. 4.
Correlation between the number of gold-standard examples and the rank difference between local+PP and the four other methods.
