Selective integration of multiple biological data for supervised network inference
- PMID: 15728114
- DOI: 10.1093/bioinformatics/bti339
Abstract
Motivation: Inferring networks of proteins from biological data is a central issue in computational biology. Most network inference methods, including Bayesian networks, take an unsupervised approach in which the network is entirely unknown at the outset and every edge has to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects the informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the datasets excluded by our algorithm need not be collected.
Results: First, we formulate supervised network inference as a kernel matrix completion problem, in which inferring edges reduces to estimating the missing entries of a kernel matrix. Then, we propose an expectation-maximization algorithm that simultaneously infers the missing entries of the kernel matrix and the weights of the multiple datasets. The weights let us integrate multiple datasets selectively and thereby exclude irrelevant and noisy ones. Our approach performs favorably in tests on two biological networks: a metabolic network and a protein interaction network.
Availability: Software is available on request.
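The idea in the Results paragraph, filling in missing kernel entries while weighting each dataset, can be illustrated with a small sketch. This is not the expectation-maximization algorithm of the paper, whose details the abstract does not give. As a stand-in, it fits non-negative weights by least squares on the known block (projected gradient descent) and then fills the unknown entries with the weighted sum of the dataset kernels. All function names (`combine`, `fit_weights`, `complete`) and the toy matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return the weighted sum of equally sized kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def fit_weights(kernels: Sequence[Matrix], known_block: Matrix,
                steps: int = 2000) -> List[float]:
    """Fit non-negative dataset weights on the known block.

    Assumption: plain non-negative least squares solved by projected
    gradient descent, used here instead of the paper's EM procedure.
    """
    n_known = len(known_block)
    m = len(kernels)
    pairs = [(i, j) for i in range(n_known) for j in range(n_known)]
    # Normal equations restricted to the known entries: G w = c.
    G = [[sum(kernels[p][i][j] * kernels[q][i][j] for i, j in pairs)
          for q in range(m)] for p in range(m)]
    c = [sum(kernels[p][i][j] * known_block[i][j] for i, j in pairs)
         for p in range(m)]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace is a safe step.
    trace = sum(G[p][p] for p in range(m))
    step = 1.0 / trace if trace > 0 else 0.0
    w = [0.0] * m
    for _ in range(steps):
        grad = [sum(G[p][q] * w[q] for q in range(m)) - c[p] for p in range(m)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[p] - step * grad[p]) for p in range(m)]
    return w


def complete(kernels: Sequence[Matrix], known_block: Matrix,
             weights: Sequence[float]) -> Matrix:
    """Keep the known block and fill all other entries from the datasets."""
    n_known = len(known_block)
    est = combine(kernels, weights)
    n = len(est)
    return [
        [known_block[i][j] if i < n_known and j < n_known else est[i][j]
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data: 4 proteins, the network kernel is known for the first 3.
    informative = [[2.0, 1.0, 0.0, 1.0], [1.0, 2.0, 1.0, 0.0],
                   [0.0, 1.0, 2.0, 1.0], [1.0, 0.0, 1.0, 2.0]]
    irrelevant = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    known = [[2.0 * informative[i][j] for j in range(3)] for i in range(3)]
    weights = fit_weights([informative, irrelevant], known)
    print("weights:", [round(w, 3) for w in weights])
    print("completed row 0:", complete([informative, irrelevant], known, weights)[0])
```

In this toy setup the known block is an exact multiple of the first kernel, so the fit should place essentially all weight on that dataset and close to zero on the second. A dataset whose weight is near zero plays the role of the "excluded" dataset described in the Motivation paragraph.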