Knowledge-guided multi-scale independent component analysis for biomarker identification
- PMID: 18837990
- PMCID: PMC2576264
- DOI: 10.1186/1471-2105-9-416
Abstract
Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, when applied to gene expression data alone, these methods often fail to identify biologically meaningful biomarkers related to the specific disease under study. In this paper, we develop a novel strategy, knowledge-guided multi-scale independent component analysis (ICA), that first infers regulatory signals and then identifies biologically relevant biomarkers from microarray data.
Results: Because gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct functions. To identify disease-specific biomarkers that provide unique mechanistic insights, we first construct a metadata "knowledge gene pool" (KGP) from multiple data sources. The KGP records the likely functions (such as Gene Ontology annotations) and regulatory events (such as promoter responsive elements) associated with potential genes of interest, and the gene expression data and biological metadata of its members then guide the subsequent analysis. ICA is applied to multi-scale gene clusters to reveal regulatory modes that reflect the underlying biological mechanisms. Finally, disease-specific biomarkers are extracted according to their weighted connectivity scores with respect to the extracted regulatory modes. A statistical test based on motif information evaluates the significance of transcription factor enrichment in the extracted gene set. We applied the proposed method to yeast cell cycle microarray data and to Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach extracts biologically meaningful regulatory modes and outperforms several baseline methods for biomarker identification.
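The ICA and scoring steps described in the Results paragraph can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the FastICA variant (symmetric decorrelation, tanh contrast), the samples-by-genes orientation in which each regulatory mode is a vector of gene loadings, the helper names `fast_ica`, `connectivity_scores` and `multiscale_scores`, and above all the formula for the weighted connectivity score are hypothetical, because the abstract does not specify them. The gene partitions for each scale are taken as input (for example, from clustering at several resolutions); producing them is out of scope here.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, so the rows of W are orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh contrast (assumed variant).

    X: (n_samples, n_genes); each sample is a mixture, each gene an observation.
    Returns S (n_components, n_genes), the gene loadings of each mode,
    and the mixing matrix A (n_samples, n_components).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)          # center each sample
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    top = np.argsort(d)[::-1][:n_components]        # keep leading eigenpairs
    K = (E[:, top] / np.sqrt(d[top])).T             # whitening matrix
    Z = K @ Xc                                      # whitened: cov(Z) = I
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = G @ Z.T / Z.shape[1] - (1.0 - G**2).mean(axis=1)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # converged when each unmixing vector stops changing direction
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    S = W @ Z
    A = np.linalg.pinv(W @ K)
    return S, A


def connectivity_scores(S, kgp_mask):
    """Hypothetical weighted connectivity score for every gene.

    Each mode is weighted by the mean |z-scored loading| of the KGP genes;
    a gene's score is the weighted sum of its |z-scored loadings|.
    """
    Zs = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    w = np.abs(Zs[:, kgp_mask]).mean(axis=1)
    w = w / w.sum()
    return w @ np.abs(Zs)


def multiscale_scores(X, kgp_mask, partitions, n_components):
    """Run ICA inside every gene cluster at every scale; keep each gene's best score."""
    scores = np.zeros(X.shape[1])
    for clusters in partitions:                     # one partition per scale
        for idx in clusters:
            idx = np.asarray(idx)
            # skip clusters with no KGP genes or too few genes to whiten
            if not kgp_mask[idx].any() or len(idx) <= X.shape[0]:
                continue
            S, _ = fast_ica(X[:, idx], n_components)
            scores[idx] = np.maximum(scores[idx],
                                     connectivity_scores(S, kgp_mask[idx]))
    return scores


# Synthetic example: 3 hidden modes over 300 genes, observed in 6 samples.
rng = np.random.default_rng(1)
sources = rng.laplace(size=(3, 300))
X = rng.standard_normal((6, 3)) @ sources + 0.01 * rng.standard_normal((6, 300))
kgp = np.zeros(300, dtype=bool)
kgp[:20] = True                                     # hypothetical KGP members
S, A = fast_ica(X, 3)
scores = multiscale_scores(X, kgp, [[np.arange(300)],
                                    [np.arange(150), np.arange(150, 300)]], 3)
```

In this sketch a mode counts for more when the KGP genes load strongly on it, so genes that co-load with knowledge-pool genes rank higher; combining scales by taking each gene's maximum is likewise an assumption made for illustration.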
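The abstract does not name the statistical test used for transcription factor enrichment. A common choice for this kind of over-representation question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), and the sketch below assumes it. The helper names `hypergeom_sf` and `tf_enrichment`, and the input format (a set of genes whose promoters carry a given motif), are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    # math.comb returns 0 when the lower argument exceeds the upper one
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), min(K, n) + 1))
    return numerator / comb(N, n)


def tf_enrichment(gene_set, motif_genes, background):
    """Test whether motif-carrying genes are over-represented in gene_set.

    gene_set: extracted biomarker candidates; motif_genes: genes whose
    promoters contain the TF motif; background: all genes considered.
    Returns (k, p): motif-carrying genes in gene_set and the one-sided p-value.
    """
    universe = set(background)
    hits = set(motif_genes) & universe
    selected = set(gene_set) & universe
    k = len(selected & hits)
    return k, hypergeom_sf(k, len(universe), len(hits), len(selected))


background = [f"g{i}" for i in range(10)]
k, p = tf_enrichment(background[:5], background[:5], background)
print(k, p)  # 5 motif genes selected; p = 1/252
```

When many transcription factors are tested against the same gene set, the resulting p-values would normally be corrected for multiple testing; that step is omitted here.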
Conclusion: We have proposed a novel method, knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify the corresponding biomarkers through a multi-scale strategy. The approach was successfully applied to two expression profiling experiments, demonstrating improved performance in extracting biologically meaningful, disease-related biomarkers. More importantly, the proposed approach shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
Similar articles
- Biomarker identification by knowledge-driven multilevel ICA and motif analysis. Int J Data Min Bioinform. 2009;3(4):365-81. doi: 10.1504/ijdmb.2009.029201. PMID: 20052902
- Knowledge-guided gene ranking by coordinative component analysis. BMC Bioinformatics. 2010 Mar 30;11:162. doi: 10.1186/1471-2105-11-162. PMID: 20353603
- Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data. BMC Bioinformatics. 2011;12 Suppl 5(Suppl 5):S7. doi: 10.1186/1471-2105-12-S5-S7. Epub 2011 Jul 27. PMID: 21989140
- A review of independent component analysis application to microarray gene expression data. Biotechniques. 2008 Nov;45(5):501-20. doi: 10.2144/000112950. PMID: 19007336. Review.
- A review of feature extraction software for microarray gene expression data. Biomed Res Int. 2014;2014:213656. doi: 10.1155/2014/213656. Epub 2014 Aug 31. PMID: 25250315. Review.
Cited by
- Glycated lysine-141 in haptoglobin improves the diagnostic accuracy for type 2 diabetes mellitus in combination with glycated hemoglobin HbA1c and fasting plasma glucose. Clin Proteomics. 2017 Mar 28;14:10. doi: 10.1186/s12014-017-9145-1. eCollection 2017. PMID: 28360826
- ADAGE signature analysis: differential expression analysis with data-defined gene sets. BMC Bioinformatics. 2017 Nov 22;18(1):512. doi: 10.1186/s12859-017-1905-4. PMID: 29166858
- A minimal connected network of transcription factors regulated in human tumors and its application to the quest for universal cancer biomarkers. PLoS One. 2012;7(6):e39666. doi: 10.1371/journal.pone.0039666. Epub 2012 Jun 25. PMID: 22761861
- Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks. Cell Syst. 2017 Jul 26;5(1):63-71.e6. doi: 10.1016/j.cels.2017.06.003. Epub 2017 Jul 12. PMID: 28711280
- Independent component analysis: mining microarray data for fundamental human gene expression modules. J Biomed Inform. 2010 Dec;43(6):932-44. doi: 10.1016/j.jbi.2010.07.001. Epub 2010 Jul 7. PMID: 20619355