A GMM-IG framework for selecting genes as expression panel biomarkers
- PMID: 20004087
- DOI: 10.1016/j.artmed.2009.07.006
Abstract
Objective: The small sample sizes typical of functional genomics experiments make it necessary to integrate DNA microarray data from different sources. However, experimental noise and the biases of different microarray platforms make integrated data analysis challenging. In this work, we propose an integrative computational framework to identify candidate biomarker genes from publicly available functional genomics studies.
Methods: We developed a new framework, Gaussian Mixture Modeling-Coupled Information Gain (GMM-IG). In this framework, we first apply a two-component Gaussian mixture model (GMM) to estimate the conditional probability distributions of gene expression data between two different types of samples, for example, normal versus cancer. An expectation-maximization algorithm is then used to estimate the maximum likelihood parameters of a mixture of two Gaussian models in the feature space and determine the underlying expression levels of genes. Gene expression results from different studies are discretized based on the GMM estimates and then unified. Significantly differentially expressed genes are filtered and assessed with information gain (IG) measures.
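The steps described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one two-component GMM is fitted per gene and per study, that each sample is assigned to the more probable component (0 = low, 1 = high expression), and that the discretized states are then pooled across studies before information gain is computed against the class labels. All function names, the initialisation, the variance floor, and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(values)
    grand_mean = sum(values) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in values) / n, 1e-6)
    weights = [0.5, 0.5]
    means = [min(values), max(values)]  # crude initialisation (an assumption)
    variances = [grand_var, grand_var]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def discretize(values, weights, means, variances):
    """Map each value to 1 (higher-mean component) or 0 (lower-mean component)."""
    high = 0 if means[0] > means[1] else 1
    low = 1 - high
    states = []
    for x in values:
        p_high = weights[high] * normal_pdf(x, means[high], variances[high])
        p_low = weights[low] * normal_pdf(x, means[low], variances[low])
        states.append(1 if p_high > p_low else 0)
    return states


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(states, labels):
    """IG = H(labels) - H(labels | states) for one discretized gene."""
    n = len(labels)
    conditional = 0.0
    for s in set(states):
        subset = [y for st, y in zip(states, labels) if st == s]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def pooled_information_gain(studies, labels_per_study):
    """Discretize one gene within each study, pool the states, then compute IG."""
    pooled_states, pooled_labels = [], []
    for values in studies:
        params = fit_two_component_gmm(values)
        pooled_states.extend(discretize(values, *params))
        pooled_labels.extend(labels_per_study)
    return information_gain(pooled_states, pooled_labels)


# Simulated data: two studies on different intensity scales (platform bias).
random.seed(0)
labels_per_study = [0] * 20 + [1] * 20  # 0 = normal, 1 = cancer


def simulate(mean_normal, mean_cancer, sd):
    return ([random.gauss(mean_normal, sd) for _ in range(20)]
            + [random.gauss(mean_cancer, sd) for _ in range(20)])


informative_gene = [simulate(2.0, 5.0, 0.3), simulate(200.0, 500.0, 20.0)]
noise_gene = [simulate(3.0, 3.0, 0.5), simulate(300.0, 300.0, 50.0)]

ig_informative = pooled_information_gain(informative_gene, labels_per_study)
ig_noise = pooled_information_gain(noise_gene, labels_per_study)
print("IG informative gene:", ig_informative)
print("IG noise gene:", ig_noise)
```

Because each study is discretized against its own fitted mixture, the pooled states are on a common low/high scale even though the raw intensities differ by two orders of magnitude in the simulated example. That is the property the sketch is meant to show; how the published method unifies studies in detail is not specified in the abstract.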
Results: DNA microarray experimental data for lung cancers from three different prior studies were processed using the new GMM-IG method. Target gene markers from a gene expression panel were selected and compared with those from several conventional computational biomarker data analysis methods. GMM-IG showed consistently high accuracy for several classification assessments. A high reproducibility of gene selection results was also determined from statistical validations. Our study shows that the GMM-IG framework can overcome the poor reliability of single-study DNA microarray experiments while maintaining high accuracy by combining true signals from multiple studies.
Conclusions: We present a conceptually simple framework that enables reliable integration of true differential gene expression signals from multiple microarray experiments. This novel computational method has been shown to generate interesting biomarker panels for lung cancer studies. It is promising as a general strategy for future panel biomarker development, especially for applications that require integrating experimental results generated from different research centers or with different technology platforms.
2009 Elsevier B.V. All rights reserved.
Similar articles
- Mixture classification model based on clinical markers for breast cancer prognosis. Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14. PMID: 20005686
- Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps. Artif Intell Med. 2010 Feb-Mar;48(2-3):91-8. doi: 10.1016/j.artmed.2009.06.001. Epub 2009 Dec 4. PMID: 19962867
- Ensemble gene selection by grouping for microarray data classification. J Biomed Inform. 2010 Feb;43(1):81-7. doi: 10.1016/j.jbi.2009.08.010. Epub 2009 Aug 20. PMID: 19699316
- Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007. PMID: 15219288 Review.
- Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics. Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5. PMID: 17027692 Review.
Cited by
- On integrating multi-experiment microarray data. Philos Trans A Math Phys Eng Sci. 2014 Apr 21;372(2016):20130136. doi: 10.1098/rsta.2013.0136. Print 2014 May 28. PMID: 24751870 Free PMC article.
- The g3mclass is a practical software for multiclass classification on biomarkers. Sci Rep. 2022 Nov 5;12(1):18742. doi: 10.1038/s41598-022-23438-9. PMID: 36335194 Free PMC article.
- Artificial Intelligence in Point-of-Care Biosensing: Challenges and Opportunities. Diagnostics (Basel). 2024 May 25;14(11):1100. doi: 10.3390/diagnostics14111100. PMID: 38893627 Free PMC article. Review.