Cancer classification and prediction using logistic regression with Bayesian gene selection
- PMID: 15465478
- DOI: 10.1016/j.jbi.2004.07.009
Abstract
In microarray-based cancer classification and prediction, gene selection is an important research problem owing to the large number of genes and the small number of experimental conditions. In this paper, we propose a Bayesian approach to gene selection and classification using the logistic regression model. The basic idea of our approach is to use a logistic regression model to relate gene expression to the class labels. We use Gibbs sampling and Markov chain Monte Carlo (MCMC) methods to discover important genes. To implement the Gibbs sampler and MCMC search, we derive a posterior distribution of the selected genes given the observed data. After the important genes are identified, the same logistic regression model is used for cancer classification and prediction. Issues concerning efficient implementation of the proposed method are discussed. The proposed method is evaluated on several large microarray data sets, including hereditary breast cancer, small round blue-cell tumors, and acute leukemia. The results show that the method can effectively identify important genes consistent with known biological findings while also achieving high classification accuracy. Finally, the robustness and sensitivity properties of the proposed method are investigated.
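The abstract does not give the posterior the authors derive, so the following is only a minimal sketch of the general idea, not their method. It assumes a binary inclusion indicator per gene, a Bernoulli prior on inclusion (`prior_incl`), and a BIC-style approximation swapped in for the marginal posterior of a gene subset. The function names (`fit_logistic`, `log_score`, `gibbs_gene_selection`, `predict_proba`) and the ridge penalty are hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, y, ridge=1.0, iters=25):
    """Ridge-penalised logistic fit by Newton-Raphson; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    eta = X @ beta
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def design(X, gamma):
    """Intercept column plus the genes switched on in the boolean mask gamma."""
    return np.column_stack([np.ones(X.shape[0]), X[:, gamma]])

def log_score(X, y, gamma, prior_incl):
    """Assumed approximation: log-likelihood + BIC penalty + log Bernoulli prior."""
    _, loglik = fit_logistic(design(X, gamma), y)
    k = int(gamma.sum())
    log_prior_odds = np.log(prior_incl / (1.0 - prior_incl))
    return loglik - 0.5 * k * np.log(len(y)) + k * log_prior_odds

def gibbs_gene_selection(X, y, n_sweeps=30, burn_in=10, prior_incl=0.1, seed=0):
    """Gibbs sampler over inclusion indicators; returns inclusion frequencies."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)
    counts = np.zeros(p)
    for sweep in range(n_sweeps):
        for j in rng.permutation(p):
            # Score the subset with gene j switched on, then switched off.
            gamma[j] = True
            s_on = log_score(X, y, gamma, prior_incl)
            gamma[j] = False
            s_off = log_score(X, y, gamma, prior_incl)
            # Conditional inclusion probability, computed stably.
            p_on = np.exp(-np.logaddexp(0.0, s_off - s_on))
            gamma[j] = rng.random() < p_on
        if sweep >= burn_in:
            counts += gamma
    return counts / (n_sweeps - burn_in)

def predict_proba(X_train, y_train, X_new, gamma):
    """Refit the logistic model on the selected genes and score new samples."""
    beta, _ = fit_logistic(design(X_train, gamma), y_train)
    return 1.0 / (1.0 + np.exp(-(design(X_new, gamma) @ beta)))

# Synthetic illustration (not data from the paper): genes 0 and 3 drive the label.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(80, 12))
y = (data_rng.random(80) < 1.0 / (1.0 + np.exp(-(3 * X[:, 0] - 3 * X[:, 3])))).astype(float)
freq = gibbs_gene_selection(X, y)
selected = freq > 0.5
probs = predict_proba(X, y, X, selected)
```

Genes with a high inclusion frequency in `freq` are treated as selected, and the same logistic model is then refit on that subset for prediction, which mirrors the two-stage structure the abstract describes. The cost grows with the number of genes per sweep, so a real microarray data set would need the kind of efficiency measures the abstract alludes to.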