Supervised group Lasso with applications to microarray data analysis
- PMID: 17316436
- PMCID: PMC1821041
- DOI: 10.1186/1471-2105-8-60
Abstract
Background: A tremendous amount of effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have a cluster structure, in which the clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.
Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
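The two-step procedure described above (Lasso within each cluster, then group Lasso across clusters) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses a squared-error (linear regression) loss rather than the binary and survival likelihoods analysed in the paper, takes the cluster labels as given (the K-means and Gap steps are omitted), fixes the tuning parameters `lam1` and `lam2` instead of choosing them by V-fold cross-validation, and assumes centered gene columns and a centered outcome so that no intercept is fit. The function names (`lasso_cd`, `group_lasso_pg`, `supervised_group_lasso`) are hypothetical, and the `sqrt(p_g)` group weight is a common group Lasso convention assumed here.

```python
import numpy as np


def soft_threshold(z, lam):
    """Scalar soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (squared-error loss assumed).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]  # partial residual excluding gene j
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def group_lasso_pg(X, y, groups, lam, n_iter=500):
    """Group Lasso by proximal gradient descent (squared-error loss assumed).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2,
    where `groups` is a list of column-index arrays, one per cluster.
    """
    n, p = X.shape
    b = np.zeros(p)
    step = n / np.linalg.eigvalsh(X.T @ X).max()  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)  # gradient step on the loss
        for g in groups:  # group soft-thresholding (prox of the penalty)
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thresh else z[g] * (1 - thresh / norm)
        b = z
    return b


def supervised_group_lasso(X, y, clusters, lam1, lam2):
    """Hypothetical two-step sketch: Lasso per cluster, then group Lasso.

    `clusters` gives a cluster label for each gene (column). Returns
    {cluster label: {gene index: coefficient}} for clusters kept in step 2.
    """
    clusters = np.asarray(clusters)
    kept_genes, kept_labels, groups = [], [], []
    # Step 1: Lasso within each cluster keeps genes with nonzero coefficients.
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        keep = idx[lasso_cd(X[:, idx], y, lam1) != 0]
        if keep.size:
            start = len(kept_genes)
            groups.append(np.arange(start, start + keep.size))
            kept_genes.extend(keep.tolist())
            kept_labels.append(c.item())
    if not kept_genes:
        return {}
    # Step 2: group Lasso on the reduced clusters selects whole clusters.
    beta = group_lasso_pg(X[:, kept_genes], y, groups, lam2)
    return {
        c: {kept_genes[i]: float(beta[i]) for i in g}
        for c, g in zip(kept_labels, groups)
        if np.any(beta[g] != 0)
    }


if __name__ == "__main__":
    # Hypothetical toy data: 6 genes in 2 clusters; only genes 0 and 1 matter.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 6))
    X -= X.mean(axis=0)
    y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.standard_normal(500)
    y -= y.mean()
    print(supervised_group_lasso(X, y, [0, 0, 0, 1, 1, 1], lam1=1.0, lam2=0.5))
```

Proximal gradient is used for step 2 only because it keeps the group update to a few lines; other solvers (for example block coordinate descent) fit the same objective. Handling the binary and survival outcomes in the paper would mean replacing the squared-error gradient and coordinate updates with those of a logistic or Cox partial-likelihood loss, and a cross-validation loop over `lam1` and `lam2` would wrap around `supervised_group_lasso`.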
Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.
Similar articles
- Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data. BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67. PMID: 17328811.
- Clustering threshold gradient descent regularization: with applications to microarray studies. Bioinformatics. 2007 Feb 15;23(4):466-72. doi: 10.1093/bioinformatics/btl632. Epub 2006 Dec 20. PMID: 17182700.
- An efficient semi-unsupervised gene selection method via spectral biclustering. IEEE Trans Nanobioscience. 2006 Jun;5(2):110-4. doi: 10.1109/tnb.2006.875040. PMID: 16805107.
- Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat. 2010 Apr;9(2):149-59. doi: 10.1177/153303461000900204. PMID: 20218737. Review.
- A review of independent component analysis application to microarray gene expression data. Biotechniques. 2008 Nov;45(5):501-20. doi: 10.2144/000112950. PMID: 19007336. Review.
Cited by
- Similarity of markers identified from cancer gene expression studies: observations from GEO. Brief Bioinform. 2014 Sep;15(5):671-84. doi: 10.1093/bib/bbt044. Epub 2013 Jun 19. PMID: 23788798.
- A Comparison of Rule-based Analysis with Regression Methods in Understanding the Risk Factors for Study Withdrawal in a Pediatric Study. Sci Rep. 2016 Aug 26;6:30828. doi: 10.1038/srep30828. PMID: 27561809.
- Survival associated pathway identification with group Lp penalized global AUC maximization. Algorithms Mol Biol. 2010 Aug 16;5:30. doi: 10.1186/1748-7188-5-30. PMID: 20712896.
- Explainable AI: A review of applications to neuroimaging data. Front Neurosci. 2022 Dec 1;16:906290. doi: 10.3389/fnins.2022.906290. eCollection 2022. PMID: 36583102.
- Data integration by multi-tuning parameter elastic net regression. BMC Bioinformatics. 2018 Oct 10;19(1):369. doi: 10.1186/s12859-018-2401-1. PMID: 30305021.