BMC Bioinformatics. 2007 Feb 22;8:60.
doi: 10.1186/1471-2105-8-60.

Supervised group Lasso with applications to microarray data analysis

Shuangge Ma et al. BMC Bioinformatics.

Abstract

Background: A tremendous amount of effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have a cluster structure, where the clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.

Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
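The two steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under several assumptions that are not taken from the paper: it uses a squared-error loss on centered data (the paper works with binary and survival outcomes), it takes the cluster labels as given rather than deriving them with K-means and the Gap method, and it fixes the tuning parameters `lam1` and `lam2` instead of choosing them by V-fold cross validation. The function names (`lasso_cd`, `group_lasso_pg`, `supervised_group_lasso`) and the choice of solvers (coordinate descent and proximal gradient) are hypothetical and chosen for brevity.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the Lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept; data assumed centered).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # partial residual without gene j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def group_lasso_pg(X, y, groups, lam, n_iter=500):
    """Group Lasso via proximal gradient descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.
    """
    n, p = X.shape
    beta = np.zeros(p)
    if p == 0:
        return beta
    step = n / (np.linalg.norm(X, 2) ** 2)      # 1 / Lipschitz constant of the gradient
    labels = np.unique(groups)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in labels:
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Block soft-thresholding: the whole cluster is kept or dropped together.
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        beta = z
    return beta

def supervised_group_lasso(X, y, clusters, lam1, lam2):
    """Two-step sketch: Lasso within each cluster, then group Lasso across clusters."""
    p = X.shape[1]
    keep = np.zeros(p, dtype=bool)
    # Step 1: select important genes within each cluster.
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        b = lasso_cd(X[:, idx], y, lam1)
        keep[idx[b != 0]] = True
    # Step 2: select important clusters using only the genes kept in step 1.
    beta = np.zeros(p)
    beta[keep] = group_lasso_pg(X[:, keep], y, clusters[keep], lam2)
    return beta

# Usage on synthetic data (hypothetical): 9 genes in 3 clusters, signal in cluster 0 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 9))
clusters = np.repeat(np.arange(3), 3)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(200)
beta = supervised_group_lasso(X, y, clusters, lam1=0.1, lam2=0.2)
print(np.round(beta, 3))
```

The design point the sketch tries to convey is that step 1 filters genes cluster by cluster, so step 2 only has to decide which clusters to keep; in practice both penalties would be tuned by cross validation, as the abstract describes.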

Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

Figures

Figure 1. Gap statistics as a function of number of clusters. Red solid line: Colon data; Green dashed line: Nodal data.
Figure 2. Paths of parameter estimates for Lasso, GLasso and SGLasso. Red lines, cluster 1; Blue lines, cluster 2; Green lines, cluster 3. Solid lines, β1, β4 and β7; Dashed lines, β2, β5 and β8; Dashed-dotted lines, β3, β6 and β9. The grey lines show the selected tuning parameters. C1, C2 and C3 in the lower-left panel denote clusters 1, 2 and 3, respectively.
