Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR
- PMID: 17608783
- PMCID: PMC2605279
- DOI: 10.1111/j.1541-0420.2007.00843.x
Abstract
Variable selection can be challenging, particularly when there are many predictors with possibly high correlations, as in gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables and group them into predictive clusters. In addition to improving prediction accuracy and interpretation, the resulting groups can be investigated further to discover what causes their members to behave similarly. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
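The abstract describes the penalty only in words. As a minimal sketch, assuming the form usually given for OSCAR (an L1 term plus a weighted sum of pairwise maxima of absolute coefficients, which this abstract itself does not spell out), the penalty can be evaluated as below. The function name `oscar_penalty` and the argument name `c` are illustrative choices, not taken from the source.

```python
from itertools import combinations


def oscar_penalty(beta, c):
    """Evaluate sum_j |b_j| + c * sum_{j<k} max(|b_j|, |b_k|).

    Assumed form of the OSCAR penalty; `c` >= 0 weights the pairwise term.
    """
    abs_beta = [abs(b) for b in beta]
    # L1 part: shrinks coefficients toward zero, as in the lasso.
    l1_term = sum(abs_beta)
    # Pairwise L-infinity part: its kinks at |b_j| == |b_k| are what
    # encourage coefficients to become exactly equal in magnitude.
    pairwise_term = sum(max(a, b) for a, b in combinations(abs_beta, 2))
    return l1_term + c * pairwise_term


if __name__ == "__main__":
    # Two coefficients tied in magnitude and one set to zero.
    print(oscar_penalty([1.0, -1.0, 0.0], c=0.5))
```

With `c = 0` the pairwise term vanishes and the expression reduces to the ordinary L1 penalty. This sketch only evaluates the penalty; it does not fit the penalized least squares problem.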