COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Patrick Breheny¹, Jian Huang

Affiliations

Affiliation

¹ Department of Biostatistics, Department of Statistics, University of Kentucky, 121 Washington Ave., Room 203C, Lexington, Kentucky 40536-0003, USA. Department of Statistics, and Actuarial Sciences, Department of Biostatistics, University of Iowa, 241 Schaeffer Hall, Iowa City, Iowa 52242, USA.

PMID: 22081779
PMCID: PMC3212875
DOI: 10.1214/10-AOAS388

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Patrick Breheny et al. Ann Appl Stat. 2011.

. 2011 Jan 1;5(1):232-253.

doi: 10.1214/10-AOAS388.

Authors

Patrick Breheny¹, Jian Huang

Affiliation

¹ Department of Biostatistics, Department of Statistics, University of Kentucky, 121 Washington Ave., Room 203C, Lexington, Kentucky 40536-0003, USA. Department of Statistics, and Actuarial Sciences, Department of Biostatistics, University of Iowa, 241 Schaeffer Hall, Iowa City, Iowa 52242, USA.

PMID: 22081779
PMCID: PMC3212875
DOI: 10.1214/10-AOAS388

Abstract

A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.

PubMed Disclaimer

Figures

**Fig. 1**
Shapes of the lasso, SCAD and MCP penalty functions. The panel on the left plots the penalties themselves, whereas the panel on the right plots the derivative of the penalty. Note that none of the penalties are differentiable at β_j = 0.

**Fig. 2**
Example MCP coefficient paths for simulated data where p > n. The shaded region is the region in which the objective function is not locally convex. Note that the solutions are continuous and stable in the locally convex regions, but discontinuous and erratic otherwise.

**Fig. 3**
Time required (in seconds) to fit the entire coefficient paths for linear and logistic regression, employing either the local linear approximation (LLA) algorithm or the coordinate descent (CD) algorithm. Both axes are on the log scale. Times displayed are averaged over 100 independently generated data sets. The coordinate descent algorithm is at least 100 times faster than the LLA algorithm at all points.

**Fig. 4**
Relative (to the lasso) mean squared error for MCP- and SCAD-penalized linear and logistic regression. MSE was calculated for each penalty on 100 independently generated data sets, and the ratio of the medians is plotted. MCP and SCAD greatly outperform lasso for large coefficients, but not necessarily for small coefficients.

**Fig. 5**
MCP coefficient paths for various values of γ for the two studies of Section 6. The shaded region depicts areas that are not locally convex, and a vertical line is drawn at the value of λ selected by BIC. For the sparse genetic association data, small values of γ produce the best fit; for the dense microarray data, large values are preferred.

See this image and copyright information in PMC

References

1. Breiman L. Heuristics of instability and stabilization in model selection. Ann Statist. 1996;24:2350–2383.
1. Bruce AG, Gao HY. Understanding WaveShrink: Variance and bias estimation. Biometrika. 1996;83:727–745.
1. Donoho DL, Johnstone JM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
1. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Statist. 2004;32:407–451.
1. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc. 2001;96:1348–1361.

Grants and funding

R01 CA120988/CA/NCI NIH HHS/United States

LinkOut - more resources

Full Text Sources
- Europe PubMed Central
- PubMed Central
Other Literature Sources
- The Lens - Patent Citations Database
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Affiliation

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Authors

Affiliation

Abstract

Figures

References

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Miscellaneous