Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Jan 1;5(1):232-253.
doi: 10.1214/10-AOAS388.

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Affiliations

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION

Patrick Breheny et al. Ann Appl Stat. .

Abstract

A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
Shapes of the lasso, SCAD and MCP penalty functions. The panel on the left plots the penalties themselves, whereas the panel on the right plots the derivative of the penalty. Note that none of the penalties are differentiable at βj = 0.
Fig. 2
Fig. 2
Example MCP coefficient paths for simulated data where p > n. The shaded region is the region in which the objective function is not locally convex. Note that the solutions are continuous and stable in the locally convex regions, but discontinuous and erratic otherwise.
Fig. 3
Fig. 3
Time required (in seconds) to fit the entire coefficient paths for linear and logistic regression, employing either the local linear approximation (LLA) algorithm or the coordinate descent (CD) algorithm. Both axes are on the log scale. Times displayed are averaged over 100 independently generated data sets. The coordinate descent algorithm is at least 100 times faster than the LLA algorithm at all points.
Fig. 4
Fig. 4
Relative (to the lasso) mean squared error for MCP- and SCAD-penalized linear and logistic regression. MSE was calculated for each penalty on 100 independently generated data sets, and the ratio of the medians is plotted. MCP and SCAD greatly outperform lasso for large coefficients, but not necessarily for small coefficients.
Fig. 5
Fig. 5
MCP coefficient paths for various values of γ for the two studies of Section 6. The shaded region depicts areas that are not locally convex, and a vertical line is drawn at the value of λ selected by BIC. For the sparse genetic association data, small values of γ produce the best fit; for the dense microarray data, large values are preferred.

References

    1. Breiman L. Heuristics of instability and stabilization in model selection. Ann Statist. 1996;24:2350–2383.
    1. Bruce AG, Gao HY. Understanding WaveShrink: Variance and bias estimation. Biometrika. 1996;83:727–745.
    1. Donoho DL, Johnstone JM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
    1. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Statist. 2004;32:407–451.
    1. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc. 2001;96:1348–1361.

LinkOut - more resources