A Selective Overview of Variable Selection in High Dimensional Feature Space

Jianqing Fan et al. Stat Sin. 2010 Jan;20(1):101-148.

Abstract

High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of recent developments in theory, methods, and implementations for high dimensional variable selection. Questions about the limits of dimensionality such methods can handle, the role of penalty functions, and their statistical properties rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
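To make the penalized likelihood idea concrete for a Gaussian linear model, the following sketch fits an L1-penalized least squares (lasso) estimate by coordinate descent. It is a minimal illustration rather than the paper's implementation; the toy data, the tuning parameter lam, and the helper soft_threshold are assumptions introduced here.

import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator: the closed-form minimizer of 0.5*(z - b)^2 + t*|b| over b.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)      # per-coordinate curvature ||X_j||^2 / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                         # remove j-th contribution
            rho = (X[:, j] @ r) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                         # add back updated contribution
    return b

# Toy high dimensional example: n = 50 observations, p = 200 features,
# only the first 3 coefficients are nonzero (sparse truth).
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)
b_hat = lasso_cd(X, y, lam=0.2)
print("selected variables:", np.flatnonzero(np.abs(b_hat) > 1e-8))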


Figures

Figure 1
Distributions (left panel) of the maximum absolute sample correlation coefficient max_{2≤j≤p} |corr(Z_1, Z_j)|, and distributions (right panel) of the maximum absolute multiple correlation coefficient of Z_1 with 5 other variables, max_{|S|=5} |corr(Z_1, Z_S^T β̂_S)|, where β̂_S is the regression coefficient vector of Z_1 regressed on Z_S, a subset of variables indexed by S and excluding Z_1, computed by the stepwise addition algorithm (the actual values are larger than what are presented here), when n = 50, p = 1000 (solid curve) and p = 10000 (dashed), based on 1000 simulations.
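The spurious-correlation phenomenon summarized in the left panel can be reproduced by a short Monte Carlo sketch, assuming the Z_j are independent standard normal variables; the number of replications is reduced from 1000 for speed, and the stepwise-addition quantity in the right panel is not computed here.

import numpy as np

# Monte Carlo sketch of the left-panel quantity in Figure 1: the maximum
# absolute sample correlation between Z_1 and the remaining p - 1 variables,
# assuming all Z_j are independent standard normal, so any large correlation
# is purely spurious.
rng = np.random.default_rng(1)
n = 50
n_sim = 200          # the caption uses 1000 replications; reduced here for speed

def max_abs_corr(p):
    out = np.empty(n_sim)
    for s in range(n_sim):
        Z = rng.standard_normal((n, p))
        Zc = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize each column
        corr = Zc[:, 1:].T @ Zc[:, 0] / n           # corr(Z_1, Z_j), j >= 2
        out[s] = np.abs(corr).max()
    return out

for p in (1000, 10000):
    vals = max_abs_corr(p)
    print(f"p = {p}: median max |corr(Z_1, Z_j)| = {np.median(vals):.3f}")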
Figure 2
Some commonly used penalty functions (left panel) and their derivatives (right panel). They correspond to the risk functions shown in the right panel of Figure 3. More precisely, λ = 2 for the hard-thresholding penalty, λ = 1.04 for the L1-penalty, λ = 1.02 for SCAD with a = 3.7, and λ = 1.49 for MCP with a = 2.
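For reference, a sketch of standard textbook forms of these four penalties (and the SCAD and MCP derivatives) as functions of t = |θ| ≥ 0 follows; the exact parameterizations are assumptions chosen to match common usage, with a and λ taken from the caption where given.

import numpy as np

def hard_penalty(t, lam):
    # Hard-thresholding penalty: lam^2 - (lam - t)_+^2.
    return lam**2 - np.maximum(lam - t, 0.0)**2

def l1_penalty(t, lam):
    return lam * t

def scad_penalty(t, lam, a=3.7):
    # SCAD: linear up to lam, quadratic up to a*lam, constant afterwards.
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

def mcp_penalty(t, lam, a=2.0):
    # MCP: lam*t - t^2/(2a) up to a*lam, constant afterwards.
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def scad_deriv(t, lam, a=3.7):
    # Derivative shown in the right panel: lam on [0, lam], then linearly decaying to 0 at a*lam.
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1) * (t > lam)

def mcp_deriv(t, lam, a=2.0):
    return np.maximum(lam - t / a, 0.0)

print(scad_penalty(np.array([0.5, 2.0, 5.0]), lam=1.02))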
Figure 3
The risk functions for penalized least squares under the Gaussian model for the hard-thresholding penalty, L1-penalty, SCAD (a = 3.7), and MCP (a = 2). The left panel corresponds to λ = 1 and the right panel to λ = 2 for the hard-thresholding estimator, and the remaining parameters are chosen so that the risks coincide at θ = 3.
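Risk curves of this kind can be approximated by Monte Carlo: draw z ~ N(θ, 1), apply the corresponding thresholding rule, and average the squared error. The sketch below does this only for the hard-thresholding and L1 (soft-thresholding) rules, whose closed forms are standard; using λ = 2 and λ = 1.04 as in the right panel is an assumption carried over from the Figure 2 caption.

import numpy as np

# Monte Carlo sketch of the risk R(theta) = E[(theta_hat(z) - theta)^2] for
# z ~ N(theta, 1), where theta_hat is the closed-form penalized least squares
# solution for the hard-thresholding and L1 penalties.
rng = np.random.default_rng(2)

def hard_threshold(z, lam):
    return z * (np.abs(z) > lam)

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mc_risk(rule, theta, lam, n_mc=100_000):
    z = theta + rng.standard_normal(n_mc)
    return np.mean((rule(z, lam) - theta) ** 2)

for theta in (0.0, 1.0, 3.0):
    print(theta,
          round(mc_risk(hard_threshold, theta, lam=2.0), 3),
          round(mc_risk(soft_threshold, theta, lam=1.04), 3))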
Figure 4
The local linear (dashed) and local quadratic (dotted) approximations to the SCAD function (solid) with λ = 2 and a = 3.7 at a given point |θ| = 4.
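A small sketch of the local linear (LLA) and local quadratic (LQA) approximations at |θ0| = 4, with λ = 2 and a = 3.7, is given below; the SCAD formulas are restated in a common textbook form and should be read as an illustration rather than the paper's exact expressions.

import numpy as np

# LLA and LQA approximations to the SCAD penalty around a current value theta0,
# as depicted in Figure 4 (lam = 2, a = 3.7, |theta0| = 4).
lam, a, theta0 = 2.0, 3.7, 4.0

def scad(t):
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

def scad_deriv(t):
    t = np.abs(t)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1) * (t > lam)

def lla(theta):
    # Local linear: p(|theta|) ≈ p(|theta0|) + p'(|theta0|) (|theta| - |theta0|)
    return scad(theta0) + scad_deriv(theta0) * (np.abs(theta) - theta0)

def lqa(theta):
    # Local quadratic: p(|theta|) ≈ p(|theta0|) + 0.5 * p'(|theta0|)/|theta0| * (theta^2 - theta0^2)
    return scad(theta0) + 0.5 * scad_deriv(theta0) / theta0 * (theta**2 - theta0**2)

theta = np.linspace(-8, 8, 5)
print(np.round(scad(theta), 3), np.round(lla(theta), 3), np.round(lqa(theta), 3))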
Figure 5
Illustration of the ultra-high dimensional variable selection scheme. A large-scale screening step is first used to remove unimportant variables, and then a moderate-scale selection step is applied to further select important variables. At both steps, one can choose a favorite method.
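A hedged sketch of such a two-scale procedure is given below: marginal-correlation screening in the spirit of sure independence screening, followed by a lasso fit (via scikit-learn, one possible second-stage choice) on the surviving variables. The simulated data, the screening size d, and the penalty level alpha are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Lasso  # the second-stage method is a free choice

# Two-scale sketch of the scheme in Figure 5: (1) large-scale screening by
# absolute marginal correlation, keeping the top d variables; (2) a
# moderate-scale penalized regression (here, lasso) on the survivors.
rng = np.random.default_rng(3)
n, p = 100, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [4.0, -3.0, 3.0]               # sparse truth
y = X @ beta + rng.standard_normal(n)

# Step 1: screening by absolute marginal correlation with the response.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
score = np.abs(Xc.T @ yc) / n
d = int(n / np.log(n))                    # a common choice of screening size
keep = np.argsort(score)[-d:]

# Step 2: variable selection on the screened set.
fit = Lasso(alpha=0.1).fit(X[:, keep], y)
selected = keep[np.flatnonzero(fit.coef_)]
print("screened down to", d, "variables; selected:", np.sort(selected))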
