J Comput Graph Stat. 2013 Apr 1;22(2):319-340.
doi: 10.1080/15533174.2012.707849.

Consistent Group Identification and Variable Selection in Regression with Correlated Predictors

Dhruv B Sharma et al. J Comput Graph Stat. 2013.

Abstract

Statistical procedures for variable selection have become integral elements in any analysis. Successful procedures are characterized by high predictive accuracy and yield interpretable models while retaining computational efficiency. Penalized methods that perform coefficient shrinkage have been shown to be successful in many cases. Models with correlated predictors are particularly challenging to tackle. We propose a penalization procedure that performs variable selection while clustering groups of predictors automatically. The oracle properties of this procedure, including consistency in group identification, are also studied. The proposed method compares favorably with existing selection approaches in both prediction accuracy and model discovery, while retaining its computational efficiency. Supplemental materials are available online.

Keywords: Coefficient shrinkage; Correlation; Group identification; Oracle properties; Penalization; Supervised clustering; Variable selection.

Figures

Figure 1
Graphical representation of the flexibility of the PACS approach over the OSCAR approach in the (β1, β2) plane. All panels use a correlation of ρ = 0.85. The top panels have OLS solution β̂_OLS = (1, 2), while the bottom panels have β̂_OLS = (0.1, 2). The solution is the first point at which the contours of the loss function hit the constraint region. (a) When the OLS solutions of (β1, β2) are close to each other, OSCAR sets β̂1 = β̂2. (b) When the OLS solutions of (β1, β2) are close to each other, PACS sets β̂1 = β̂2. (c) When the OLS solutions of (β1, β2) are not close to each other and the OLS solution of β1 is close to 0, OSCAR sets β̂1 = β̂2. (d) When the OLS solutions of (β1, β2) are not close to each other and the OLS solution of β1 is close to 0, PACS sets β̂1 = 0.
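
The contrast in the caption can be made concrete with a small sketch. This page does not state either penalty, so the forms below are assumptions based on how the two penalties are commonly written: OSCAR as λ[Σ|βj| + c Σ max(|βj|, |βk|)] over pairs j < k, and PACS as λ[Σ wj|βj| + Σ wjk(−)|βk − βj| + Σ wjk(+)|βj + βk|]. The function names, argument names, and weight values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def oscar_penalty(beta, lam=1.0, c=1.0):
    """Assumed OSCAR form: lam * (sum |b_j| + c * sum_{j<k} max(|b_j|, |b_k|))."""
    l1 = sum(abs(b) for b in beta)
    pair_max = sum(max(abs(bj), abs(bk)) for bj, bk in combinations(beta, 2))
    return lam * (l1 + c * pair_max)


def pacs_penalty(beta, lam=1.0, w=None, w_minus=None, w_plus=None):
    """Assumed PACS form: weighted |b_j|, |b_k - b_j| and |b_j + b_k| terms.

    All weights default to 1; data-driven weights would be supplied by the caller.
    """
    p = len(beta)
    pairs = list(combinations(range(p), 2))
    w = [1.0] * p if w is None else w
    w_minus = {pair: 1.0 for pair in pairs} if w_minus is None else w_minus
    w_plus = {pair: 1.0 for pair in pairs} if w_plus is None else w_plus

    sparsity = sum(w[j] * abs(beta[j]) for j in range(p))
    differences = sum(w_minus[(j, k)] * abs(beta[k] - beta[j]) for j, k in pairs)
    sums = sum(w_plus[(j, k)] * abs(beta[j] + beta[k]) for j, k in pairs)
    return lam * (sparsity + differences + sums)


if __name__ == "__main__":
    for beta in [(1.0, 2.0), (0.1, 2.0)]:  # the two OLS solutions in Figure 1
        print(beta, oscar_penalty(beta, c=2.0), pacs_penalty(beta))
    # Hypothetical weights that penalize beta_1 more heavily than beta_2.
    print(pacs_penalty((0.1, 2.0), w=[10.0, 1.0]))
```

Under these assumed forms, unit weights make the PACS penalty coincide with the OSCAR penalty at c = 2, because |a − b| + |a + b| = 2 max(|a|, |b|). The extra flexibility shown in panels (b) and (d) would then come from choosing the weights non-uniformly, for example placing a larger weight on |β1| when its OLS estimate is near 0.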
