Biom J. 2024 Jan;66(1):e2200209.
doi: 10.1002/bimj.202200209. Epub 2023 Aug 29.

Variable selection in linear regression models: Choosing the best subset is not always the best choice

Moritz Hanke et al. Biom J. 2024 Jan.

Abstract

We consider the question of variable selection in linear regressions, in the sense of identifying the correct direct predictors (those variables that have nonzero coefficients given all candidate predictors). Best subset selection (BSS) is often considered the "gold standard," with its use being restricted only by its NP-hard nature. Alternatives such as the least absolute shrinkage and selection operator (Lasso) or the Elastic net (Enet) have become methods of choice in high-dimensional settings. A recent proposal represents BSS as a mixed-integer optimization problem so that large problems have become computationally feasible. We present an extensive neutral comparison assessing the ability of BSS to select the correct direct predictors, compared to forward stepwise selection (FSS), Lasso, and Enet. The simulation considers a range of settings that are challenging regarding dimensionality (number of observations and variables), signal-to-noise ratios, and correlations between predictors. As a fair measure of performance, we primarily used the best possible F1-score for each method, and results were confirmed by alternative performance measures and practical criteria for choosing the tuning parameters and subset sizes. Surprisingly, it was only in settings where the signal-to-noise ratio was high and the variables were uncorrelated that BSS reliably outperformed the other methods, even in low-dimensional settings. Furthermore, FSS performed almost identically to BSS. Our results shed new light on the usual presumption of BSS being, in principle, the best choice for selecting the correct direct predictors. Especially for correlated variables, alternatives like Enet are faster and appear to perform better in practical settings.
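The performance measure described in the abstract can be illustrated with a small sketch. It assumes the standard F1 definition (harmonic mean of precision and recall) applied to the set of selected variables versus the set of true direct predictors, and it reads "best possible F1-score" as the maximum F1 over all candidate subsets a method produces along its tuning-parameter or subset-size path. That reading, the function names `f1_score` and `best_possible_f1`, and the toy data are illustrative assumptions, not taken from the paper.

```python
def f1_score(selected, truth):
    """F1 of a selected variable set against the true direct predictors."""
    selected, truth = set(selected), set(truth)
    true_positives = len(selected & truth)
    if true_positives == 0:
        # Covers the empty selection and selections with no correct variable.
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_possible_f1(candidate_subsets, truth):
    """Maximum F1 over all subsets on a method's path (assumed reading)."""
    return max(f1_score(subset, truth) for subset in candidate_subsets)


# Hypothetical example: variables 0, 1, 2 are the true direct predictors,
# and a method returns one subset per subset size or tuning value.
truth = {0, 1, 2}
path = [{0}, {0, 3}, {0, 1, 3}, {0, 1, 2, 3}, {0, 1, 2, 3, 4}]

scores = [f1_score(subset, truth) for subset in path]
best = best_possible_f1(path, truth)
print(scores, best)
```

Scoring each method by its best subset on the path separates the question of whether a method can recover the correct predictors from the question of how well its tuning parameter is chosen in practice.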

Keywords: Lasso; best subset selection; linear regression; mixed-integer optimization; variable selection.

Conflict of interest statement

The authors declare no conflicts of interest.
