Review
Biom J. 2018 May;60(3):431-449.
doi: 10.1002/bimj.201700067. Epub 2018 Jan 2.

Variable selection - A review and recommendations for the practicing statistician

Georg Heinze et al. Biom J. 2018 May.

Abstract

Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well-established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with the number of candidate variables in the range 10-30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise stability of a final model, unbiasedness of regression coefficients, and validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on application of variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.

Keywords: change-in-estimate criterion; penalized likelihood; resampling; statistical model; stepwise selection.
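The abstract proposes reporting quantities based on resampling the entire variable selection process, without naming them here. As a rough sketch of what such a quantity might look like, the Python code below estimates bootstrap inclusion frequencies: the share of bootstrap resamples in which each candidate variable survives selection. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the helper names `ols_pvalues`, `backward_eliminate`, and `inclusion_frequencies`, the use of backward elimination in a linear model, the 0.157 threshold (borrowed from the Figure 1 caption), the normal approximation for p-values, and the number of resamples.

```python
import math
import numpy as np


def ols_pvalues(design, y):
    """Fit OLS; return coefficients and two-sided p-values.

    Assumption: p-values use a normal approximation to the t statistic,
    which keeps the sketch free of dependencies beyond NumPy.
    """
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in coef / se])
    return coef, pvals


def backward_eliminate(X, y, alpha=0.157):
    """Drop the least significant variable until all p-values are below alpha.

    Returns the column indices of X that remain in the model.
    """
    active = list(range(X.shape[1]))
    while active:
        design = np.column_stack([np.ones(len(y)), X[:, active]])
        _, pvals = ols_pvalues(design, y)
        slopes = pvals[1:]  # skip the intercept, which is always kept
        worst = int(np.argmax(slopes))
        if slopes[worst] < alpha:
            break
        active.pop(worst)
    return active


def inclusion_frequencies(X, y, n_boot=200, alpha=0.157, seed=0):
    """Repeat the whole selection on bootstrap resamples and count survivors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        for j in backward_eliminate(X[idx], y[idx], alpha):
            counts[j] += 1
    return counts / n_boot


if __name__ == "__main__":
    # Hypothetical data: 5 candidate variables, only the first two matter.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 5))
    y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(100)
    print(inclusion_frequencies(X, y))
```

A variable selected in nearly every resample is a stable member of the model, whereas one selected in only a modest share of resamples signals that the final model depends heavily on the particular sample at hand.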

Figures

Figure 1
Simulation study to illustrate possible differential effects of variable selection. Graphs show scatterplots of estimated regression coefficients β̂1 and β̂2 in 50 simulated datasets of size N=50 with two standard normal IVs with correlation ρ=0.5. Circles and dots indicate simulated datasets where a test of the null hypothesis β2=0 yields p-values greater or lower than 0.157, respectively. The dashed lines are regression lines of β1 on β2; thus they indicate how β1 would change if β2 is set to 0.
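The setup in the Figure 1 caption can be approximated with a short simulation. The Python sketch below follows the stated design (50 datasets, N=50, two standard normal IVs with correlation 0.5, selection threshold 0.157), but several pieces are assumptions because the caption does not give them: the true coefficients `beta=(1.0, 0.5)`, the unit error variance, the linear outcome model, the function name `simulate`, and the normal approximation used for the p-value.

```python
import math
import numpy as np


def simulate(n_datasets=50, n=50, rho=0.5, beta=(1.0, 0.5), alpha=0.157, seed=0):
    """Simulate datasets and record the effect of selecting on the second IV.

    Each returned row holds: full-model estimate of beta1, full-model
    estimate of beta2, p-value for beta2 = 0, and the beta1 estimate
    after selection (X2 is dropped when its p-value is >= alpha).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rows = []
    for _ in range(n_datasets):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ np.array(beta) + rng.standard_normal(n)  # assumed outcome model

        # Full model with intercept, X1, and X2.
        full = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(full, y, rcond=None)
        resid = y - full @ coef
        sigma2 = resid @ resid / (n - full.shape[1])
        cov_coef = sigma2 * np.linalg.inv(full.T @ full)
        t2 = coef[2] / math.sqrt(cov_coef[2, 2])
        p2 = math.erfc(abs(t2) / math.sqrt(2))  # two-sided, normal approximation

        if p2 >= alpha:
            # X2 is deselected: refit with X1 only, which shifts the beta1 estimate.
            reduced = np.column_stack([np.ones(n), X[:, 0]])
            coef_r, *_ = np.linalg.lstsq(reduced, y, rcond=None)
            b1_selected = coef_r[1]
        else:
            b1_selected = coef[1]
        rows.append((coef[1], coef[2], p2, b1_selected))
    return np.array(rows)


if __name__ == "__main__":
    res = simulate()
    dropped = res[:, 2] >= 0.157
    print("datasets where X2 was dropped:", int(dropped.sum()))
    print("mean beta1, full model:     ", res[:, 0].mean())
    print("mean beta1, after selection:", res[:, 3].mean())
```

Comparing the first and last columns shows how the estimate of β1 changes in exactly those datasets where the second variable is removed, which is the differential effect the scatterplots are meant to illustrate.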
Figure 2
A schematic network of dependencies arising from variable selection. β, regression coefficient; IV, independent variable; RMSE, root mean squared error
