Source Code Biol Med. 2008 Dec 16;3:17.
doi: 10.1186/1751-0473-3-17.

Purposeful selection of variables in logistic regression

Zoran Bursac et al. Source Code Biol Med.

Abstract

Background: The main problem in many model-building situations is choosing, from a large set of covariates, those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. Several variable selection algorithms exist, but they are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates, in which an analyst makes a variable selection decision at each step of the modeling process.

Methods: In this paper we introduce an algorithm that automates that process. We conduct a simulation study to compare the performance of this algorithm with three well-documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE.

Results: We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure can retain important confounding variables, potentially resulting in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data.

Conclusion: Analysts who need an algorithm that helps guide the retention of significant covariates as well as confounding ones should consider this macro as an alternative tool.
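
The abstract does not spell out how a confounding variable is recognized. One common way to operationalize the idea is a change-in-estimate check: refit the model without the candidate covariate and see whether any remaining coefficient shifts by more than a chosen fraction. The Python sketch below illustrates only that check and is not the %PurposefulSelection macro itself. The function names, the 20% threshold, and the coefficient values are all assumptions made for illustration.

```python
def percent_change(full, reduced):
    """Relative change in one coefficient when a candidate covariate is dropped."""
    return abs(reduced - full) / abs(full)


def is_confounder(full_coefs, reduced_coefs, threshold=0.20):
    """Flag a dropped covariate as a confounder (hypothetical helper).

    full_coefs    : dict of name -> estimate from the model that includes the candidate
    reduced_coefs : dict of name -> estimate from the model refit without the candidate
    threshold     : assumed cutoff for the relative change (0.20 = 20%)
    """
    for name, reduced in reduced_coefs.items():
        full = full_coefs[name]
        # Skip exact zeros to avoid dividing by zero.
        if full != 0 and percent_change(full, reduced) > threshold:
            return True
    return False


# Made-up coefficient estimates, for illustration only.
full = {"age": 0.050, "sex": 0.40, "bmi": 0.030}
without_bmi = {"age": 0.051, "sex": 0.55}

# Dropping "bmi" shifts the "sex" coefficient by 37.5%, above the assumed 20% cutoff.
flag = is_confounder(full, without_bmi)
```

In practice the two sets of estimates would come from fitting the logistic model twice, once with and once without the candidate covariate, and the cutoff would be chosen by the analyst.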

Figures

Figure 1. %PurposefulSelection macro flow chart.

References

    1. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: Wiley; 2000.
    2. Hosmer DW, Lemeshow S. Applied Survival Analysis: Regression Modeling of Time to Event Data. New York: Wiley; 1999.
    3. Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall; 2003.
    4. Gutin G, Yeo A, Zverovich A. Traveling salesman should not be greedy: domination analysis of greedy-type heuristics for the TSP. Discrete Applied Mathematics. 2002;117:81–86. doi: 10.1016/S0166-218X(01)00195-0.
    5. Bang-Jensen J, Gutin G, Yeo A. When the greedy algorithm fails. Discrete Optimization. 2004;1:121–127. doi: 10.1016/j.disopt.2004.03.007.