Ann Transl Med. 2016 Mar;4(6):111.
doi: 10.21037/atm.2016.02.15.

Model building strategy for logistic regression: purposeful selection

Zhongheng Zhang. Ann Transl Med. 2016 Mar.

Abstract

Logistic regression is one of the most commonly used models for adjusting for confounders in the medical literature. This article shows how to carry out the purposeful selection model-building strategy in R. I emphasize the likelihood ratio test for assessing whether deleting a variable has a significant impact on model fit. A deleted variable should also be checked to see whether it is an important adjustment for the remaining covariates, that is, whether removing it materially changes their coefficients. Interactions should be examined to disentangle complex relationships between covariates and their synergistic effect on the response variable. Finally, the model should be checked for goodness of fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used such test for logistic regression models.
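
The two checks applied when a variable is deleted can be illustrated with a little arithmetic. The article itself works in R; the sketch below uses plain Python only to show the calculations. The function names, the example log-likelihoods and coefficients, and the 20% change threshold are assumptions made for illustration and do not come from the abstract.

```python
import math


def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping ONE parameter (1 degree of freedom).

    loglik_full    -- maximized log-likelihood of the model with the variable
    loglik_reduced -- maximized log-likelihood of the model without it
    Returns the statistic G and its p-value.
    """
    # G = 2 * (logL_full - logL_reduced); clamp tiny negative rounding errors.
    g = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square survival function with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


def coef_change_pct(beta_full, beta_reduced):
    """Percent change in a remaining covariate's coefficient after deletion."""
    return 100.0 * abs(beta_reduced - beta_full) / abs(beta_full)


if __name__ == "__main__":
    # Hypothetical fitted values, not taken from the article's data.
    g, p = lr_test_1df(loglik_full=-250.0, loglik_reduced=-252.5)
    print(f"G = {g:.2f}, p = {p:.4f}")

    change = coef_change_pct(beta_full=0.50, beta_reduced=0.62)
    # Assumed rule of thumb: a change above about 20% suggests the deleted
    # variable is an important adjustment and should be kept in the model.
    print(f"coefficient change = {change:.1f}%, keep variable: {change > 20.0}")
```

A small p-value indicates that deleting the variable significantly worsens model fit. A large change in a remaining coefficient indicates that the deleted variable was adjusting for it, even if the likelihood ratio test alone would allow the deletion.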

Keywords: Hosmer-Lemeshow; Logistic regression; R; interaction; linearity; purposeful selection.

Conflict of interest statement

Conflicts of Interest: The author has no conflicts of interest to declare.

Figures

Figure 1. Smoothed scatter plots showing the relationship between each variable of interest and mortality outcome on the logit scale.
Figure 2. Effect of hb on the probability of mortality, stratified by age group.
Figure 3. Plot of jittered outcome (alive=1; die=2) versus estimated probability of death from the fitted model.
Figure 4. Histogram of estimated probability of death, stratified by observed outcome.
Figure 5. Receiver operating characteristic (ROC) curve reflecting the discriminative power of the model.
