Stat Methods Med Res. 2019 Aug;28(8):2455-2474.
doi: 10.1177/0962280218784726. Epub 2018 Jul 3.

Sample size for binary logistic prediction models: Beyond events per variable criteria

Maarten van Smeden et al. Stat Methods Med Res. 2019 Aug.

Abstract

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
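To make the EPV criterion concrete, the following minimal sketch computes EPV as the number of events divided by the number of candidate predictors, which is the usual definition. The function name and the three scenarios are hypothetical illustrations, not values from the study. The scenarios share EPV = 10 but differ widely in total sample size and events fraction, the quantities the abstract proposes as the basis for new sample size criteria.

```python
def events_per_variable(n_events, n_predictors):
    """Return EPV: number of events divided by number of candidate predictors."""
    if n_events <= 0 or n_predictors <= 0:
        raise ValueError("n_events and n_predictors must be positive")
    return n_events / n_predictors


# Hypothetical scenarios (illustrative numbers only, not taken from the paper):
# (total sample size, number of events, number of candidate predictors)
scenarios = [
    (160, 80, 8),
    (800, 80, 8),
    (1600, 80, 8),
]

summaries = []
for n_total, n_events, n_predictors in scenarios:
    summaries.append(
        {
            "n_total": n_total,
            "events_fraction": n_events / n_total,
            "n_predictors": n_predictors,
            "epv": events_per_variable(n_events, n_predictors),
        }
    )

for row in summaries:
    print(row)
```

All three scenarios satisfy EPV ≥ 10, yet their total sample sizes differ tenfold. The EPV criterion alone cannot distinguish between them, whereas the three parameters named in the abstract can.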

Keywords: EPV; Logistic regression; prediction models; predictive performance; sample size; simulations.

Figures

Figure 1. Marginal out-of-sample predictive performance.
Figure 2. Boxplot distribution of out-of-sample predictive performance outcomes (restricted to conditions with events fraction = 1/2).
Figure 3. Average relative out-of-sample performances of modeling strategies per simulation factor level.
Figure 4. Relation between required sample size and events fraction. Calculations are based on metamodels with criterion values that were kept constant. For illustration purposes, the criterion values were chosen such that the curves would intersect at events fraction = 1/2.
