Sample size for binary logistic prediction models: Beyond events per variable criteria

Maarten van Smeden¹, Karel GM Moons¹, Joris AH de Groot¹, Gary S Collins², Douglas G Altman², Marinus JC Eijkemans¹, Johannes B Reitsma¹

Affiliations

¹ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
² Centre for Statistics in Medicine, Botnar Research Centre, University of Oxford, Oxford, UK.

PMID: 29966490
PMCID: PMC6710621
DOI: 10.1177/0962280218784726

Maarten van Smeden et al. Stat Methods Med Res. 2019 Aug;28(8):2455-2474. doi: 10.1177/0962280218784726. Epub 2018 Jul 3.

Abstract

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an events per variable (EPV) criterion, notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on the out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance is better approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
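To make the EPV arithmetic concrete, the following is a minimal sketch, not the authors' code. It assumes the usual definition of EPV as the number of events divided by the number of candidate predictors, with the events fraction taken as the proportion of events in the sample. The function names and the three study designs are hypothetical and chosen only for illustration. The sketch shows that designs differing in total sample size, events fraction and number of predictors can share the same EPV, which is why EPV alone does not distinguish between them.

```python
import math


def events_per_variable(n_total: int, events_fraction: float, n_predictors: int) -> float:
    """EPV = expected number of events / number of candidate predictors."""
    n_events = n_total * events_fraction
    return n_events / n_predictors


def min_n_for_epv(target_epv: float, events_fraction: float, n_predictors: int) -> int:
    """Smallest total sample size whose expected EPV reaches target_epv."""
    return math.ceil(target_epv * n_predictors / events_fraction)


# Hypothetical designs (not taken from the paper): all differ in the three
# parameters the abstract highlights, yet share one EPV value.
designs = [
    {"n_total": 200, "events_fraction": 0.5, "n_predictors": 10},
    {"n_total": 400, "events_fraction": 0.25, "n_predictors": 10},
    {"n_total": 100, "events_fraction": 0.5, "n_predictors": 5},
]

for design in designs:
    epv = events_per_variable(**design)
    print(design, "EPV =", epv)

# Total sample size implied by the common EPV >= 10 rule of thumb
# for 10 candidate predictors and an events fraction of 1/4.
print("N required for EPV >= 10:", min_n_for_epv(10, 0.25, 10))
```

Under the EPV ≥10 rule, all three designs would be judged equally adequate, although their total sample sizes range from 100 to 400.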

Keywords: EPV; Logistic regression; prediction models; predictive performance; sample size; simulations.

Figures

**Figure 1.**
Marginal out-of-sample predictive performance.

**Figure 2.**
Boxplot distribution of out-of-sample predictive performance outcomes (restricted to conditions with events fraction = 1/2).

**Figure 3.**
Average relative out-of-sample performances of modeling strategies per simulation factor level.

**Figure 4.**
Relation between required sample size and events fraction. Calculations are based on metamodels with criterion values that were kept constant. For illustration purposes, the criterion values were chosen such that they intersect at events fraction = 1/2.
