Stat Med. 2019 Mar 30;38(7):1276-1296.

doi: 10.1002/sim.7992. Epub 2018 Oct 24.

Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes

Richard D Riley¹, Kym IE Snell¹, Joie Ensor¹, Danielle L Burke¹, Frank E Harrell Jr², Karel GM Moons³, Gary S Collins⁴

Affiliations

¹ Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK.
² Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee.
³ Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands.
⁴ Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

PMID: 30357870
PMCID: PMC6519266
DOI: 10.1002/sim.7992

Richard D Riley et al. Stat Med. 2019.


Erratum in

Correction to: Minimum sample size for developing a multivariable prediction model: Part II-binary and time-to-event outcomes by Riley RD, Snell KI, Ensor J, et al.
Riley RD. Stat Med. 2019 Dec 30;38(30):5672. doi: 10.1002/sim.8409. Epub 2019 Oct 29. PMID: 31793031.

Abstract

When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates, as defined by a global shrinkage factor of ≥0.9; (ii) a small absolute difference of ≤0.05 between the model's apparent and adjusted Nagelkerke's R²; and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8, and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.

Keywords: binary and time-to-event outcomes; logistic and Cox regression; multivariable prediction model; pseudo R-squared; sample size; shrinkage.
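
The three criteria in the abstract can be sketched numerically for a binary outcome. The following is a minimal Python sketch, not the authors' code. The abstract gives no formulas, so the closed-form expressions below are assumptions based on how this approach is usually presented. The function names are hypothetical, and the 0.05 margin of error for criterion (iii) and the example inputs (24 predictor parameters, an anticipated Cox-Snell R² of 0.288, an outcome prevalence of 0.174) are illustrative assumptions.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i): smallest n with expected global shrinkage >= `shrinkage`.

    Assumed formula: n = p / ((S - 1) * ln(1 - R2_CS / S)).
    """
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage)))


def max_r2_cs_binary(prevalence):
    """Largest attainable Cox-Snell R^2 for a binary outcome (assumed formula).

    Uses the null-model log-likelihood per participant.
    """
    ln_l_null_per_n = (prevalence * math.log(prevalence)
                       + (1.0 - prevalence) * math.log(1.0 - prevalence))
    return 1.0 - math.exp(2.0 * ln_l_null_per_n)


def n_for_nagelkerke_gap(p, r2_cs, prevalence, max_gap=0.05):
    """Criterion (ii): apparent minus adjusted Nagelkerke R^2 <= `max_gap`.

    Converts the allowed gap into an equivalent shrinkage target, then
    reuses the criterion (i) calculation.
    """
    target = r2_cs / (r2_cs + max_gap * max_r2_cs_binary(prevalence))
    return n_for_shrinkage(p, r2_cs, target)


def n_for_overall_risk(prevalence, margin=0.05, z=1.96):
    """Criterion (iii): overall risk estimated to within +/- `margin`.

    The 0.05 margin is an assumption; the abstract only asks for
    "precise estimation".
    """
    return math.ceil((z / margin) ** 2 * prevalence * (1.0 - prevalence))


def minimum_sample_size(p, r2_cs, prevalence):
    """Take the largest n across the three criteria and derive E and EPP."""
    n = max(
        n_for_shrinkage(p, r2_cs),
        n_for_nagelkerke_gap(p, r2_cs, prevalence),
        n_for_overall_risk(prevalence),
    )
    events = n * prevalence
    return {"n": n, "events": events, "epp": events / p}


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not values from the abstract).
    result = minimum_sample_size(p=24, r2_cs=0.288, prevalence=0.174)
    print(result)
```

With these assumed inputs, criterion (ii) is the binding constraint and the sketch gives an EPP of about 4.8, which illustrates how the required EPP can fall well below the conventional 10.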

Figures

**Figure 1**
Summary of the steps involved in calculating the minimum sample size required for developing a multivariable prediction model for binary or time‐to‐event outcomes

**Figure 2**
Events per predictor parameter required to achieve various expected shrinkage ($S_{VH}$) values for a new prediction model of venous thromboembolism recurrence risk with an assumed $R^{2}_{CS\_adj}$ of 0.051

**Figure 3**
Sample size required (based on Equation (11)) for a particular number of predictor parameters (p) to achieve a particular value of expected shrinkage ($S_{VH}$), for a new prediction model of venous thromboembolism recurrence risk with an assumed $R^{2}_{CS\_adj}$ of 0.051
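
The relationship summarised in Figure 3 can be tabulated with a short Python sketch. It assumes that Equation (11), which is not reproduced on this page, is the shrinkage-based expression n = p / ((S_VH − 1) ln(1 − R²_CS_adj / S_VH)). The function name and the grids of p and S_VH values are hypothetical; only the R²_CS_adj of 0.051 comes from the caption.

```python
import math


def n_required(p, r2_cs_adj, shrinkage):
    """Sample size for p parameters at a target expected shrinkage.

    Assumed form of Equation (11):
    n = p / ((S - 1) * ln(1 - R2_CS_adj / S)).
    """
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs_adj / shrinkage)))


R2_CS_ADJ = 0.051                        # value stated in the figure captions
P_GRID = (10, 20, 30)                    # hypothetical numbers of predictor parameters
S_GRID = (0.80, 0.85, 0.90, 0.95)        # hypothetical shrinkage targets

if __name__ == "__main__":
    print("S_VH  " + "  ".join(f"p={p:>2}" for p in P_GRID))
    for s in S_GRID:
        row = "  ".join(f"{n_required(p, R2_CS_ADJ, s):>6}" for p in P_GRID)
        print(f"{s:.2f}  {row}")
```

Under this assumed formula the required sample size grows linearly with p and rises steeply as the shrinkage target approaches 1, which is the pattern the caption describes.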

