Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes
- PMID: 30357870
- PMCID: PMC6519266
- DOI: 10.1002/sim.7992
Erratum in
- Correction to: Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes, by Riley RD, Snell KI, Ensor J, et al. Stat Med. 2019 Dec 30;38(30):5672. doi: 10.1002/sim.8409. Epub 2019 Oct 29. PMID: 31793031
Abstract
When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates, as defined by a global shrinkage factor of ≥ 0.9; (ii) a small absolute difference of ≤ 0.05 between the model's apparent and adjusted Nagelkerke's R²; and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8, and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
Keywords: binary and time-to-event outcomes; logistic and Cox regression; multivariable prediction model; pseudo R-squared; sample size; shrinkage.
© 2018 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
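The three criteria in the abstract can be sketched in code for the binary-outcome case. The abstract itself gives no formulas, so the closed-form expressions below are assumptions based on the commonly cited form of this approach and should be checked against the full paper and its correction before use. The function names, the 0.05 margin of error for criterion (iii), and the input values in the usage note are all hypothetical illustrations, not figures from the paper. Time-to-event outcomes are not covered.

```python
import math


def max_cox_snell_r2(phi):
    """Largest attainable Cox-Snell R^2 for a binary outcome with overall
    risk phi, i.e. 1 - exp(2 * lnL_null / n). (Assumed formula.)"""
    ln_l_null_per_obs = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return 1 - math.exp(2 * ln_l_null_per_obs)


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i): n giving an expected global shrinkage factor of at
    least `shrinkage` for p predictor parameters. (Assumed formula.)"""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_for_r2_difference(p, r2_cs, phi, delta=0.05):
    """Criterion (ii): n keeping the apparent-minus-adjusted Nagelkerke R^2
    difference within delta, via the shrinkage factor that this implies."""
    implied_shrinkage = r2_cs / (r2_cs + delta * max_cox_snell_r2(phi))
    return n_for_shrinkage(p, r2_cs, implied_shrinkage)


def n_for_overall_risk(phi, margin=0.05, z=1.96):
    """Criterion (iii): n estimating the overall risk to within +/- margin
    with 95% confidence. The 0.05 margin is an assumption."""
    return math.ceil((z / margin) ** 2 * phi * (1 - phi))


def minimum_sample_size(p, r2_cs, phi):
    """Return (n, E, EPP) satisfying all three criteria at once."""
    n = max(
        n_for_shrinkage(p, r2_cs),
        n_for_r2_difference(p, r2_cs, phi),
        n_for_overall_risk(phi),
    )
    events = n * phi  # expected number of outcome events E
    return n, events, events / p
```

For example, `minimum_sample_size(p=10, r2_cs=0.2, phi=0.5)` uses made-up inputs (10 predictor parameters, an anticipated Cox-Snell R² of 0.2, and an overall risk of 0.5). Taking the maximum across the criteria reflects the abstract's requirement that the chosen n and E meet all three simultaneously, which is why the resulting EPP varies by setting rather than following a fixed rule of thumb.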