Internal validation of predictive models: efficiency of some procedures for logistic regression analysis
- PMID: 11470385
- DOI: 10.1016/s0895-4356(01)00341-9
Abstract
The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model.
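The events-per-variable figures and the bootstrap idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' code: the data are simulated (not GUSTO-I), the function names (`events_per_variable`, `fit_logistic`, `c_statistic`, `bootstrap_corrected_c`) are hypothetical, and the procedure shown is the optimism-corrected bootstrap, one common variant, which may differ from the variants evaluated in the paper. Only the c-statistic (discriminative ability) is computed.

```python
import numpy as np

def events_per_variable(n, event_rate, n_predictors):
    """Expected number of events divided by the number of predictors."""
    return n * event_rate / n_predictors

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Maximum-likelihood logistic fit via Newton-Raphson (intercept first).
    The tiny ridge term only keeps the Hessian numerically invertible."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        lp = np.clip(Xd @ beta, -30, 30)          # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-lp))
        w = p * (1.0 - p)
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hessian, Xd.T @ (y - p))
    return beta

def linear_predictor(beta, X):
    return beta[0] + X @ beta[1:]

def c_statistic(y, score):
    """Area under the ROC curve from the rank sum, with average ranks for ties."""
    _, inverse, counts = np.unique(score, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def bootstrap_corrected_c(X, y, n_boot=100, seed=0):
    """Apparent c-statistic and its optimism-corrected counterpart."""
    rng = np.random.default_rng(seed)
    beta = fit_logistic(X, y)
    apparent = c_statistic(y, linear_predictor(beta, X))
    n = len(y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        Xb, yb = X[idx], y[idx]
        beta_b = fit_logistic(Xb, yb)
        c_boot = c_statistic(yb, linear_predictor(beta_b, Xb))  # on bootstrap sample
        c_orig = c_statistic(y, linear_predictor(beta_b, X))    # on original sample
        optimism.append(c_boot - c_orig)
    return apparent, apparent - float(np.mean(optimism))

# Sample sizes and event rate taken from the abstract (2851 deaths in 40,830).
event_rate = 2851 / 40830
epv_small = events_per_variable(572, event_rate, 8)    # close to 5
epv_large = events_per_variable(9165, event_rate, 8)   # close to 80

# Simulated stand-in data: 8 standard-normal predictors, assumed coefficients.
rng = np.random.default_rng(42)
X = rng.standard_normal((572, 8))
true_lp = -3.0 + X @ np.full(8, 0.4)
y = (rng.random(572) < 1.0 / (1.0 + np.exp(-true_lp))).astype(int)

apparent_c, corrected_c = bootstrap_corrected_c(X, y)
print(f"EPV: {epv_small:.2f} and {epv_large:.2f}")
print(f"apparent c = {apparent_c:.3f}, optimism-corrected c = {corrected_c:.3f}")
```

The gap between the apparent and corrected values is the estimated optimism. Unlike a split-sample approach, every subject contributes to both model fitting and validation, which is the efficiency argument the abstract makes.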