Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: longitudinal cohort study using cardiovascular disease as exemplar
- PMID: 33148619
- PMCID: PMC7610202
- DOI: 10.1136/bmj.m3919
Abstract
Objective: To assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease and the effects of censoring on risk predictions.
Design: Longitudinal cohort study from 1 January 1998 to 31 December 2018.
Setting and participants: 3.6 million patients from the Clinical Practice Research Datalink registered at 391 general practices in England with linked hospital admission and mortality records.
Main outcome measures: Model performance including discrimination, calibration, and consistency of individual risk prediction for the same patients among models with comparable model performance. Nineteen prediction techniques were applied, including 12 families of machine learning models (grid searched for best models), three Cox proportional hazards models (locally fitted, QRISK3, and Framingham), three parametric survival models, and one logistic model.
Results: The various models had similar population level performance (C statistics of about 0.87 and similar calibration). However, the predictions for individual risks of cardiovascular disease varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. A patient with a risk of 9.5-10.5% predicted by QRISK3 had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network. The differences in predicted risks between QRISK3 and a neural network ranged between -23.2% and 0.1% (95% range). Models that ignored censoring (that is, assumed censored patients to be event free) substantially underestimated risk of cardiovascular disease. Of the 223 815 patients with a cardiovascular disease risk above 7.5% with QRISK3, 57.8% would be reclassified below 7.5% when using another model.
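The underestimation caused by ignoring censoring can be illustrated with a minimal sketch. The follow-up times and event flags below are hypothetical toy values, not study data, and the Kaplan-Meier estimator is used here only as a simple stand-in for a censoring-aware method; it is not one of the 19 techniques the study compared.

```python
def naive_risk(times, events, horizon):
    """Share of patients with an observed event by `horizon`.

    Censored patients are treated as event free, which is the
    assumption the abstract warns against.
    """
    observed = sum(1 for t, e in zip(times, events) if e and t <= horizon)
    return observed / len(times)


def kaplan_meier_risk(times, events, horizon):
    """Risk by `horizon` as 1 minus the Kaplan-Meier survival estimate."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        # Patients still under follow-up at time t (censored at t counts as at risk).
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - deaths / at_risk
    return 1 - survival


# Hypothetical toy cohort: follow-up in years, 1 = event, 0 = censored.
times = [1, 2, 2, 3, 4, 5, 6, 8, 10, 10]
events = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

naive = naive_risk(times, events, horizon=10)
km = kaplan_meier_risk(times, events, horizon=10)
print(f"naive: {naive:.3f}, Kaplan-Meier: {km:.3f}")
```

In this toy cohort the naive estimate is lower than the Kaplan-Meier estimate because patients who left follow-up early are counted in the denominator as if they had remained event free for the full period.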
Conclusions: A variety of models predicted risks for the same patients very differently despite similar model performances. Logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring. Survival models that consider censoring and that are explainable, such as QRISK3, are preferable. The level of consistency within and between models should be routinely assessed before they are used for clinical decision making.
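One simple consistency check of the kind reported in the results is the share of patients above a treatment threshold under one model who fall below it under another. The following is a minimal sketch under stated assumptions: the function name `reclassified_below` and the predicted risks are hypothetical, and only the 7.5% threshold is taken from the abstract.

```python
def reclassified_below(reference_risks, other_risks, threshold=0.075):
    """Among patients above `threshold` under the reference model,
    return the share that the other model places below it."""
    flagged = [
        (ref, other)
        for ref, other in zip(reference_risks, other_risks)
        if ref > threshold
    ]
    if not flagged:
        return 0.0
    moved = sum(1 for _, other in flagged if other < threshold)
    return moved / len(flagged)


# Hypothetical predicted 10 year risks for the same six patients.
model_a = [0.10, 0.09, 0.08, 0.12, 0.05, 0.03]
model_b = [0.06, 0.10, 0.04, 0.07, 0.05, 0.09]

share = reclassified_below(model_a, model_b)
print(f"{share:.0%} of patients above 7.5% under model A fall below it under model B")
```

A check like this compares predictions patient by patient, so it can show disagreement that population level measures such as the C statistic do not reveal.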
© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Conflict of interest statement
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support to YL from the China Scholarship Council; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Similar articles
- Do population-level risk prediction models that use routinely collected health data reliably predict individual risks? Sci Rep. 2019 Aug 2;9(1):11222. doi: 10.1038/s41598-019-47712-5. PMID: 31375726. Free PMC article.
- Consistency of ranking was evaluated as new measure for prediction model stability: longitudinal cohort study. J Clin Epidemiol. 2021 Oct;138:168-177. doi: 10.1016/j.jclinepi.2021.06.026. Epub 2021 Jul 3. PMID: 34224835.
- Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017 May 23;357:j2099. doi: 10.1136/bmj.j2099. PMID: 28536104. Free PMC article.
- Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis. BMC Med. 2019 Jun 13;17(1):109. doi: 10.1186/s12916-019-1340-7. PMID: 31189462. Free PMC article.
- Machine learning in predicting graft failure following kidney transplantation: A systematic review of published predictive models. Int J Med Inform. 2019 Oct;130:103957. doi: 10.1016/j.ijmedinf.2019.103957. Epub 2019 Aug 24. PMID: 31472443.
Cited by
- Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approach. Int J Epidemiol. 2022 Jun 13;51(3):931-944. doi: 10.1093/ije/dyab258. PMID: 34910160. Free PMC article.
- A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One. 2023 Sep 8;18(9):e0274276. doi: 10.1371/journal.pone.0274276. eCollection 2023. PMID: 37682909. Free PMC article.
- Identification and Validation of an Explainable Prediction Model of Sepsis in Patients With Intracerebral Hemorrhage: Multicenter Retrospective Study. J Med Internet Res. 2025 Apr 28;27:e71413. doi: 10.2196/71413. PMID: 40293793. Free PMC article.
- Effect of competing mortality risks on predictive performance of the QRISK3 cardiovascular risk prediction tool in older people and those with comorbidity: external validation population cohort study. Lancet Healthy Longev. 2021 Jun;2(6):e352-e361. doi: 10.1016/S2666-7568(21)00088-X. PMID: 34100008. Free PMC article.
- Tailored Bayes: a risk modeling framework under unequal misclassification costs. Biostatistics. 2022 Dec 12;24(1):85-107. doi: 10.1093/biostatistics/kxab023. PMID: 34363680. Free PMC article.