Assessing the quality of prediction models in health care using the Prediction model Risk Of Bias ASsessment Tool (PROBAST): an evaluation of its use and practical application
- PMID: 40010583
- DOI: 10.1016/j.jclinepi.2025.111732
Abstract
Background and objectives: Since 2019, the Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) has supported methodological quality assessments of prediction model studies. Most prediction model studies are rated with a "High" risk of bias (ROB) and researchers report low interrater reliability (IRR) using PROBAST. We aimed to (1) assess the IRR of PROBAST ratings between assessors of the same study and understand reasons for discrepancies, (2) determine which items contribute most to domain-level ROB ratings, and (3) explore the impact of consensus meetings.
Study design and setting: We used PROBAST assessments from a systematic review of diagnostic and prognostic COVID-19 prediction models as a case study. Assessors included international experts in prediction model studies or their reviews. We assessed IRR using prevalence-adjusted bias-adjusted kappa (PABAK) before consensus meetings, examined bias ratings per domain-level ROB judgments, and evaluated the impact of consensus meetings by identifying rating changes after discussion.
Results: We analyzed 2167 PROBAST assessments from 27 assessor pairs covering 760 prediction models: 384 developments, 242 validations, and 134 mixed assessments (including both). The IRR using PABAK was higher for overall ROB judgments (development: 0.82 [0.76; 0.89]; validation: 0.78 [0.68; 0.88]) than for domain- and item-level judgments. Some PROBAST items frequently contributed to domain-level ROB judgments, eg, 3.5 Outcome blinding and 4.1 Sample size. Consensus discussions mainly led to changes in item-level ratings and never to changes in overall ROB ratings.
Conclusion: Within this case study, PROBAST assessments showed high IRR at the overall ROB level, with some variation at the item and domain levels. To reduce variability, PROBAST assessors should standardize item- and domain-level judgments and hold well-structured consensus meetings between assessors of the same study.
Plain language summary: The Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) provides a set of items to assess the quality of medical studies on so-called prediction tools that calculate an individual's probability of having or developing a certain disease or health outcome. Previous research found low interrater reliability (IRR; ie, how consistently two assessors rate aspects of the same study) when using PROBAST. To understand why this is the case, we conducted a large study involving more than 30 experts from around the world, all of whom applied PROBAST to the same set of prediction tool studies. Based on more than 2150 PROBAST assessments, we identified which PROBAST items led to the most disagreements between raters, explored reasons for these disagreements, and examined whether the use of so-called consensus meetings (ie, different assessors of the same study discuss their ratings and decide on a finalized rating) impacted PROBAST ratings. Our study found that the IRR between different assessors of the same study was higher than previously reported. One explanation for the better agreement compared to previous research may be preplanning how to assess certain PROBAST aspects before starting the assessments, as well as holding well-structured consensus meetings. These improvements make PROBAST more effective for evaluating the trustworthiness and quality of prediction tools in the health-care domain.
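The PABAK statistic reported above has a simple closed form: it replaces the chance-agreement term of Cohen's kappa with its expectation under uniform marginals, 1/m for m rating categories, giving PABAK = (m·Po − 1)/(m − 1), where Po is the observed proportion of agreement. A minimal Python sketch follows; the example ratings are hypothetical and do not come from the study's data.

```python
def pabak(ratings_a, ratings_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa (Byrt, Bishop, Carlin, 1993).

    Uses a fixed chance-agreement probability of 1/n_categories, so for
    binary ratings PABAK reduces to 2 * observed_agreement - 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    # Observed proportion of pairs where both assessors gave the same rating
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    return (n_categories * p_observed - 1) / (n_categories - 1)

# Hypothetical pair of assessors rating 10 models as High/Low ROB;
# they agree on 9 of 10, so PABAK = 2 * 0.9 - 1 = 0.8
a = ["High", "High", "Low", "High", "Low", "High", "High", "Low", "High", "High"]
b = ["High", "High", "Low", "Low",  "Low", "High", "High", "Low", "High", "High"]
print(round(pabak(a, b), 2))  # 0.8
```

Because the chance term is fixed, PABAK is unaffected by skewed prevalence of "High" ratings, which is why it is preferred over Cohen's kappa when, as here, most models are rated high ROB.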
Keywords: Interrater reliability; Methodology; PROBAST; Prediction models; Risk of bias assessments; Systematic reviews.
Copyright © 2025 The Author(s). Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of competing interest K.G.M.M. was involved in the development of the quality assessment tool included and described in this study. L.W., B.V.C., K.G.M.M., L.H., M.v.S., and J.A.A.G. were involved in the quality assessment ratings included and described in our study. There are no competing interests for any other author.
Similar articles
- Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557. Free PMC article.
- Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training. J Clin Med. 2023 Mar 2;12(5):1976. doi: 10.3390/jcm12051976. PMID: 36902763. Free PMC article.
- Assessor burden, inter-rater agreement and user experience of the RoB-SPEO tool for assessing risk of bias in studies estimating prevalence of exposure to occupational risk factors: An analysis from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury. Environ Int. 2022 Jan;158:107005. doi: 10.1016/j.envint.2021.107005. Epub 2021 Nov 30. PMID: 34991265. Free PMC article.
- Common challenges and suggestions for risk of bias tool development: a systematic review of methodological studies. J Clin Epidemiol. 2024 Jul;171:111370. doi: 10.1016/j.jclinepi.2024.111370. Epub 2024 Apr 24. PMID: 38670243.
- Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol. 2023 Jul;159:159-173. doi: 10.1016/j.jclinepi.2023.04.012. Epub 2023 May 2. PMID: 37142166. Review.
Cited by
- PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ. 2025 Mar 24;388:e082505. doi: 10.1136/bmj-2024-082505. PMID: 40127903. Free PMC article.