Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials
- PMID: 32065732
- DOI: 10.1002/jrsm.1398
Abstract
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is limited. We tested the accuracy of RobotReviewer, a semi-autonomous risk of bias (RoB) assessment tool, and its agreement with human reviewers.
Methods: Two reviewers independently conducted RoB assessments on a sample of randomized controlled trials (RCTs), and their consensus ratings were compared with those generated by RobotReviewer. Agreement with the human reviewers was assessed using percent agreement and weighted kappa (κ). The accuracy of RobotReviewer was also assessed by calculating the sensitivity, specificity, and area under the curve in comparison to the consensus agreement of the human reviewers.
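The weighted kappa used in the Methods to quantify agreement between RobotReviewer and the human consensus can be sketched in a few lines. The ratings below are hypothetical examples on the Cochrane low/unclear/high scale, not data from the study, and the linear-weight formulation is one common choice:

```python
from collections import Counter

def weighted_kappa(r1, r2, categories):
    """Linearly weighted Cohen's kappa for two raters on ordinal categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint counts and per-rater marginal counts
    obs = Counter((index[a], index[b]) for a, b in zip(r1, r2))
    m1 = Counter(index[a] for a in r1)
    m2 = Counter(index[b] for b in r2)
    # Linear disagreement weight: |i - j| / (k - 1)
    num = sum(abs(i - j) / (k - 1) * obs[(i, j)]
              for i in range(k) for j in range(k))
    den = sum(abs(i - j) / (k - 1) * m1[i] * m2[j] / n
              for i in range(k) for j in range(k))
    return 1.0 - num / den

# Hypothetical RoB ratings from two raters (illustrative only)
a = ["low", "low", "unclear", "high", "low", "unclear"]
b = ["low", "unclear", "unclear", "high", "low", "high"]
print(round(weighted_kappa(a, b, ["low", "unclear", "high"]), 3))  # → 0.625
```

A value near 0.6, as in this toy example, would fall in the "good agreement" band the Results use for random sequence generation.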
Results: The study included 372 RCTs. Inter-rater reliability ranged from κ = -0.06 (no agreement) for blinding of participants and personnel to κ = 0.62 (good agreement) for random sequence generation (excluding overall RoB). RobotReviewer was found to use a high percentage of "irrelevant supporting quotations" to complement RoB assessments for blinding of participants and personnel (72.6%), blinding of outcome assessment (70.4%), and allocation concealment (54.3%).
Conclusion: RobotReviewer can assist with risk of bias assessment of RCTs but cannot replace human evaluations. Reviewers should therefore check and validate RoB assessments from RobotReviewer against the original article whenever the supporting quotations it provides are not relevant. This consultation is in line with the developers' own recommendation.
Keywords: artificial intelligence; health technology assessment (HTA); inter-rater reliability; randomized controlled trial; risk of bias; systematic review.
© 2020 John Wiley & Sons, Ltd.
Similar articles
- Agreement in Risk of Bias Assessment Between RobotReviewer and Human Reviewers: An Evaluation Study on Randomised Controlled Trials in Nursing-Related Cochrane Reviews. J Nurs Scholarsh. 2021 Mar;53(2):246-254. doi: 10.1111/jnu.12628. Epub 2021 Feb 8. PMID: 33555110
- Towards the automatic risk of bias assessment on randomized controlled trials: A comparison of RobotReviewer and humans. Res Synth Methods. 2024 Nov;15(6):1111-1119. doi: 10.1002/jrsm.1761. Epub 2024 Sep 26. PMID: 39327803
- Accuracy and Efficiency of Machine Learning-Assisted Risk-of-Bias Assessments in "Real-World" Systematic Reviews: A Noninferiority Randomized Controlled Trial. Ann Intern Med. 2022 Jul;175(7):1001-1009. doi: 10.7326/M22-0092. Epub 2022 May 31. PMID: 35635850
- Validity and Inter-Rater Reliability Testing of Quality Assessment Instruments [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Mar. Report No.: 12-EHC039-EF. PMID: 22536612
- Assessor burden, inter-rater agreement and user experience of the RoB-SPEO tool for assessing risk of bias in studies estimating prevalence of exposure to occupational risk factors: An analysis from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury. Environ Int. 2022 Jan;158:107005. doi: 10.1016/j.envint.2021.107005. Epub 2021 Nov 30. PMID: 34991265
Cited by
- Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev. 2024 Jul 8;13(1):174. doi: 10.1186/s13643-024-02592-3. PMID: 38978132
- An exploration of available methods and tools to improve the efficiency of systematic review production: a scoping review. BMC Med Res Methodol. 2024 Sep 18;24(1):210. doi: 10.1186/s12874-024-02320-4. PMID: 39294580
- Digital Tools to Support the Systematic Review Process: An Introduction. J Eval Clin Pract. 2025 Apr;31(3):e70100. doi: 10.1111/jep.70100. PMID: 40290054
- Using a large language model (ChatGPT) to assess risk of bias in randomized controlled trials of medical interventions: protocol for a pilot study of interrater agreement with human reviewers. BMC Med Res Methodol. 2025 Jul 31;25(1):182. doi: 10.1186/s12874-025-02631-0. PMID: 40745627
- Concordance between humans and GPT-4 in appraising the methodological quality of case reports and case series using the Murad tool. BMC Med Res Methodol. 2024 Nov 4;24(1):266. doi: 10.1186/s12874-024-02372-6. PMID: 39497032