Comparative Study

JAMA Netw Open. 2021 Mar 1;4(3):e211276.

doi: 10.1001/jamanetworkopen.2021.1276.

Association of Clinician Diagnostic Performance With Machine Learning-Based Decision Support Systems: A Systematic Review

Baptiste Vasey¹, Stephan Ursprung², Benjamin Beddoe³, Elliott H Taylor¹, Neale Marlow¹,⁴, Nicole Bilbro⁵, Peter Watkinson⁶, Peter McCulloch¹

Affiliations

¹ Nuffield Department of Surgical Sciences, University of Oxford, Oxford, United Kingdom.
² Department of Radiology, University of Cambridge, Cambridge, United Kingdom.
³ Faculty of Medicine, Imperial College London, London, United Kingdom.
⁴ Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom.
⁵ Department of Surgery, Maimonides Medical Center, Brooklyn, New York.
⁶ Critical Care Research Group, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom.

PMID: 33704476
PMCID: PMC7953308
DOI: 10.1001/jamanetworkopen.2021.1276

Baptiste Vasey et al. JAMA Netw Open. 2021.

Abstract

Importance: An increasing number of machine learning (ML)-based clinical decision support systems (CDSSs) are described in the medical literature, but this research focuses almost entirely on comparing CDSSs directly with clinicians (human vs computer). Little is known about the outcomes of these systems when used as adjuncts to human decision-making (human vs human with computer).

Objectives: To conduct a systematic review to investigate the association between the interactive use of ML-based diagnostic CDSSs and clinician performance and to examine the extent of the CDSSs' human factors evaluation.

Evidence review: A search of MEDLINE, Embase, PsycINFO, and grey literature was conducted for the period between January 1, 2010, and May 31, 2019. Peer-reviewed studies published in English comparing human clinician performance with and without interactive use of an ML-based diagnostic CDSS were included. All metrics used to assess human performance were considered as outcomes. The risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) and Risk of Bias in Non-Randomised Studies-Intervention (ROBINS-I). Narrative summaries were produced for the main outcomes. Given the heterogeneity of medical conditions, outcomes of interest, and evaluation metrics, no meta-analysis was performed.

Findings: A total of 8112 studies were initially retrieved and 5154 abstracts were screened; of these, 37 studies met the inclusion criteria. The median number of participating clinicians was 4 (interquartile range, 3-8). Of the 107 results that reported statistical significance, 54 (50%) were increased by the use of CDSSs, 4 (4%) were decreased, and 49 (46%) showed no change or an unclear change. In the subgroup of studies carried out in representative clinical settings, no association between the use of ML-based diagnostic CDSSs and improved clinician performance could be observed. Interobserver agreement was the commonly reported outcome whose change was the most strongly associated with CDSS use. Four studies (11%) reported on user feedback, and, in all but 1 case, clinicians decided to override at least some of the algorithms' recommendations. Twenty-eight studies (76%) were rated as having a high risk of bias in at least 1 of the 4 QUADAS-2 core domains, and 6 studies (16%) were considered to be at serious or critical risk of bias using ROBINS-I.
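The percentages quoted in the Findings can be reproduced from the reported counts. The short Python sketch below is an illustration only and is not part of the review: the helper name `pct` and the dictionary labels are assumptions introduced here, and it assumes that each percentage is the count divided by its group total (107 results or 37 studies), rounded to the nearest integer.

```python
# Illustrative check of the percentages in the Findings paragraph.
# The counts come from the abstract; pct() and the labels are hypothetical names.

def pct(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)

RESULTS_TOTAL = 107  # results that reported statistical significance
STUDIES_TOTAL = 37   # studies that met the inclusion criteria

result_shares = {
    "increased": pct(54, RESULTS_TOTAL),
    "decreased": pct(4, RESULTS_TOTAL),
    "no or unclear change": pct(49, RESULTS_TOTAL),
}

study_shares = {
    "reported user feedback": pct(4, STUDIES_TOTAL),
    "high risk of bias (QUADAS-2)": pct(28, STUDIES_TOTAL),
    "serious or critical risk of bias (ROBINS-I)": pct(6, STUDIES_TOTAL),
}

print(result_shares)
print(study_shares)
```

Under that assumption, the three result categories account for all 107 results (54 + 4 + 49), and each computed share matches the figure given in the abstract.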

Conclusions and relevance: This systematic review found only sparse evidence that the use of ML-based CDSSs is associated with improved clinician diagnostic performance. Most studies had a low number of participants, were at high or unclear risk of bias, and showed little or no consideration for human factors. Caution should be exercised when estimating the current potential of ML to improve human diagnostic performance, and more comprehensive evaluation should be conducted before deploying ML-based CDSSs in clinical settings. The results highlight the importance of considering supported human decisions as end points rather than merely the stand-alone CDSS outputs.

Conflict of interest statement

Conflict of Interest Disclosures: Mr Vasey reported participation in a CS Digital Health Equity Fund (participations sold in January 2020) outside the submitted work. Mr Ursprung reported a scholarship from Cambridge Commonwealth, European & International Trust Scholarship during the conduct of the study. Dr Watkinson reported receiving grants from the National Institute for Health Research (NIHR) during the conduct of the study; grants from the NIHR, Wellcome, and Sensyne Health; and personal fees from Sensyne Health. He was chief medical officer for Sensyne Health and holds shares in the company outside the submitted work. No other disclosures were reported.

Figures

**Figure 1. Flowchart of Study Inclusion**
^aOther sources included forward/backward literature search, reference search from relevant literature, trade name search, and conference abstracts or entries in the Cochrane Central Register of Controlled Trials that led to publications.

**Figure 2. Distribution of the Risk of Bias Scores**
The total of 100% represents 37 included studies in the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) (A) and Risk of Bias in Non-Randomised Studies–Intervention (ROBINS-I) (B) domains.

Comment in

Discovery, Learning, and Experimentation With Artificial Intelligence-Based Tools at the Point of Care-Perils and Opportunity.
Auerbach A, Fihn SD. JAMA Netw Open. 2021 Mar 1;4(3):e211474. doi: 10.1001/jamanetworkopen.2021.1474. PMID: 33704470. No abstract available.
