Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians

Krishnamurthy Dj Dvijotham^#¹, Jim Winkens^#², Melih Barsbey^#³, Sumedh Ghaisas^#⁴, Robert Stanforth^#⁴, Nick Pawlowski⁵, Patricia Strachan⁶, Zahra Ahmed⁴, Shekoofeh Azizi⁷, Yoram Bachrach⁴, Laura Culp⁷, Mayank Daswani⁶, Jan Freyberg⁶, Christopher Kelly⁶, Atilla Kiraly⁸, Timo Kohlberger⁸, Scott McKinney⁹, Basil Mustafa¹⁰, Vivek Natarajan⁸, Krzysztof Geras¹¹, Jan Witowski¹¹, Zhi Zhen Qin¹², Jacob Creswell¹², Shravya Shetty⁸, Marcin Sieniek⁸, Terry Spitz⁶, Greg Corrado⁸, Pushmeet Kohli⁴, Taylan Cemgil^#⁴, Alan Karthikesalingam^#⁶

Affiliations

¹ Google DeepMind, Mountain View, CA, USA. dvij@cs.washington.edu.
² Google Research, New York, NY, USA. jimwinkens@google.com.
³ Bogazici University, Istanbul, Turkey.
⁴ Google DeepMind, London, UK.
⁵ Microsoft Research, Cambridge, UK.
⁶ Google Research, London, UK.
⁷ Google DeepMind, Toronto, Ontario, Canada.
⁸ Google Research, Palo Alto, CA, USA.
⁹ OpenAI, San Francisco, CA, USA.
¹⁰ Google DeepMind, Zurich, Switzerland.
¹¹ NYU Grossman School of Medicine, New York, NY, USA.
¹² Stop TB Partnership, Geneva, Switzerland.

^# Contributed equally.

PMID: 37460754
DOI: 10.1038/s41591-023-02437-x

Krishnamurthy Dj Dvijotham et al. Nat Med. 2023 Jul;29(7):1814-1820. doi: 10.1038/s41591-023-02437-x. Epub 2023 Jul 17.

Abstract

Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.
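The idea of learning when to trust the AI model and when to defer to the clinical workflow can be illustrated with a small sketch. This is not the authors' implementation (the abstract does not describe the algorithm, and the open-source release should be consulted for that). It assumes, purely for illustration, that the AI model emits a confidence score in [0, 1], that a tuning set records the AI prediction, the clinician's opinion and the ground truth for each case, and that deferral is decided per score bin by comparing how often each was correct. The names `fit_deferral` and `decide` and the bin count are hypothetical.

```python
from typing import List, Sequence


def fit_deferral(
    scores: Sequence[float],
    ai_preds: Sequence[int],
    clin_preds: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 4,
) -> List[bool]:
    """Return one flag per score bin: True means defer to the clinician.

    Illustrative rule (an assumption, not the published method): a bin is
    deferred when the clinician was correct strictly more often than the AI
    on the tuning cases whose AI score fell in that bin.
    """
    ai_correct = [0] * n_bins
    clin_correct = [0] * n_bins
    for s, a, c, y in zip(scores, ai_preds, clin_preds, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        ai_correct[b] += int(a == y)
        clin_correct[b] += int(c == y)
    return [clin_correct[b] > ai_correct[b] for b in range(n_bins)]


def decide(score: float, ai_pred: int, clin_pred: int, defer_bins: Sequence[bool]) -> int:
    """Pick the clinician's opinion in deferred bins, the AI's otherwise."""
    b = min(int(score * len(defer_bins)), len(defer_bins) - 1)
    return clin_pred if defer_bins[b] else ai_pred


# Toy tuning set (fabricated for illustration only, not data from the study).
scores = [0.05, 0.10, 0.40, 0.45, 0.60, 0.55, 0.90, 0.95]
ai_preds = [0, 0, 0, 0, 1, 1, 1, 1]    # AI score thresholded at 0.5
clin_preds = [0, 0, 1, 0, 0, 1, 1, 0]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

defer_bins = fit_deferral(scores, ai_preds, clin_preds, labels)
print(defer_bins)
```

In this toy setup the clinician is more accurate only in the mid-range score bins, so only those cases would be deferred; confidently scored cases are handled by the AI alone, which is the kind of arrangement under which clinician workload could fall. In a real workflow the clinician's opinion would be requested only after the deferral decision, rather than passed in up front as it is here.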

Comment in

Balancing human and AI roles in clinical imaging.
Gilbert F. Nat Med. 2023 Jul;29(7):1609-1610. doi: 10.1038/s41591-023-02441-1. PMID: 37460755.
