Comparative Study

Comparison of AI-integrated pathways with human-AI interaction in population mammographic screening for breast cancer

Helen M L Frazer et al. Nat Commun. 2024 Aug 30;15(1):7525. doi: 10.1038/s41467-024-51725-8

Abstract

Artificial intelligence (AI) readers of mammograms compare favourably to individual radiologists in detecting breast cancer. However, AI readers cannot perform at the level of the multi-reader systems used by screening programs in countries such as Australia, Sweden, and the UK. Therefore, implementation demands human-AI collaboration. Here, we use a large, high-quality retrospective mammography dataset from Victoria, Australia to conduct detailed simulations of five potential AI-integrated screening pathways, and examine human-AI interaction effects to explore automation bias. Operating an AI reader as a second reader or as a high-confidence filter improves current screening outcomes by 1.9–2.5% in sensitivity and up to 0.6% in specificity, achieving a 4.6–10.9% reduction in assessments and a 48–80.7% reduction in human reads. Automation bias degrades performance in multi-reader settings but improves it for single readers. This study provides insight into feasible approaches for AI-integrated screening pathways and the prospective studies necessary prior to clinical adoption.


Conflict of interest statement

P.B. is an employee of annalise.ai. C.W., Y.C., D.J.M., M.S.E., H.M.L.F. and G.C. are inventors on a patent, 'WO2024044815—Improved classification methods for machine learning', which covers a model used in versions of the BRAIx AI reader. The remaining authors declare no competing interests.

Figures

Fig. 1. Screening episode flows for the current reader system and AI-integration scenarios.
A Standard of care scenario: Readers 1 and 2 each read the same episode and opt to recall or not recall; if they disagree, Reader 3 arbitrates. B AI standalone scenario: all decisions are made by the AI Reader without human intervention. C AI single-reader scenario: Reader 1 makes the final decision with AI Reader input. D AI reader-replacement scenario: as in (A), but with the AI Reader replacing Reader 2. E AI band-pass scenario: the AI Reader screens episodes before Readers 1 and 2. Episodes with high scores trigger the recall decision directly, and episodes with low scores trigger the no-recall decision directly; the remaining episodes continue to the usual reader system. F AI triage scenario: the AI Reader triages episodes before Readers 1 and 2. Episodes with high scores continue to the usual system, and episodes with low scores follow a single-reader path.
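
The decision logic in this caption can be read as a set of simple rules. The following Python sketch illustrates four of the pathways under stated assumptions; the function names, inputs, and thresholds are illustrative stand-ins, not the authors' simulation code.

    # Illustrative decision rules for the Fig. 1 pathways (hypothetical API).
    # Reader decisions are booleans (True = recall); ai_score is the AI
    # Reader's continuous suspicion score; all thresholds are assumptions.

    def standard_of_care(reader1, reader2, reader3):
        """Panel A: two independent readers; a third arbitrates disagreements."""
        if reader1 == reader2:
            return reader1
        return reader3  # arbitration

    def reader_replacement(reader1, ai_recall, reader3):
        """Panel D: the AI Reader takes the place of Reader 2."""
        if reader1 == ai_recall:
            return reader1
        return reader3  # human arbitration of the human-AI disagreement

    def band_pass(ai_score, low, high, reader_system):
        """Panel E: the AI Reader decides high-confidence episodes directly."""
        if ai_score >= high:
            return True          # direct recall
        if ai_score <= low:
            return False         # direct no-recall
        return reader_system()   # mid-band episodes go to the usual readers

    def triage(ai_score, threshold, single_reader, reader_system):
        """Panel F: low-scoring episodes follow a single-reader path."""
        if ai_score >= threshold:
            return reader_system()
        return single_reader()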
Fig. 2. Performance of the AI reader on the retrospective cohort.
A The AI reader ROC curve compared with the weighted mean individual reader and the reader consensus. The AI reader achieved an AUC of 0.932 (95% CI 0.923, 0.940; n = 149,105 screening episodes), above the weighted mean individual reader performance (95.6% specificity, 66.7% sensitivity) but below the reader consensus performance (96.1% specificity, 79.8% sensitivity; standard of care). The weighted mean individual reader (black circle; n = 125 readers) is the mean sensitivity and specificity of all individual readers (grey circles), weighted by their respective total numbers of reads. B, C The AI reader compared against 81 individual readers (min. 1000 reads). An optimal point from the AI reader ROC curve is shown for each comparison. We show separately the human readers for which both sensitivity and specificity of the AI reader point were greater than or equal to the reader's (B; 74 readers, 91.3% of readers; 253,328 reads, 88.3% of reads) and the readers for which the AI reader point fell below the human reader in either sensitivity or specificity (C; 7 readers, 8.6%; 33,525 reads, 11.7%). Source data are provided as a Source Data file.
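
Panels B and C partition the individual readers by whether some point on the AI ROC curve matches or beats them on both axes. A minimal numpy sketch of that dominance check, assuming arrays tracing the AI ROC curve and a single reader operating point (the names are assumptions):

    import numpy as np

    def ai_dominates_reader(ai_sens, ai_spec, reader_sens, reader_spec):
        """True if any AI operating point matches or exceeds the reader
        on both sensitivity and specificity (the Fig. 2B criterion)."""
        ai_sens = np.asarray(ai_sens)
        ai_spec = np.asarray(ai_spec)
        return bool(np.any((ai_sens >= reader_sens) & (ai_spec >= reader_spec)))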
Fig. 3. Comparison of AI-integrated scenarios.
A Human reader consensus performance compared with the AI standalone, AI single-reader, AI reader-replacement, AI band-pass and AI triage scenarios on the retrospective cohort (n = 149,105 screening episodes) without interaction effects. Representative points are shown for AI standalone (96.0% specificity, 75.0% sensitivity), AI single-reader (95.6% specificity, 67.3% sensitivity), AI reader-replacement (96.3% specificity, 82.3% sensitivity), AI band-pass (96.6% specificity, 81.7% sensitivity) and AI triage (95.7% specificity, 78.0% sensitivity). Other potential operating points are shown as a continuous line. Both AI reader-replacement and AI band-pass improved performance over the human reader consensus (96.1% specificity, 79.8% sensitivity). B AI-integrated scenarios with reader performance varied by an interaction effect applied when the human reader disagrees with the AI reader. From 0% to 50% of discordant decisions are reversed: only when the AI reader was correct (triangle, positive effect), uniformly regardless of AI correctness (circle, neutral effect), or only when it was incorrect (diamond, negative effect). For AI triage to match human reader consensus performance, a 15% positive interaction effect of the AI reader on human readers is required. Source data are provided as a Source Data file.
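
The interaction effect in panel B can be mimicked with a simple reversal rule: with probability p, a human decision that disagrees with the AI is flipped to the AI's decision, restricted to episodes where the AI was correct (positive effect), where it was incorrect (negative effect), or with no restriction (neutral effect). A sketch under those assumptions; the array names and signature are illustrative, not the authors' code:

    import numpy as np

    rng = np.random.default_rng(0)

    def apply_interaction(human, ai, truth, p, effect="neutral"):
        """Reverse a fraction p of human decisions discordant with the AI.

        human, ai, truth: boolean numpy arrays (True = recall / cancer present).
        effect: 'positive' flips only where the AI is correct,
                'negative' only where the AI is incorrect,
                'neutral' flips discordant decisions uniformly.
        """
        human = human.copy()
        discordant = human != ai
        if effect == "positive":
            eligible = discordant & (ai == truth)
        elif effect == "negative":
            eligible = discordant & (ai != truth)
        else:
            eligible = discordant
        flip = eligible & (rng.random(human.size) < p)
        human[flip] = ai[flip]  # reversed decisions follow the AI
        return human

    # e.g. the 15% positive effect noted in the caption:
    # adjusted = apply_interaction(human, ai, truth, p=0.15, effect="positive")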
Fig. 4. Screening episode exclusion criteria.
Flow diagram of study exclusion criteria for screening episodes from the standardised screening pathway at BreastScreen Victoria. Missing data could be clinical data without mammograms or mammograms without clinical data; clinical data could also be incomplete, missing assessment, reader, or screening records. Earlier screening attempt refers to a client returning for imaging as part of the same screening round; only the last attempt was used. Failed outcome determination and failed outcome reduction refer to being unable to confirm the final screening outcome for the episode. Missing reader records refers to missing reader data. Inconsistent recall status refers to conflicting data sources on whether an episode was recalled. Incomplete screening years refers to years for which we did not have the full year of data to sample from (2013–2015); these years were excluded from the testing and development datasets as they are not representative.
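
The exclusions above amount to sequential filters over an episode table. A sketch of how they might be applied with pandas; every column name here is a hypothetical stand-in for the study's actual data model:

    import pandas as pd

    def apply_exclusions(episodes: pd.DataFrame) -> pd.DataFrame:
        """Apply the Fig. 4 exclusion criteria in order (hypothetical columns)."""
        e = episodes[~episodes["missing_data"]]           # no mammograms or no clinical data
        e = (e.sort_values("attempt_date")                # keep only the last attempt
               .groupby(["client_id", "screening_round"]).tail(1))
        e = e[e["outcome_determined"]]                    # final screening outcome confirmed
        e = e[~e["missing_reader_records"]]               # reader data present
        e = e[~e["inconsistent_recall"]]                  # recall status sources agree
        e = e[~e["screening_year"].between(2013, 2015)]   # drop incomplete years
        return e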
