Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature

Deeksha Bhalla¹, Krithika Rangarajan¹, Tany Chandra¹, Subhashis Banerjee², Chetan Arora²

Affiliations

¹ Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India.
² Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India.

PMID: 38912238
PMCID: PMC11188703
DOI: 10.1055/s-0043-1775737

Review

Deeksha Bhalla et al. Indian J Radiol Imaging. 2023 Oct 10;34(3):469-487. doi: 10.1055/s-0043-1775737. eCollection 2024 Jul.

Abstract

Background: Although abundant literature is currently available on the use of deep learning for breast cancer detection in mammography, the quality of this literature varies widely.

Purpose: To evaluate published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design.

Methods: The PubMed and Scopus databases were searched to identify records that described the use of deep learning to detect lesions or to classify images as cancer or noncancer. A modified version of the Quality Assessment of Diagnostic Accuracy Studies 2 tool (mQUADAS-2) was developed for this review and applied to the included studies. Reported results (area under the receiver operating characteristic [ROC] curve [AUC], sensitivity, and specificity) were recorded.

Results: A total of 12,123 records were screened, of which 107 met the inclusion criteria. Training and test datasets, the key idea behind the model architecture, and results were recorded for these studies. On mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality; of these, three trained their own model and one used a commercial network. Ensemble models were used in two of these. Common strategies for model training included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, while it reached 0.945 (0.919–0.968) on an enriched subset. Combined radiologist and artificial intelligence (AI) readings achieved a higher AUC (0.955) and specificity (98.5%) than either radiologists or AI alone. None of the studies provided explainability beyond localization accuracy, and none studied the interaction between AI and radiologists in a real-world setting.

Conclusion: While deep learning holds much promise in mammography interpretation, evaluation in a reproducible clinical setting and explainable networks are the need of the hour.

Keywords: artificial intelligence; breast cancer; deep learning; mammography; neural networks; systematic review.
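The review compares studies by their reported AUC, sensitivity, and specificity. As a minimal illustration of what these metrics measure, the Python sketch below computes them from per-image labels (1 = cancer, 0 = noncancer) and model scores. The labels, scores, and the 0.5 operating threshold are hypothetical and are not data from any reviewed study. The AUC here uses the rank-based (Mann-Whitney) formulation, one standard way to compute the area under the ROC curve; individual studies may have used other implementations, and this sketch does not compute the confidence intervals many of them report.

```python
"""Sketch: sensitivity, specificity, and ROC AUC from hypothetical
per-image labels (1 = cancer, 0 = noncancer) and model scores."""

from typing import Sequence


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> tuple[float, float]:
    """Call an image 'cancer' when score >= threshold and return
    (sensitivity, specificity). Both classes must be present."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the Mann-Whitney statistic: the
    probability that a randomly chosen cancer image scores higher than a
    randomly chosen noncancer image, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 cancer and 4 noncancer images.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
    auc = roc_auc(labels, scores)
    # prints "sensitivity=0.667 specificity=0.750 AUC=0.917"
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking performance across all thresholds. This is why a study can report a high AUC while its sensitivity and specificity at a particular operating point still differ from those of other studies.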

Indian Radiological Association. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Conflict of interest statement

None declared.

Figures

**Fig. 1**
Summary of the study inclusion process for this review.

**Fig. 2**
Summary of the detailed analysis of studies that met the mQUADAS-2 quality criteria.
