JAMA Netw Open. 2023 Feb 1;6(2):e230524. doi:10.1001/jamanetworkopen.2023.0524

A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis


Nicholas Konz et al.

Abstract

Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide.

Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods.

Design, setting, and participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021.

Main outcomes and measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes.
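
To make this outcome measure concrete, the sketch below computes a mean sensitivity from detections pooled across volumes. It is a minimal illustration, not the challenge's official scoring code: the helper names (froc_points, mean_sensitivity), the pooling strategy, and the false-positive-per-volume levels (1 through 4) are assumptions made for this sketch; the paper defines the exact matching and averaging criteria.

```python
import numpy as np

def froc_points(scores, is_tp, total_lesions, n_volumes):
    """Sweep a confidence threshold over detections pooled across volumes.

    scores: detection confidence values; is_tp: True where a detection
    matched a distinct ground-truth lesion (matching criterion assumed
    to be applied upstream). Returns (false positives per volume,
    sensitivity) at each successive threshold.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=bool)[order]
    sens = np.cumsum(tp) / total_lesions      # fraction of lesions detected
    fp_per_vol = np.cumsum(~tp) / n_volumes   # mean false positives per volume
    return fp_per_vol, sens

def mean_sensitivity(fp_per_vol, sens, fp_levels=(1, 2, 3, 4)):
    """Average the best sensitivity reached within each false-positive
    budget; the levels here are an assumed choice, not the paper's."""
    picks = []
    for level in fp_levels:
        within = fp_per_vol <= level
        picks.append(sens[within][-1] if within.any() else 0.0)
    return float(np.mean(picks))
```

Under this reading, restricting the computation to DBT volumes with biopsied lesions affects which detections are pooled and the n_volumes denominator; the tie-break rule then amounts to repeating the same computation with all DBT volumes included.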

Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity across all submitted algorithms was 0.879; across only the algorithms that participated in phase 2, it was 0.926.

Conclusions and relevance: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier to entry for new researchers.


Conflict of interest statement

Conflict of Interest Disclosures: Mr Shoshan, Ms Gilboa-Solomon, Mr Khapun, Dr Ratner, Ms Barkan, and Dr Ozery-Flato reported working for IBM Research during the conduct of the study. Dr Gu reported receiving grants from Duke University during the conduct of the study. Dr Martí reported receiving grants from University of Girona and the Spanish Science and Innovation Ministry during the conduct of the study. Dr Hsu reported serving as deputy editor of the journal Radiology: Artificial Intelligence and receiving research funding from the National Institutes of Health and the National Science Foundation. Drs Hossain and Lee reported receiving funding from the National Institutes of Health during the conduct of the study. Dr Kalpathy-Cramer reported receiving grants from the National Institutes of Health during the conduct of the study and receiving grants from GE Healthcare, Genentech, and Bayer and serving as a consultant for Siloam Vision outside the submitted work. Dr Petrick reported being a member of SPIE Computer-Aided Diagnosis Technical Committee, SPIE Membership Committee, American Association of Physicists in Medicine Grand Challenges Working Group, American Association of Physicists in Medicine Computer Aided Image Analysis Subcommittee, and the American Institute for Medical and Biological Engineering and being an employee of the US federal government and that this work was performed as part of his official duties. Dr Drukker reported receiving royalties from Hologic not related to this work. Dr Armato reported receiving royalties and licensing fees through the University of Chicago. Dr Mazurowski reported grants and a data set license from the National Institutes of Health during the conduct of the study. No other disclosures were reported.

Figures

Figure 1. The Least and Most Difficult Lesions to Detect
A and B, Examples of digital breast tomosynthesis volumes containing annotated lesions that were the easiest to detect. All 10 algorithms detected the lesions in A and B, with 0.13 and 0.16 false positives on average, respectively. C and D, Examples of digital breast tomosynthesis volumes containing annotated lesions that were the most difficult to detect. The lesion in panel C was not detected by any algorithm, and the lesion in panel D was detected by only 2 of 10 algorithms, with 1.34 false positives on average (due to the presence of a breast implant). Detection bounding boxes indicate submitted algorithm predictions. The number in the upper-left corner of each box indicates the percentile of the corresponding algorithm's score with respect to the distribution of all algorithm scores for the volume. At most 2 boxes per algorithm are shown, and the colors of each algorithm's boxes correspond to the free-response receiver operating characteristic curves shown in Figure 2.
Figure 2. Free-Response Receiver Operating Characteristic Detection Curves for All Methods
Curves are shown for all participants, the baseline models, and the merged predictions of all algorithms and of only the top 3 models from phase 2.
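
For readers who want to reproduce curves of this kind, the sketch below plots a free-response receiver operating characteristic (FROC) curve from the same pooled-detection representation used in the earlier sketch. The data here are synthetic placeholders, not the challenge's actual predictions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Placeholder detections for one hypothetical method: confidence scores
# and whether each detection hit a distinct ground-truth lesion.
scores = rng.random(500)
is_tp = rng.random(500) < 0.3
total_lesions, n_volumes = 200, 180  # assumed counts, for illustration only

order = np.argsort(-scores)
tp = is_tp[order]
sens = np.cumsum(tp) / total_lesions
fp_per_vol = np.cumsum(~tp) / n_volumes

plt.plot(fp_per_vol, sens, label="hypothetical method")
plt.xlabel("Mean false positives per volume")
plt.ylabel("Sensitivity")
plt.title("FROC curve (illustrative)")
plt.legend()
plt.show()
```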


