Nat Commun. 2021 Sep 24;12(1):5645.
doi: 10.1038/s41467-021-26023-2.

Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams

Yiqiu Shen et al. Nat Commun. 2021.

Abstract

Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false-positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the system’s pipeline.
a US images were pre-processed to extract the breast laterality (i.e., left or right breast) and to include only the part of the image that shows the breast, cropping out the image periphery, which typically contains textual metadata about the patient and the US acquisition technique. b For each breast, we assigned a cancer label using the recorded pathology reports for the respective patient within −30 to 120 days of the US examination. We applied additional filtering on the internal test set to ensure that cancers in positive exams are visible in the US images and that negative exams have at least one cancer-negative follow-up (see Methods section 'Additional filtering of the test'). c The AI system processes all US images acquired from one breast to compute probabilistic predictions for the presence of malignant lesions. The AI system also generates saliency maps that indicate the informative regions in each image. d We evaluated the system on an internal test set (AUROC: 0.976, 95% CI: 0.972, 0.980, n = 79,156 breasts) and an external test set (AUROC: 0.927, 95% CI: 0.907, 0.959, n = 780 images). e In a reader study consisting of 663 exams (n = 1024 breasts), we showed that the AI system can improve the specificity and positive predictive value (PPV) for 10 attending radiologists while maintaining the same level of sensitivity and negative predictive value (NPV).
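
A minimal sketch of the labeling rule in panel b is shown below, assuming per-breast pathology reports carrying a date and a malignancy flag; the function name and data layout are hypothetical and not taken from the paper's code.

    # Hypothetical sketch of the Fig. 1b labeling rule: a breast is labeled positive
    # if a malignant pathology report falls within -30 to +120 days of the exam date.
    from datetime import date, timedelta

    def assign_breast_label(exam_date, pathology_reports):
        """Return 1 (cancer) if any malignant report lies in the matching window, else 0."""
        window_start = exam_date - timedelta(days=30)
        window_end = exam_date + timedelta(days=120)
        for report in pathology_reports:  # each report: {"date": date, "malignant": bool}
            if report["malignant"] and window_start <= report["date"] <= window_end:
                return 1
        return 0

    # Example: a malignant biopsy 45 days after the exam labels that breast positive.
    label = assign_breast_label(date(2018, 3, 1),
                                [{"date": date(2018, 4, 15), "malignant": True}])
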
Fig. 2. Reader study results.
Performance of the AI system on the reader study population (n = 1024 breasts), shown as an ROC curve (a) and a precision-recall curve (b). The AI achieved an AUROC of 0.962 (95% CI: 0.943, 0.979) and an AUPRC of 0.752 (95% CI: 0.675, 0.849). Each data point represents a single reader, and the triangles correspond to the average reader performance. The inset shows a magnification of the gray shaded region.
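
For readers who want to run this kind of evaluation on their own data, here is a minimal sketch of computing AUROC and AUPRC with scikit-learn; the labels and scores are synthetic placeholders, not the study's data.

    # Minimal sketch of the Fig. 2 metrics (AUROC, AUPRC) on synthetic data.
    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(0)
    labels = (rng.random(1024) < 0.1).astype(int)              # ~10% cancer prevalence (synthetic)
    scores = np.clip(0.2 + 0.5 * labels + rng.normal(0, 0.2, 1024), 0, 1)  # synthetic AI outputs

    print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
    print(f"AUPRC: {average_precision_score(labels, scores):.3f}")
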
Fig. 3. Qualitative analysis of saliency maps.
In each of the six cases (a-f) from the reader study, we visualized the sagittal and transverse views of the lesion (left) and the AI’s saliency maps indicating the predicted locations of benign (middle) and malignant (right) findings (see Methods section 'Deep neural network architecture'). Exams a-c display lesions that were ultimately biopsied and found to be malignant. All readers and the AI system correctly classified exams a and b as suspicious for malignancy. However, the majority of readers (7/10) and the AI system incorrectly classified case c as benign. Cases d-f display lesions that were biopsied and found to be benign. The majority of readers incorrectly classified exams d (9/10), e (10/10), and f (10/10) as suspicious for malignancy and recommended the lesions undergo biopsy. In contrast, the AI system classified exam d as malignant, but correctly identified exams e and f as being benign.
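
As a rough illustration of how per-class saliency maps like these can be produced, the sketch below shows a generic class-activation-map-style head in PyTorch that yields one map per class (benign, malignant) plus image-level predictions; this is not the authors' architecture, and the channel counts and pooling choice are assumptions for illustration.

    # Generic class-activation-map-style head producing one saliency map per class
    # (benign, malignant) plus image-level predictions; not the paper's network.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SaliencyHead(nn.Module):
        def __init__(self, in_channels=512):
            super().__init__()
            # 1x1 convolution maps backbone features to two classes: benign, malignant
            self.classifier = nn.Conv2d(in_channels, 2, kernel_size=1)

        def forward(self, feature_map):
            logits_map = self.classifier(feature_map)                       # (N, 2, H, W)
            saliency = torch.sigmoid(logits_map)                            # per-pixel class evidence
            # Image-level prediction: average the logit map over spatial locations.
            image_logits = F.adaptive_avg_pool2d(logits_map, 1).flatten(1)  # (N, 2)
            return saliency, torch.sigmoid(image_logits)

    # Usage with dummy backbone features of shape (batch, channels, height, width).
    saliency_maps, predictions = SaliencyHead()(torch.randn(1, 512, 16, 16))
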
Fig. 4. Performance of readers, AI, and hybrid models.
We reported the observed values (measure of center) and 95% confidence intervals (error bars) of AUROC (a), AUPRC (b), specificity (c), biopsy rate (d), and PPV (e) of ten radiologists (R1-R10), AI, and the hybrid models on the reader study set (n = 1024 breasts). The predictions of each hybrid model are weighted averages of each reader’s BI-RADS scores and the AI’s probabilistic predictions (see Methods section 'Hybrid model'). We dichotomized each hybrid model’s probabilistic predictions to match the sensitivity of its respective reader. We dichotomized the AI’s predictions to match the average radiologists' sensitivity. The collaboration between AI and readers improves readers' AUROC, AUPRC, specificity, and PPV, while reducing biopsy rate. We estimated the 95% confidence intervals by 1000 iterations of the bootstrap method.
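
To make the hybrid-model evaluation above concrete, here is a minimal sketch of the three steps it describes: averaging a reader's BI-RADS scores with the AI's probabilities, dichotomizing at a threshold that matches the reader's sensitivity, and bootstrapping a confidence interval. The equal weighting, the BI-RADS rescaling, and the function names are illustrative assumptions rather than the paper's exact procedure.

    # Illustrative sketch of the hybrid evaluation in Fig. 4 (assumptions noted above).
    import numpy as np

    def hybrid_scores(birads, ai_prob, weight=0.5):
        """Weighted average of a reader's rescaled BI-RADS scores and AI probabilities."""
        birads_scaled = (np.asarray(birads, float) - 1.0) / 5.0   # map BI-RADS 1-6 to [0, 1]
        return weight * birads_scaled + (1.0 - weight) * np.asarray(ai_prob, float)

    def threshold_at_sensitivity(scores, labels, target_sensitivity):
        """Highest threshold whose sensitivity is at least the target (e.g., the reader's)."""
        scores, labels = np.asarray(scores, float), np.asarray(labels)
        pos = np.sort(scores[labels == 1])[::-1]                  # positive scores, descending
        k = int(np.ceil(target_sensitivity * len(pos)))
        return pos[max(k - 1, 0)]

    def bootstrap_specificity(scores, labels, threshold, n_iter=1000, seed=0):
        """95% CI for specificity from 1000 bootstrap resamples of breasts."""
        scores, labels = np.asarray(scores, float), np.asarray(labels)
        rng = np.random.default_rng(seed)
        stats = []
        for _ in range(n_iter):
            idx = rng.integers(0, len(labels), len(labels))
            s, y = scores[idx], labels[idx]
            stats.append(np.mean(s[y == 0] < threshold))          # specificity of this resample
        return np.percentile(stats, [2.5, 97.5])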

References

    1. Sung H, et al. Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660. - DOI - PubMed
    1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J. Clin. 2021;71:7–33. doi: 10.3322/caac.21654. - DOI - PubMed
    1. Arleo EK, Hendrick RE, Helvie MA, Sickles EA. Comparison of recommendations for screening mammography using cisnet models. Cancer. 2017;123:3673–3680. doi: 10.1002/cncr.30842. - DOI - PubMed
    1. Feig S. Cost-effectiveness of mammography, MRI, and ultrasonography for breast cancer screening. Radiol. Clin. 2010;48:879–891. doi: 10.1016/j.rcl.2010.06.002. - DOI - PubMed
    1. Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast us and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology. 2002;225:165–175. doi: 10.1148/radiol.2251011667. - DOI - PubMed

Publication types