Nat Commun. 2021 Sep 24;12(1):5645.
doi: 10.1038/s41467-021-26023-2.

Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams

Yiqiu Shen et al. Nat Commun. 2021.

Abstract

Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false-positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the system’s pipeline.
a US images were pre-processed to extract the breast laterality (i.e., left or right breast) and to include only the part of the image that shows the breast, cropping out the image periphery, which typically contains textual metadata about the patient and the US acquisition technique. b For each breast, we assigned a cancer label using the recorded pathology reports for the respective patient within −30 to 120 days of the US examination. We applied additional filtering on the internal test set to ensure that cancers in positive exams are visible in the US images and that negative exams have at least one cancer-negative follow-up (see Methods section 'Additional filtering of the test'). c The AI system processes all US images acquired from one breast to compute probabilistic predictions for the presence of malignant lesions. The AI system also generates saliency maps that indicate the informative regions in each image. d We evaluated the system on an internal test set (AUROC: 0.976, 95% CI: 0.972, 0.980, n = 79,156 breasts) and an external test set (AUROC: 0.927, 95% CI: 0.907, 0.959, n = 780 images). e In a reader study consisting of 663 exams (n = 1024 breasts), we showed that the AI system can improve the specificity and positive predictive value (PPV) for 10 attending radiologists while maintaining the same level of sensitivity and negative predictive value (NPV).
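
A minimal sketch of the labeling rule in panel b is shown below, assuming per-breast pathology reports carrying a date and a malignancy flag; the function name and data layout are hypothetical and not taken from the paper's code.

    # Hypothetical sketch of the Fig. 1b labeling rule: a breast is labeled positive
    # if a malignant pathology report falls within -30 to +120 days of the exam date.
    from datetime import date, timedelta

    def assign_breast_label(exam_date, pathology_reports):
        """Return 1 (cancer) if any malignant report lies in the matching window, else 0."""
        window_start = exam_date - timedelta(days=30)
        window_end = exam_date + timedelta(days=120)
        for report in pathology_reports:  # each report: {"date": date, "malignant": bool}
            if report["malignant"] and window_start <= report["date"] <= window_end:
                return 1
        return 0

    # Example: a malignant biopsy 45 days after the exam labels that breast positive.
    label = assign_breast_label(date(2018, 3, 1),
                                [{"date": date(2018, 4, 15), "malignant": True}])
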
Fig. 2. Reader study results.
Performance of the AI system on the reader study population (n = 1024 breasts), shown as an ROC curve (a) and a precision-recall curve (b). The AI achieved an AUROC of 0.962 (95% CI: 0.943, 0.979) and an AUPRC of 0.752 (95% CI: 0.675, 0.849). Each data point represents a single reader, and the triangles correspond to the average reader performance. The inset shows a magnification of the gray shaded region.
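
For readers who want to run this kind of evaluation on their own data, here is a minimal sketch of computing AUROC and AUPRC with scikit-learn; the labels and scores are synthetic placeholders, not the study's data.

    # Minimal sketch of the Fig. 2 metrics (AUROC, AUPRC) on synthetic data.
    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(0)
    labels = (rng.random(1024) < 0.1).astype(int)              # ~10% cancer prevalence (synthetic)
    scores = np.clip(0.2 + 0.5 * labels + rng.normal(0, 0.2, 1024), 0, 1)  # synthetic AI outputs

    print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
    print(f"AUPRC: {average_precision_score(labels, scores):.3f}")
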
Fig. 3. Qualitative analysis of saliency maps.
In each of the six cases (a-f) from the reader study, we visualized the sagittal and transverse views of the lesion (left) and the AI’s saliency maps indicating the predicted locations of benign (middle) and malignant (right) findings (see Methods section 'Deep neural network architecture'). Exams a-c display lesions that were ultimately biopsied and found to be malignant. All readers and the AI system correctly classified exams a and b as suspicious for malignancy. However, the majority of readers (7/10) and the AI system incorrectly classified case c as benign. Cases d-f display lesions that were biopsied and found to be benign. The majority of readers incorrectly classified exams d (9/10), e (10/10), and f (10/10) as suspicious for malignancy and recommended the lesions undergo biopsy. In contrast, the AI system classified exam d as malignant, but correctly identified exams e and f as being benign.
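
As a rough illustration of how per-class saliency maps like these can be produced, the sketch below shows a generic class-activation-map-style head in PyTorch that yields one map per class (benign, malignant) plus image-level predictions; this is not the authors' architecture, and the channel counts and pooling choice are assumptions for illustration.

    # Generic class-activation-map-style head producing one saliency map per class
    # (benign, malignant) plus image-level predictions; not the paper's network.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SaliencyHead(nn.Module):
        def __init__(self, in_channels=512):
            super().__init__()
            # 1x1 convolution maps backbone features to two classes: benign, malignant
            self.classifier = nn.Conv2d(in_channels, 2, kernel_size=1)

        def forward(self, feature_map):
            logits_map = self.classifier(feature_map)                       # (N, 2, H, W)
            saliency = torch.sigmoid(logits_map)                            # per-pixel class evidence
            # Image-level prediction: average the logit map over spatial locations.
            image_logits = F.adaptive_avg_pool2d(logits_map, 1).flatten(1)  # (N, 2)
            return saliency, torch.sigmoid(image_logits)

    # Usage with dummy backbone features of shape (batch, channels, height, width).
    saliency_maps, predictions = SaliencyHead()(torch.randn(1, 512, 16, 16))
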
Fig. 4. Performance of readers, AI, and hybrid models.
We reported the observed values (measure of center) and 95% confidence intervals (error bars) of AUROC (a), AUPRC (b), specificity (c), biopsy rate (d), and PPV (e) of ten radiologists (R1-R10), AI, and the hybrid models on the reader study set (n = 1024 breasts). The predictions of each hybrid model are weighted averages of each reader’s BI-RADS scores and the AI’s probabilistic predictions (see Methods section 'Hybrid model'). We dichotomized each hybrid model’s probabilistic predictions to match the sensitivity of its respective reader. We dichotomized the AI’s predictions to match the average radiologists' sensitivity. The collaboration between AI and readers improves readers' AUROC, AUPRC, specificity, and PPV, while reducing biopsy rate. We estimated the 95% confidence intervals by 1000 iterations of the bootstrap method.
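
To make the hybrid-model evaluation above concrete, here is a minimal sketch of the three steps it describes: averaging a reader's BI-RADS scores with the AI's probabilities, dichotomizing at a threshold that matches the reader's sensitivity, and bootstrapping a confidence interval. The equal weighting, the BI-RADS rescaling, and the function names are illustrative assumptions rather than the paper's exact procedure.

    # Illustrative sketch of the hybrid evaluation in Fig. 4 (assumptions noted above).
    import numpy as np

    def hybrid_scores(birads, ai_prob, weight=0.5):
        """Weighted average of a reader's rescaled BI-RADS scores and AI probabilities."""
        birads_scaled = (np.asarray(birads, float) - 1.0) / 5.0   # map BI-RADS 1-6 to [0, 1]
        return weight * birads_scaled + (1.0 - weight) * np.asarray(ai_prob, float)

    def threshold_at_sensitivity(scores, labels, target_sensitivity):
        """Highest threshold whose sensitivity is at least the target (e.g., the reader's)."""
        scores, labels = np.asarray(scores, float), np.asarray(labels)
        pos = np.sort(scores[labels == 1])[::-1]                  # positive scores, descending
        k = int(np.ceil(target_sensitivity * len(pos)))
        return pos[max(k - 1, 0)]

    def bootstrap_specificity(scores, labels, threshold, n_iter=1000, seed=0):
        """95% CI for specificity from 1000 bootstrap resamples of breasts."""
        scores, labels = np.asarray(scores, float), np.asarray(labels)
        rng = np.random.default_rng(seed)
        stats = []
        for _ in range(n_iter):
            idx = rng.integers(0, len(labels), len(labels))
            s, y = scores[idx], labels[idx]
            stats.append(np.mean(s[y == 0] < threshold))          # specificity of this resample
        return np.percentile(stats, [2.5, 97.5])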

References

    1. Sung H, et al. Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660. - DOI - PubMed
    1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J. Clin. 2021;71:7–33. doi: 10.3322/caac.21654. - DOI - PubMed
    1. Arleo EK, Hendrick RE, Helvie MA, Sickles EA. Comparison of recommendations for screening mammography using cisnet models. Cancer. 2017;123:3673–3680. doi: 10.1002/cncr.30842. - DOI - PubMed
    1. Feig S. Cost-effectiveness of mammography, MRI, and ultrasonography for breast cancer screening. Radiol. Clin. 2010;48:879–891. doi: 10.1016/j.rcl.2010.06.002. - DOI - PubMed
    1. Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast us and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology. 2002;225:165–175. doi: 10.1148/radiol.2251011667. - DOI - PubMed

Publication types