Clin Cancer Res. 2018 Dec 1;24(23):5902-5909.
doi: 10.1158/1078-0432.CCR-18-1115. Epub 2018 Oct 11.

Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening

Sarah S Aboutalib et al. Clin Cancer Res.

Abstract

Purpose: False positives in digital mammography screening lead to high recall rates, subjecting patients to unnecessary medical procedures and adding to health care costs. This study aimed to investigate whether deep learning methods can distinguish recalled but benign mammography images from negative exams and from those with malignancy.

Experimental design: Deep learning convolutional neural network (CNN) models were constructed to classify mammography images into malignant (breast cancer), negative (breast cancer free), and recalled-benign categories. A total of 14,860 images of 3,715 patients from two independent mammography datasets, the Full-Field Digital Mammography Dataset (FFDM) and a digitized film dataset, the Digital Database for Screening Mammography (DDSM), were used in various settings to train and test the CNN models. ROC curves were generated, and the AUC was calculated as a measure of classification performance.
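The abstract reports ROC curves and AUC for binary scenarios (for example, recalled-benign vs. negative) and an averaged AUC for the three-class scenario, but does not say how the three-class average was formed. As a minimal sketch, assuming a one-vs-rest reduction with an unweighted (macro) average, the Python below builds an ROC curve by sweeping the decision threshold, integrates it with the trapezoidal rule, and averages the per-class AUCs. The function names and class-name strings are hypothetical and not taken from the paper; in practice a library routine such as scikit-learn's `roc_auc_score` (with `multi_class="ovr"`) would normally be used.

```python
from typing import Dict, List, Sequence, Tuple

# Class names mirror the abstract; the exact strings are illustrative assumptions.
CLASSES = ("malignant", "negative", "recalled_benign")


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from sweeping the threshold down through every distinct score.

    labels must be 1 (positive) or 0 (negative); higher scores mean "more positive".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative case")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Absorb every case tied at this score before emitting a point,
        # so ties produce a diagonal segment (worth half credit).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def one_vs_rest_aucs(probs: Sequence[Dict[str, float]],
                     labels: Sequence[str]) -> Dict[str, float]:
    """Per-class AUC for a three-class model: class c vs. all other classes,
    scored by each case's predicted probability of c."""
    result = {}
    for c in CLASSES:
        scores = [p[c] for p in probs]
        binary = [1 if y == c else 0 for y in labels]
        result[c] = auc(roc_curve(scores, binary))
    return result


def macro_average(per_class: Dict[str, float]) -> float:
    """Unweighted mean of the per-class AUCs (assumed averaging scheme)."""
    return sum(per_class.values()) / len(per_class)
```

Because tied positive and negative scores produce a diagonal ROC segment, the trapezoidal area equals the Mann-Whitney estimate of the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.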

Results: Training and testing on the FFDM dataset alone yielded AUCs ranging from 0.70 to 0.81. When the DDSM dataset was used, AUCs ranged from 0.77 to 0.96. When the two datasets were combined for training and testing, AUCs ranged from 0.76 to 0.91. When pretrained on both a large nonmedical dataset and DDSM, the models showed consistent improvements in AUC of 0.02 to 0.05 (all P > 0.05) compared with pretraining on the nonmedical dataset alone.

Conclusions: This study demonstrates that automatic deep learning CNN methods can identify nuanced mammographic imaging features to distinguish recalled-benign images from malignant and negative cases, which may lead to a computerized clinical toolkit to help reduce false recalls.

Conflict of interest statement

We have no conflicts of interest to disclose.

Figures

Figure 1.
Performance of deep learning convolutional neural network (CNN) models for classification on the full-field digital mammography (FFDM) dataset. (Left) Receiver operating characteristic (ROC) curves for the binary classification scenarios and the corresponding areas under the curve (AUCs). (Right) ROC curves for the three-class classification scenario and the averaged AUC.
Figure 2.
Performance of deep learning convolutional neural network (CNN) models for classification on the Digital Database for Screening Mammography (DDSM) dataset. (Left) Receiver operating characteristic (ROC) curves for the binary classification scenarios and the corresponding areas under the curve (AUCs). (Right) ROC curves for the three-class classification scenario and the averaged AUC.
Figure 3.
Performance of deep learning convolutional neural network (CNN) models for classification when the full-field digital mammography (FFDM) and Digital Database for Screening Mammography (DDSM) datasets are combined for training and testing. (Left) Receiver operating characteristic (ROC) curves for the binary classification scenarios and the corresponding areas under the curve (AUCs). (Right) ROC curves for the three-class classification scenario and the averaged AUC.
Figure 4.
Comparison of deep learning convolutional neural network (CNN) model performance under different pretraining strategies: the original ImageNet-pretrained model vs. a model pretrained on ImageNet and the Digital Database for Screening Mammography (DDSM) dataset. All areas under the curve (AUCs) are based on training (fine-tuning) and testing on the full-field digital mammography (FFDM) dataset.
Figure 5.
Comparison of performance results using varying amounts (5%, 10%, and 15%) of testing data across all models: (Top-Left) FFDM trained model; (Top-Right) DDSM trained model; (Bottom-Left) FFDM + DDSM trained model; (Bottom-Right) Incrementally pre-trained CNN models in all scenarios.
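Figure 5 evaluates each model with 5%, 10%, and 15% of the data held out for testing. The abstract does not describe how the held-out sets were drawn, so the following is only a sketch under stated assumptions: a seeded random split at the level of individual images, stratified by class so that every test set keeps the class mix. `stratified_holdout` is a hypothetical helper, not the authors' code. Since the data contain several images per patient (14,860 images of 3,715 patients), grouping the split by patient would be a natural variant; it is not shown here.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_holdout(labels: Sequence[str], test_fraction: float,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split case indices into (train, test), holding out roughly
    `test_fraction` of each class. Hypothetical helper, not from the paper."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps results deterministic
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy label list: class names mirror the abstract, the counts are made up.
    labels = ["malignant"] * 100 + ["negative"] * 100 + ["recalled_benign"] * 100
    for frac in (0.05, 0.10, 0.15):
        train, test = stratified_holdout(labels, frac, seed=42)
        print(f"{frac:.0%} held out: {len(train)} train / {len(test)} test")
```

Stratifying keeps rare classes represented even in the smallest (5%) test set, which matters when per-class ROC curves are computed on it.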
