J Digit Imaging. 2020 Jun;33(3):632-654.
doi: 10.1007/s10278-019-00307-y.

Conventional Machine Learning and Deep Learning Approach for Multi-Classification of Breast Cancer Histopathology Images-a Comparative Insight

Shallu Sharma et al. J Digit Imaging. 2020 Jun.

Abstract

Automatic multi-classification of breast cancer histopathological images remains a top-priority research area in biomedical informatics because of its clinical significance for the diagnosis and prognosis of breast cancer. In this work, two machine learning approaches are explored and compared for automatic magnification-dependent multi-classification on a balanced BreakHis dataset. The first approach is based on handcrafted features extracted using Hu moments, color histograms, and Haralick textures; the extracted features are then used to train conventional classifiers. The second approach is based on transfer learning, in which pre-trained networks (VGG16, VGG19, and ResNet50) are used both as feature extractors and as baseline models. The results reveal that using pre-trained networks as feature extractors outperformed both the baseline approach and the handcrafted approach at all magnifications. Moreover, data augmentation was observed to play a pivotal role in further improving classification accuracy. In this context, the VGG16 network with a linear SVM provides the highest accuracy, computed in two forms: (a) patch-based accuracies (93.97% for 40×, 92.92% for 100×, 91.23% for 200×, and 91.79% for 400×); and (b) patient-based accuracies (93.25% for 40×, 91.87% for 100×, 91.5% for 200×, and 92.31% for 400×) for the classification of magnification-dependent histopathological images. Additionally, the "fibroadenoma" (benign) and "mucinous carcinoma" (malignant) classes were found to be the most challenging classes at all magnification factors.

Keywords: Breast cancer; Handcrafted features; Histopathological images; Multi-classification; Transfer learning.
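
The abstract reports accuracy in two forms, patch-based and patient-based, without defining them. The following is a minimal sketch of how the two could differ, assuming the convention common in BreakHis-based work: patch-based accuracy is the fraction of individual images classified correctly, while patient-based accuracy first computes the fraction of correctly classified images for each patient and then averages those per-patient rates. The function names, record layout, and toy predictions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def patch_accuracy(records):
    """Fraction of individual images (patches) classified correctly."""
    correct = sum(1 for _, true, pred in records if true == pred)
    return correct / len(records)


def patient_accuracy(records):
    """Mean of per-patient recognition rates.

    Assumed definition: for each patient, rate = correct images / total
    images of that patient; the result is the average rate over patients.
    """
    per_patient = defaultdict(lambda: [0, 0])  # patient_id -> [correct, total]
    for patient_id, true, pred in records:
        per_patient[patient_id][0] += int(true == pred)
        per_patient[patient_id][1] += 1
    rates = [correct / total for correct, total in per_patient.values()]
    return sum(rates) / len(rates)


# Hypothetical toy predictions: (patient_id, true_class, predicted_class)
records = [
    ("P1", "fibroadenoma", "fibroadenoma"),
    ("P1", "fibroadenoma", "fibroadenoma"),
    ("P1", "fibroadenoma", "adenosis"),
    ("P1", "fibroadenoma", "fibroadenoma"),
    ("P2", "mucinous_carcinoma", "ductal_carcinoma"),
    ("P2", "mucinous_carcinoma", "mucinous_carcinoma"),
]

print(f"patch-based accuracy:   {patch_accuracy(records):.4f}")
print(f"patient-based accuracy: {patient_accuracy(records):.4f}")
```

Under this assumed definition the two figures can diverge whenever patients contribute different numbers of images, because each patient carries equal weight in the patient-based score regardless of image count.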


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Strategies to implement transfer learning approach. a Baseline model. b Fine-tuning. c Feature extractor
Fig. 2
Histopathological image samples from the BreakHis dataset for eight categories of breast cancer at ×200 magnification factor: a adenosis, b fibroadenoma, c phyllodes tumor, d tubular adenoma, e ductal carcinoma, f lobular carcinoma, g mucinous carcinoma, and h papillary carcinoma
Fig. 3
The pre-trained networks VGG16, VGG19, and ResNet50 as feature extractors with conventional classifiers
Fig. 4
Box-plots of classification accuracy at a ×40, b ×100, c ×200, and d ×400 magnification factor. Outliers are represented by circles
Fig. 5
ROC curve analysis obtained for ResNet50 network when used as baseline model at a ×40, b ×100, c ×200, and d ×400 magnification factor
Fig. 6
ROC curve analysis at ×40 magnification factor for a VGG16 + LR(L2), b VGG16 + SVM(L, 1), c VGG19 + SVM(L, 1), and d VGG19 + SVM(L, 5)
Fig. 7
ROC curve analysis at ×100 magnification factor for a VGG16 + RF (4000), b VGG16 + SVM (L, 1), c VGG19 + RF (4000), and d VGG19 + SVM (L, 1)
Fig. 8
ROC curve analysis at ×200 magnification factor for a VGG16 + LR(L2), b VGG16 + LDA, c VGG19 + RF(400), and d VGG19 + KNN
Fig. 9
ROC curve analysis at ×400 magnification factor for a VGG16 + SVM (L, 1 and 5), b VGG16 + LR(L2), c VGG19 + SVM(L, 1), and d VGG19 + LR(L2)
Fig. 10
ROC curve analysis of VGG16 + SVM (L, 1) classifier applied to augmented data at a ×40, b ×100, c ×200, and d ×400
Fig. 11
Confusion matrices of VGG16 + SVM (L, 1) classifier for augmented data at a ×40, b ×100, c ×200, and d ×400
Fig. 12
Confusion matrices for VGG19 + SVM (L, 1) classifier at ×400 magnification: a balanced data without augmentation, b augmented data
