Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features
- PMID: 34071029
- PMCID: PMC8197148
- DOI: 10.3390/s21113628
Abstract
Breast cancer, like most forms of cancer, is a potentially fatal disease that claims more than half a million lives every year. In 2020, breast cancer overtook lung cancer as the most commonly diagnosed form of cancer. Although the disease is extremely deadly, survival rates and longevity increase substantially with early detection and diagnosis. The treatment protocol also varies with the stage of breast cancer. Diagnosis is typically done using histopathological slides, from which it is possible to determine whether the tissue is in the Ductal Carcinoma In Situ (DCIS) stage, in which the cancerous cells have not spread into the surrounding breast tissue, or in the Invasive Ductal Carcinoma (IDC) stage, in which the cells have penetrated the neighboring tissues. IDC detection is extremely time-consuming and challenging for physicians. Hence, it can be modeled as an image classification task in which pattern recognition and machine learning aid doctors and medical practitioners in making such crucial decisions. In the present paper, we use an IDC Breast Cancer dataset that contains 277,524 images (78,786 IDC-positive and 198,738 IDC-negative) and classify the images into IDC(+) and IDC(-). To that end, we extract textural features, such as SIFT, SURF and ORB, and statistical features, such as Haralick texture features. These features are then combined to yield a dataset of 782 features. The features are ensembled by stacking, using various Machine Learning classifiers, such as Random Forest, Extra Trees, XGBoost, AdaBoost, CatBoost and Multi Layer Perceptron. This is followed by feature selection using the Pearson Correlation Coefficient, which yields a dataset with four features that are then used for classification. In our experiments, CatBoost yielded the highest accuracy (92.55%), which is on par with other state-of-the-art results, most of which employ Deep Learning architectures.
The source code is available in the GitHub repository.
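The Pearson-based selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes that selection means ranking each feature by the absolute value of its Pearson correlation with the class label and keeping the top k, which is only one plausible reading of the abstract. The helper names (`pearson`, `select_top_k`, `make_toy_data`) and the toy data are hypothetical. The last lines also compute the accuracy of always predicting the majority class from the image counts stated above, as a point of comparison for the reported 92.55%.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Return the indices of the k features most correlated with the labels.

    Assumption: features are ranked by |r| against the label. The paper may
    use a different criterion (for example, dropping mutually correlated
    features).
    """
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, labels)), j))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scored[:k])


def make_toy_data(n=100):
    """Hypothetical data: features 0 and 2 track the label, 1 and 3 do not."""
    labels = [i % 2 for i in range(n)]
    rows = []
    for i, y in enumerate(labels):
        j = i // 2  # same sequence of values for both classes
        rows.append([
            y + 0.1 * (j % 3),   # informative, positively correlated
            float(j % 5),        # uninformative, identical in both classes
            -y + 0.2 * (j % 4),  # informative, negatively correlated
            1.0,                 # constant
        ])
    return rows, labels


rows, labels = make_toy_data()
selected = select_top_k(rows, labels, k=2)
print(selected)  # [0, 2]

# Majority-class baseline from the counts given in the abstract.
IDC_POSITIVE, IDC_NEGATIVE = 78_786, 198_738
majority_baseline = IDC_NEGATIVE / (IDC_POSITIVE + IDC_NEGATIVE)
print(f"{majority_baseline:.4f}")  # 0.7161
```

Because roughly 71.6% of the images are IDC-negative, a classifier that always predicts IDC(-) would already reach that accuracy, so the reported 92.55% is best read against this baseline rather than against 50%.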
Keywords: IDC; breast cancer; ensemble learning; feature selection; machine learning.
Conflict of interest statement
The authors declare no conflict of interest.