Sci Rep. 2023 Nov 16;13(1):20014.
doi: 10.1038/s41598-023-46239-0.

Enhancing histopathological image classification of invasive ductal carcinoma using hybrid harmonization techniques

Nassib Abdallah et al.

Abstract

This study aims to develop a robust pipeline for classifying invasive ductal carcinomas and benign tumors in histopathological images while addressing variability within and between centers. We specifically tackle the challenge of detecting atypical data and variability between common clusters within the same database. Our feature-engineering-based pipeline comprises a feature extraction step followed by multiple harmonization techniques that correct intra- and inter-center batch effects arising from variability in image acquisition and in patients' clinical characteristics. These harmonization steps enable the construction of more robust and efficient models. We evaluate the pipeline on two public breast cancer databases, BreaKHis and IDCDB, using recall, precision, and accuracy. Our pipeline outperforms recent models, achieving 90-95% accuracy in classifying benign and malignant tumors, and we demonstrate the benefit of harmonization when classifying patches from different databases. Our top model scored 94.7% on IDCDB and 95.2% on BreaKHis, surpassing existing feature-engineering-based models (92.1% on IDCDB and 87.7% on BreaKHis) and performing comparably to deep learning models. The proposed pipeline effectively classifies malignant and benign tumors while addressing variability within and between centers through several harmonization techniques. Our findings show that harmonizing variability between patches from different batches directly affects the training and testing performance of classification models. This pipeline has the potential to improve breast cancer diagnosis and treatment and may be applicable to other diseases.
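The abstract reports recall, precision, and accuracy. As a reminder of how these three metrics are computed for a binary benign/malignant task, here is a minimal sketch. Treating the malignant (IDC) class as label 1 is an assumption, and the toy labels below are illustrative only, not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (recall, precision, accuracy) for a binary classification task.

    positive: label treated as the positive class (assumed here to be the
    malignant/IDC class, encoded as 1).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if tp + fn else 0.0        # share of true positives found
    precision = tp / (tp + fp) if tp + fp else 0.0     # share of positive calls that are right
    accuracy = correct / len(pairs) if pairs else 0.0  # share of all samples classified correctly
    return recall, precision, accuracy


# Illustrative labels only (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

In this toy example there are 3 true positives, 1 false negative, and 1 false positive, so all three metrics happen to equal 0.75.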

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The architecture of our intra-database harmonization module, which consists of 6 steps. The input is a database. The first step extracts features, and the next normalizes the different groups of features. The data are then split into training and test sets, and the training samples are processed to reduce intra-database variability.
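The first steps of this module (group-wise feature normalization, then a training/test split) can be sketched in plain Python. This is a minimal sketch under stated assumptions. The caption does not say how each group is normalized, so z-scoring each group as a single block is assumed. The group names, the 20% test fraction, and the seed are hypothetical. The final processing of the training samples is not shown because the caption does not specify it.

```python
import random
import statistics


def normalize_groups(rows, groups):
    """Z-score each feature group as one block (assumed normalization scheme).

    rows:   list of samples, each a list of floats (one value per feature).
    groups: dict mapping a hypothetical group name (e.g. "color", "texture")
            to the column indices in that group. All values of a group share
            one mean and standard deviation, so relative differences between
            columns of the same group (e.g. histogram bins) are preserved.
    """
    out = [list(r) for r in rows]
    for cols in groups.values():
        block = [r[c] for r in rows for c in cols]
        mu = statistics.fmean(block)
        sd = statistics.pstdev(block)
        for r in out:
            for c in cols:
                # Zero-variance groups are only centered.
                r[c] = (r[c] - mu) / sd if sd > 0 else r[c] - mu
    return out


def split_train_test(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and split into (train, test) pairs."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)
```

Following the order given in the caption, `normalize_groups` would run on the whole database before `split_train_test`. Fitting the normalization statistics on the training split only would be a stricter alternative that keeps test-set information out of training.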
Figure 2
Our complete pipeline. The first step applies the intra-database harmonization module to each database. The second step applies the inter-database harmonization module to the data from the different sources (here, the two databases). The last step trains the classifier.
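The caption does not name the inter-database harmonization technique. One widely used family is per-feature location-scale alignment, which is the core idea behind ComBat. ComBat also applies empirical Bayes shrinkage and can protect biological covariates. The sketch below implements only the plain alignment. It is an assumed illustration, not the authors' method, and the batch identifiers are hypothetical.

```python
import statistics


def align_batches(rows, batches):
    """Per-feature location-scale alignment across batches (assumed method).

    rows:    list of samples (lists of floats) pooled from all databases.
    batches: batch id per sample (e.g. "BreaKHis" or "IDCDB").
    Each feature is standardized within its batch, then mapped back onto the
    pooled mean and standard deviation of that feature.
    """
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        pooled = [r[j] for r in rows]
        mu_all = statistics.fmean(pooled)
        sd_all = statistics.pstdev(pooled)
        for b in set(batches):
            idx = [i for i, bb in enumerate(batches) if bb == b]
            vals = [rows[i][j] for i in idx]
            mu_b = statistics.fmean(vals)
            sd_b = statistics.pstdev(vals)
            for i in idx:
                # Constant features within a batch collapse to the pooled mean.
                z = (rows[i][j] - mu_b) / sd_b if sd_b > 0 else 0.0
                out[i][j] = mu_all + z * sd_all
    return out
```

This plain version has no class covariate. If class proportions differ between databases, it can therefore also remove genuine IDC/non-IDC differences, which is why ComBat-style methods model such covariates explicitly.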
Figure 3
The projection of the samples onto the principal factorial plane, before and after harmonization, shows the effect of our method on the scatterplot. Patches of either the IDC or the non-IDC class can appear as outliers within the dataset and need to be brought closer to the reference scatterplot, which contains the majority of samples.
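The principal factorial plane is the plane spanned by the first two principal axes of a principal component analysis (PCA). The following plain-Python sketch computes the projection using power iteration with deflation. In practice a library routine such as scikit-learn's `sklearn.decomposition.PCA(n_components=2)` would normally be used. The iteration count and seed are arbitrary choices, and the sketch assumes the data vary along at least two directions.

```python
import random


def pca_plane(rows, n_iter=200, seed=0):
    """Project samples onto the first two principal components.

    Uses power iteration on the sample covariance matrix, deflating after
    the first component so the second run finds the next one. Assumes at
    least two features with non-degenerate variance.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # centered data
    # Sample covariance matrix (p x p).
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    comps = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(p)]
        s = sum(t * t for t in v) ** 0.5
        v = [t / s for t in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sum(t * t for t in w) ** 0.5
            if norm == 0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient = variance explained along v.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        comps.append(v)
        # Deflate: remove this component before searching for the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # 2-D coordinates of each sample in the principal plane.
    return [[sum(x[j] * c[j] for j in range(p)) for c in comps] for x in X]
```

Plotting these coordinates for the feature matrix before and after harmonization gives a comparison of the kind shown in the figure. The signs of the axes are arbitrary, as in any PCA.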
Figure 4
Flow diagram for outlier detection. The first step applies the outlier detection methods. Based on their results, the second step labels each sample as atypical or normal. The third step trains a logistic regression (RLog) model to classify IDC/non-IDC patches on each atypical-free dataset. Finally, the last step selects the best model based on the MSE criterion, i.e. the classification performance of the RLog model.
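The four steps of this flow can be sketched in plain Python. The caption does not name the outlier detectors, so two simple per-feature rules (a z-score rule and Tukey's IQR fences) stand in as assumptions. Computing the MSE on a held-out validation set is also assumed. The logistic regression is a bare gradient-descent implementation for illustration, not the authors' RLog configuration.

```python
import math
import statistics


def zscore_outliers(rows, thresh=3.0):
    """Flag a sample if any feature is more than `thresh` SDs from its mean."""
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*rows)]
    return [any(sd > 0 and abs(v - mu) / sd > thresh
                for v, (mu, sd) in zip(r, stats)) for r in rows]


def iqr_outliers(rows, k=1.5):
    """Flag a sample if any feature lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    bounds = []
    for c in zip(*rows):
        q1, _, q3 = statistics.quantiles(c, n=4)
        bounds.append((q1 - k * (q3 - q1), q3 + k * (q3 - q1)))
    return [any(not lo <= v <= hi for v, (lo, hi) in zip(r, bounds)) for r in rows]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logreg(rows, labels, lr=0.1, epochs=500):
    """Unregularized logistic regression trained by batch gradient descent."""
    n, p = len(rows), len(rows[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba((w, b), x) - y
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def mse(model, rows, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return statistics.fmean((predict_proba(model, x) - y) ** 2
                            for x, y in zip(rows, labels))


def select_best(train_X, train_y, val_X, val_y, detectors):
    """Per detector: drop flagged samples, fit, score; keep the lowest MSE."""
    best = None
    for name, detect in detectors.items():
        keep = [i for i, flagged in enumerate(detect(train_X)) if not flagged]
        model = fit_logreg([train_X[i] for i in keep], [train_y[i] for i in keep])
        score = mse(model, val_X, val_y)
        if best is None or score < best[1]:
            best = (name, score, model)
    return best
```

With very few samples, the z-score rule cannot flag anything at a threshold of 3: with population standard deviation, no point can exceed z = sqrt(n - 1). The IQR rule does not have this limitation, which is one reason to compare several detectors rather than trust a single one.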
Figure 5
On the left, we present examples of central patches, which make up the majority of an entire histopathological slide. On the right, we show examples of border patches. These two distinct types of patches are invariably present in histopathological studies, as they result from the segmentation of a whole slide.
Figure 6
Representation of patches grouped by class: the patches on the right contain no malignant tumor whereas those on the left contain malignant tumor.
Figure 7
Results of the SHAP analysis on the IDCDB classification dataset, highlighting the features most influential for classification. Feature 72: Feat_Red_47; Feature 122: Feat_Green_47; Feature 73: Feat_Red_48; Feature 123: Feat_Green_48; Feature 11: longest_strike_above_mean; Feature 6: autocorrelation; Feature 184: Feat_Moments_Red_9; Feature 215: Feat_Moments_Green_9; Feature 186: Feat_Moments_Red_11; Feature 217: Feat_Moments_Green_11; Feature 14: mean_change; Feature 8: maximum; Feature 10: kurtosis; Feature 193: Feat_Moments_Red_18; Feature 224: Feat_Moments_Green_18; Feature 173: Feat_Blue_48; Feature 172: Feat_Blue_47; Feature 17: ratio_value_number_to_time_series_length; Feature 158: Feat_Blue_33; Feature 98: Feat_Green_23.
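Several feature names in this list (longest_strike_above_mean, autocorrelation, mean_change, ratio_value_number_to_time_series_length) match feature calculators in the tsfresh library. The sketch below reimplements four of them in plain Python on a 1-D series, following tsfresh's documented definitions. The caption does not say which 1-D signal the authors computed them on (for example, an intensity profile or a color histogram), so the example series is hypothetical. The meaning of the Feat_Red_*/Feat_Moments_* names is not stated and is not modeled here.

```python
import statistics


def mean_change(x):
    """Mean of consecutive differences; telescopes to (x[-1] - x[0]) / (n - 1).

    Returns nan for series with fewer than two points.
    """
    return (x[-1] - x[0]) / (len(x) - 1) if len(x) > 1 else float("nan")


def longest_strike_above_mean(x):
    """Length of the longest run of consecutive values strictly above the mean."""
    mu = statistics.fmean(x)
    best = run = 0
    for v in x:
        run = run + 1 if v > mu else 0
        best = max(best, run)
    return best


def ratio_value_number_to_time_series_length(x):
    """Number of distinct values divided by the series length."""
    return len(set(x)) / len(x)


def autocorrelation(x, lag):
    """Lag-`lag` autocorrelation using the whole-series mean and variance."""
    n = len(x)
    mu = statistics.fmean(x)
    var = statistics.pvariance(x)
    if lag >= n or var == 0:
        return float("nan")
    s = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return s / ((n - lag) * var)


series = [1.0, 3.0, 2.0, 5.0, 6.0, 7.0, 2.0, 2.0]  # hypothetical 1-D signal
print(longest_strike_above_mean(series))                 # 3
print(ratio_value_number_to_time_series_length(series))  # 0.75
```

For this series the mean is 3.5, and the run 5, 6, 7 is the longest stretch above it. Six of the eight values are distinct.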
