Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging
- PMID: 32965624
- DOI: 10.1007/s11548-020-02260-6
Abstract
Purpose: Machine learning (ML) algorithms are known to vary in prediction accuracy when trained on imbalanced datasets, which are common in medical imaging (MI) because pathological and normal cases occur at unequal rates. This paper presents a thorough investigation of the effects of class imbalance, and of methods for mitigating it, in ML algorithms applied to MI.
Methods: We first selected five classes from the Image Retrieval in Medical Applications (IRMA) dataset and performed multiclass classification using a random forest model (RFM); we then performed binary classification using a convolutional neural network (CNN) on a chest X-ray dataset. An imbalanced class was created in the training set by varying the number of images in that class. Methods tested to mitigate class imbalance included oversampling, undersampling, and changing the class weights of the RFM. Model performance was assessed by overall classification accuracy, overall F1 score, and the specificity, recall, and precision of the imbalanced class.
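The three mitigation strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random (duplication-based) resampling, and the inverse-frequency weighting are all assumptions chosen for illustration, and the toy labels are hypothetical. The abstract does not state which oversampling variants or class-weight values were actually used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, target_class, n_target, seed=0):
    """Duplicate randomly chosen items of target_class until it has n_target items."""
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == target_class]
    if not idx:
        raise ValueError("target_class does not occur in labels")
    # Draw with replacement from the existing members of the class.
    extra = [rng.choice(idx) for _ in range(max(0, n_target - len(idx)))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_undersample(samples, labels, target_class, n_target, seed=0):
    """Keep at most n_target randomly chosen items of target_class."""
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == target_class]
    kept = set(rng.sample(idx, min(n_target, len(idx))))
    keep = [i for i, y in enumerate(labels) if y != target_class or i in kept]
    return [samples[i] for i in keep], [labels[i] for i in keep]


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Hypothetical toy training set: 8 normal images, 2 pathological images.
labels = ["normal"] * 8 + ["pathological"] * 2
samples = list(range(len(labels)))  # stand-ins for image arrays

_, y_over = random_oversample(samples, labels, "pathological", 8)
_, y_under = random_undersample(samples, labels, "normal", 2)
print(Counter(y_over))    # both classes now have 8 items
print(Counter(y_under))   # both classes now have 2 items
print(inverse_frequency_weights(labels))
```

The weighting formula is the same inverse-frequency heuristic that scikit-learn applies for `class_weight="balanced"`; a dictionary of this shape could be passed as `class_weight` to a random forest classifier there. Resampling should be applied to the training split only, so that the test set keeps its original class distribution.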
Results: A close-to-balanced training set resulted in the best model performance, and a large imbalance caused by overrepresentation of a class was more detrimental to model performance than one caused by underrepresentation. Oversampling and undersampling methods were both effective in mitigating class imbalance, and the efficacy of oversampling techniques was class specific.
Conclusion: This study systematically demonstrates the effect of class imbalance on RFM and CNN models using two public X-ray datasets, making these findings widely applicable as a reference. Furthermore, the methods employed here can guide researchers in assessing and addressing the effects of class imbalance while taking data-specific characteristics into account to optimize imbalance-mitigating methods.
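For assessing the effect of imbalance on a single class, the per-class metrics listed in the Methods can be computed one-vs-rest from true and predicted labels. The following is a minimal sketch under stated assumptions: the function name and the toy labels are hypothetical, the F1 score shown is for the target class only, and how the paper aggregates its "overall F1 score" across classes is not specified in the abstract.

```python
def one_vs_rest_metrics(y_true, y_pred, target_class):
    """Treat target_class as positive and every other class as negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == target_class and t == target_class:
            tp += 1
        elif p == target_class:
            fp += 1
        elif t == target_class:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # also called sensitivity
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    # Accuracy of the binarized (target vs. rest) problem, not multiclass accuracy.
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical three-class example, evaluated for class "b".
y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]
metrics = one_vs_rest_metrics(y_true, y_pred, "b")
print(metrics)
```

Reporting specificity, recall, and precision alongside accuracy matters under imbalance because accuracy alone can stay high while the minority class is rarely detected.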
Keywords: Class imbalance; Machine learning; Medical imaging; Radiology; X-ray.