FADEL: Ensemble Learning Enhanced by Feature Augmentation and Discretization
- PMID: 40868340
- PMCID: PMC12383576
- DOI: 10.3390/bioengineering12080827
Abstract
In recent years, data augmentation techniques have become the predominant approach for addressing highly imbalanced classification problems in machine learning. Algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) and Conditional Tabular Generative Adversarial Network (CTGAN) have proven effective in synthesizing minority class samples. However, these methods often introduce distributional bias and noise, potentially leading to model overfitting, reduced predictive performance, increased computational costs, and elevated cybersecurity risks. To overcome these limitations, we propose a novel architecture, FADEL, which integrates feature-type awareness with a supervised discretization strategy. FADEL introduces a unique feature augmentation ensemble framework that preserves the original data distribution by concurrently processing continuous and discretized features. It dynamically routes these feature sets to their most compatible base models, thereby improving minority class recognition without the need for data-level balancing or augmentation techniques. Experimental results demonstrate that FADEL, solely leveraging feature augmentation without any data augmentation, achieves a recall of 90.8% and a G-mean of 94.5% on the internal test set from Kaohsiung Chang Gung Memorial Hospital in Taiwan. On the external validation set from Kaohsiung Medical University Chung-Ho Memorial Hospital, it maintains a recall of 91.9% and a G-mean of 86.7%. These results outperform conventional ensemble methods trained on CTGAN-balanced datasets, confirming the superior stability, computational efficiency, and cross-institutional generalizability of the FADEL architecture. Altogether, FADEL uses feature augmentation to offer a robust and practical solution to extreme class imbalance, outperforming mainstream data augmentation-based approaches.
Keywords: data augmentation; ensemble learning; feature augmentation; feature discretization; imbalanced class classification.
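The abstract does not include implementation details, but the core idea it describes (a feature augmentation ensemble that leaves the imbalanced training data untouched, derives a supervised discretization of the continuous features, and routes each feature view to a base learner suited to it) can be sketched as follows. All concrete choices in this sketch, including the tree-based binning, the GradientBoostingClassifier/CategoricalNB pairing, and the soft-vote averaging of class probabilities, are illustrative assumptions and not the authors' actual FADEL implementation.

```python
# Minimal, illustrative sketch of a FADEL-like feature augmentation ensemble.
# Assumptions (not from the paper): tree-based supervised binning, gradient
# boosting on the continuous view, categorical naive Bayes on the discretized
# view, and a simple average of predicted probabilities.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import CategoricalNB


class SupervisedBinner:
    """Supervised discretization: bin each continuous feature with a shallow
    decision tree fit against the class labels (one common strategy; the
    paper's exact discretization method may differ)."""

    def __init__(self, max_bins=8):
        self.max_bins = max_bins

    def fit(self, X, y):
        self.trees_ = [
            DecisionTreeClassifier(max_leaf_nodes=self.max_bins, random_state=0)
            .fit(X[:, [j]], y)
            for j in range(X.shape[1])
        ]
        return self

    def transform(self, X):
        # Replace each value by the id of the tree leaf it falls into.
        return np.column_stack([t.apply(X[:, [j]]) for j, t in enumerate(self.trees_)])


class FadelLikeEnsemble(BaseEstimator, ClassifierMixin):
    """Feature augmentation ensemble: the continuous view feeds a gradient
    boosting model, the discretized view feeds a categorical naive Bayes,
    and their class probabilities are averaged. No resampling or synthetic
    sampling is performed, in line with the abstract's no-data-augmentation
    claim."""

    def __init__(self, max_bins=8):
        self.max_bins = max_bins

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.binner_ = SupervisedBinner(self.max_bins).fit(X, y)
        self.cont_model_ = GradientBoostingClassifier(random_state=0).fit(X, y)
        self.disc_model_ = CategoricalNB().fit(self.binner_.transform(X), y)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        p_cont = self.cont_model_.predict_proba(X)
        p_disc = self.disc_model_.predict_proba(self.binner_.transform(X))
        return (p_cont + p_disc) / 2.0

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

For context on the reported metrics, the G-mean is the geometric mean of sensitivity (minority-class recall) and specificity, i.e., sqrt(TPR × TNR), which is why it is a standard summary measure under severe class imbalance.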
Conflict of interest statement
The authors declare no conflicts of interest.
