Bioengineering (Basel). 2025 Jul 30;12(8):827.
doi: 10.3390/bioengineering12080827.

FADEL: Ensemble Learning Enhanced by Feature Augmentation and Discretization

Chuan-Sheng Hung et al. Bioengineering (Basel).

Abstract

In recent years, data augmentation techniques have become the predominant approach for addressing highly imbalanced classification problems in machine learning. Algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) and Conditional Tabular Generative Adversarial Network (CTGAN) have proven effective in synthesizing minority class samples. However, these methods often introduce distributional bias and noise, potentially leading to model overfitting, reduced predictive performance, increased computational costs, and elevated cybersecurity risks. To overcome these limitations, we propose a novel architecture, FADEL, which integrates feature-type awareness with a supervised discretization strategy. FADEL introduces a unique feature augmentation ensemble framework that preserves the original data distribution by concurrently processing continuous and discretized features. It dynamically routes these feature sets to their most compatible base models, thereby improving minority class recognition without the need for data-level balancing or augmentation techniques. Experimental results demonstrate that FADEL, solely leveraging feature augmentation without any data augmentation, achieves a recall of 90.8% and a G-mean of 94.5% on the internal test set from Kaohsiung Chang Gung Memorial Hospital in Taiwan. On the external validation set from Kaohsiung Medical University Chung-Ho Memorial Hospital, it maintains a recall of 91.9% and a G-mean of 86.7%. These results outperform conventional ensemble methods trained on CTGAN-balanced datasets, confirming the superior stability, computational efficiency, and cross-institutional generalizability of the FADEL architecture. Altogether, FADEL uses feature augmentation to offer a robust and practical solution to extreme class imbalance, outperforming mainstream data augmentation-based approaches.

Keywords: data augmentation; ensemble learning; feature augmentation; feature discretization; imbalance class classification.
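
The abstract reports recall and G-mean rather than accuracy. As a minimal sketch of why, assuming the standard definition of G-mean as the geometric mean of recall (sensitivity) and specificity, the snippet below computes both from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Return (recall, specificity, gmean, accuracy) from confusion-matrix counts.

    Assumes the standard definition G-mean = sqrt(recall * specificity).
    """
    recall = tp / (tp + fn)          # fraction of positives detected
    specificity = tn / (tn + fp)     # fraction of negatives correctly rejected
    gmean = math.sqrt(recall * specificity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return recall, specificity, gmean, accuracy


# Hypothetical counts (not from the paper): 10 positives, 1000 negatives,
# and a degenerate classifier that predicts "negative" for every sample.
recall, specificity, gmean, accuracy = imbalance_metrics(tp=0, fn=10, tn=1000, fp=0)

print(f"recall={recall:.3f} specificity={specificity:.3f} "
      f"gmean={gmean:.3f} accuracy={accuracy:.3f}")
```

In this illustrative case accuracy stays high because the majority class dominates, while recall and G-mean fall to zero, which is why the two latter metrics are the more informative ones under extreme class imbalance.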

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
The overall workflow of the FADEL framework.
Figure 2
Comparison of model performance on the Kaohsiung Chang Gung Memorial Hospital Kawasaki Disease testing set: (a) Precision–Recall (PPV–Recall) curves and (b) Receiver Operating Characteristic (ROC) curves for all evaluated models. The dashed diagonal line in the ROC plot represents the performance of a random classifier.
Figure 3
Comparison of model performance on the Kaohsiung Medical University Chung-Ho Memorial Hospital Kawasaki Disease testing set: (a) Precision–Recall (PPV–Recall) curves and (b) Receiver Operating Characteristic (ROC) curves for all evaluated models. The dashed diagonal line in the ROC plot represents the performance of a random classifier.
Figure 4
Comparative analysis of multiple models with the FADEL framework and data augmentation strategies on Kawasaki Disease datasets from Kaohsiung Chang Gung Memorial Hospital (CGMH) and Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUH). (a) Line plot comparison of FADEL and CTGAN-augmented models across key performance metrics (recall, F1-score, G-mean, and specificity), highlighting FADEL's superiority and consistent performance on the CGMH and KMUH datasets. (b) Comprehensive heatmap comparison of model performance across the CGMH and KMUH datasets, with and without CTGAN-based data augmentation. Metrics include recall, F1-score, G-mean, sensitivity, and specificity across 12 model variants.
