An Improved Conditional Wasserstein GAN With Gradient Penalty for Gene Expression Profiling Data Augmentation Based on Data Segmentation and Depth Feature Constraint
- PMID: 40811341
- DOI: 10.1109/TCBBIO.2025.3560097
Abstract
In practical medical diagnosis, small sample sizes in gene expression profiling data can lead to overfitting. To address this, we use the Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP) to augment the data. However, CWGAN-GP lacks control over where generated samples fall and struggles to balance the discriminator and generator during training. To overcome these limitations, we propose the Improved CWGAN-GP, which introduces two improvements. The first is a data segmentation strategy based on sample influence scores. By calculating an influence score for each sample, we prioritize samples at decision boundaries and outside the distributions as the training set, which yields more explicit decision boundaries. The second is a depth feature constraint based on the Pearson correlation coefficient: an encoder extracts deep features, and a constraint guided by the Pearson correlation coefficient is applied between the noise and the deep features. This strategy moves the model closer to a Nash equilibrium. Empirical evaluations on six publicly available gene expression profiling datasets validate our approach, showing that it generates higher-quality samples and is more stable than existing methods.
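As a rough illustration of the depth feature constraint, the sketch below computes a Pearson-correlation penalty between a noise vector and an encoder's deep feature vector. The abstract does not give the exact form of the constraint, so several things here are assumptions: that the noise and deep features have the same dimensionality, that the penalty is 1 - r averaged over a batch, and the function names `pearson` and `pearson_constraint_loss`, which are hypothetical. It is not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        # Correlation is undefined for a constant vector; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pearson_constraint_loss(noise_batch, feature_batch):
    """Assumed penalty: mean of (1 - r) over paired noise/feature vectors.

    The value is near 0 when each feature vector is perfectly positively
    correlated with its noise vector, and near 2 when perfectly
    anti-correlated.
    """
    if len(noise_batch) != len(feature_batch) or not noise_batch:
        raise ValueError("batches must be non-empty and of equal size")
    penalties = [1.0 - pearson(z, h) for z, h in zip(noise_batch, feature_batch)]
    return sum(penalties) / len(penalties)


# Toy batch: two noise vectors of dimension 4 (illustrative values only).
noise = [[0.1, -0.4, 0.9, 0.3], [1.2, 0.0, -0.7, 0.5]]
# Features that are an affine function of the noise are perfectly correlated.
aligned = [[2.0 * v + 3.0 for v in row] for row in noise]
# Negated noise is perfectly anti-correlated.
opposed = [[-v for v in row] for row in noise]

loss_aligned = pearson_constraint_loss(noise, aligned)
loss_opposed = pearson_constraint_loss(noise, opposed)
```

In a training loop, a term like this would typically be weighted and added to the generator or encoder objective alongside the Wasserstein loss and gradient penalty; the weighting and placement are not specified in the abstract.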