Generative AI mitigates representation bias and improves model fairness through synthetic health data
- PMID: 40388536
- PMCID: PMC12112403
- DOI: 10.1371/journal.pcbi.1013080
Abstract
Representation bias in health data can lead to unfair decisions and compromise the generalisability of research findings. As a consequence, underrepresented subpopulations, such as those from specific ethnic backgrounds or genders, do not benefit equally from clinical discoveries. Several approaches have been developed to mitigate representation bias, ranging from simple resampling methods, such as SMOTE, to recent approaches based on generative adversarial networks (GANs). However, generating high-dimensional time-series synthetic health data remains a significant challenge. In response, we devised a novel architecture (CA-GAN) that synthesises authentic, high-dimensional time-series data. CA-GAN outperforms state-of-the-art methods in both qualitative and quantitative evaluations while avoiding mode collapse, a serious GAN failure mode. We evaluate it using 7535 patients with hypotension and sepsis from two diverse, real-world clinical datasets. We show that synthetic data generated by CA-GAN improves model fairness for Black patients as well as female patients when evaluated separately for each subpopulation. Furthermore, CA-GAN generates authentic data of the minority class while faithfully maintaining the original distribution of the data, resulting in improved performance in a downstream predictive task.
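For orientation, the simple resampling baseline named in the abstract (SMOTE) can be sketched as interpolation between minority-class samples and their nearest minority-class neighbours. The sketch below is an illustration only and is not CA-GAN or the authors' code: the function name `smote_like_oversample`, the toy `minority` array, and the parameters `n_new`, `k`, and `seed` are all assumptions introduced here. It treats each record as a flat feature vector and ignores temporal order.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class
    samples and one of their k nearest minority-class neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])


# Toy minority class with two features (made-up numbers, not patient data).
minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
synthetic = smote_like_oversample(minority, n_new=6)
print(synthetic.shape)
```

Because every synthetic row lies on a segment between two existing rows, this kind of resampling cannot produce values outside the range already observed in the minority class.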
Copyright: © 2025 Marchesi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- Generative artificial intelligence to produce high-fidelity blastocyst-stage embryo images. Hum Reprod. 2024 Jun 3;39(6):1197-1207. doi: 10.1093/humrep/deae064. PMID: 38600621.
- Synthetic Boosted Resampling Using Deep Generative Adversarial Networks: A Novel Approach to Improve Cancer Prediction from Imbalanced Datasets. Cancers (Basel). 2024 Dec 2;16(23):4046. doi: 10.3390/cancers16234046. PMID: 39682233.
- Improving Multi-Agent Generative Adversarial Nets with Variational Latent Representation. Entropy (Basel). 2020 Sep 21;22(9):1055. doi: 10.3390/e22091055. PMID: 33286824.
- A scoping review of fair machine learning techniques when using real-world data. J Biomed Inform. 2024 Mar;151:104622. doi: 10.1016/j.jbi.2024.104622. Epub 2024 Mar 6. PMID: 38452862.
- Ophthalmic Image Synthesis and Analysis with Generative Adversarial Network Artificial Intelligence. J Imaging Inform Med. 2025 May 20. doi: 10.1007/s10278-025-01519-1. Online ahead of print. PMID: 40394320. Review.
References
- Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57. doi: 10.1613/jair.953