A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks
- PMID: 38007974
- DOI: 10.1016/j.compbiomed.2023.107687
Abstract
Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical prediction. Previous studies addressed these two issues separately, in two steps, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on this framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation of data with high missing rates and the conditional generation of EHR data. Joint learning is achieved by introducing two discriminators that distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public dataset MIMIC-IV, which has high missing rates and imbalanced data. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN can be found at https://github.com/Peroxidess/MVIIL-GAN.
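The sketch below illustrates the data bookkeeping that a one-step imputation-plus-generation scheme of this kind could rely on. It is not the authors' implementation: the mask-based combination step follows the general style of mask-based GAN imputation, and every name (observed_mask, combine, synthetic_needed, the constant stand-in for generator output) is a hypothetical chosen for illustration. It shows how a variable-level discriminator could be given one target per cell (observed versus imputed), and how many minority-class samples a conditional generator would need to produce to balance the classes.

```python
from typing import List, Optional

Row = List[Optional[float]]


def observed_mask(rows: List[Row]) -> List[List[int]]:
    """Return 1 for each observed cell and 0 for each missing cell."""
    return [[0 if value is None else 1 for value in row] for row in rows]


def combine(rows: List[Row], mask: List[List[int]],
            generated: List[List[float]]) -> List[List[float]]:
    """Keep observed values and fill missing cells with generated values."""
    return [
        [x if m == 1 else g for x, m, g in zip(row, mask_row, gen_row)]
        for row, mask_row, gen_row in zip(rows, mask, generated)
    ]


def synthetic_needed(labels: List[int], minority: int = 1) -> int:
    """Number of minority samples to generate so both classes are equal in size."""
    n_minority = sum(1 for y in labels if y == minority)
    n_majority = len(labels) - n_minority
    return max(0, n_majority - n_minority)


# Toy EHR-like table: three patients, three variables, None marks a missing value.
rows: List[Row] = [
    [1.0, None, 3.0],
    [None, 5.0, 6.0],
    [7.0, 8.0, None],
]
labels = [0, 0, 1]  # class 1 is the minority class in this toy example

# Assumption: a constant stands in for the output of a trained generator.
generated = [[0.5, 0.5, 0.5] for _ in rows]

mask = observed_mask(rows)
completed = combine(rows, mask, generated)

# Assumed variable-level targets: one per cell, 1 = observed, 0 = imputed.
variable_level_targets = mask
# Assumed sample-level targets would be one per row; here we only count how
# many conditionally generated minority rows would be required.
n_synthetic = synthetic_needed(labels)

print(completed)
print(n_synthetic)
```

In an actual model the constant would be replaced by the generator's output, and both discriminators would be trained jointly with it; the sketch only fixes the shapes of the inputs and targets at the two levels.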
Keywords: Electronic health records; Generative adversarial networks; Imbalanced learning; Missing values imputation.
Copyright © 2023. Published by Elsevier Ltd.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records. Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024. Health Data Sci. 2024. PMID: 39635227 Free PMC article. Review.
- A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets. Comput Biol Med. 2023 Sep;163:107188. doi: 10.1016/j.compbiomed.2023.107188. Epub 2023 Jun 22. Comput Biol Med. 2023. PMID: 37393785
- Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems. Proc Int Conf Mach Learn Appl. 2021 Dec;2021:791-798. doi: 10.1109/icmla52953.2021.00131. Proc Int Conf Mach Learn Appl. 2021. PMID: 35169788 Free PMC article.
- Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction. ACM BCB. 2021 Aug;2021:7. doi: 10.1145/3459930.3469512. ACM BCB. 2021. PMID: 34604866 Free PMC article.
- DeepMicroGen: a generative adversarial network-based method for longitudinal microbiome data imputation. Bioinformatics. 2023 May 4;39(5):btad286. doi: 10.1093/bioinformatics/btad286. Bioinformatics. 2023. PMID: 37099704 Free PMC article.
Cited by
- Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records. Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024. Health Data Sci. 2024. PMID: 39635227 Free PMC article. Review.