Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction
- PMID: 34604866
- PMCID: PMC8482531
- DOI: 10.1145/3459930.3469512
Abstract
Working with electronic health records (EHRs) is challenging for several reasons: records do not have 1) similar lengths (per visit), 2) the same number of observations (per patient), or 3) complete entries. These issues hinder the performance of predictive models built on EHRs. In this paper, we address these issues by presenting a model for the combined task of imputing and predicting values in irregularly observed, varying-length EHR data with missing entries. Our proposed model (dubbed Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. In this architecture, the generator is a bidirectional recurrent network that receives the EHR data and imputes the missing values. The discriminator attempts to distinguish the actual values from the imputed values produced by the generator. Using the input data in its entirety, Bi-GAN learns to impute missing elements between the input time steps (imputation) or outside them (prediction). Our method has three advantages over state-of-the-art methods in the field: (a) a single model performs both the imputation and prediction tasks; (b) the model can make predictions from time series of varying length with missing data; (c) it does not need to know the observation and prediction time windows during training and can be used with different observation and prediction window lengths, for both short- and long-term predictions. We evaluate our model on two large EHR datasets to impute and predict body mass index (BMI) values and show its superior performance in both settings.
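The idea that imputation and prediction are the same masked-fill task, differing only in where the missing positions sit, can be illustrated with a minimal sketch. It is not the Bi-GAN model: the learned bidirectional recurrent generator is replaced here by a simple stand-in that averages the nearest observed value from each direction, and there is no discriminator or training. The function names `bidirectional_fill` and `impute_and_predict`, the `horizon` parameter, and the rule of keeping observed values while filling only masked positions are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def bidirectional_fill(series: List[Optional[float]]) -> List[float]:
    """Stand-in for a generator: estimate every step from both directions.

    Assumption: a trained bidirectional recurrent network would go here.
    This sketch only carries the nearest observed value forward and
    backward, then averages the two estimates.
    """
    n = len(series)

    # Forward pass: last observed value at or before each step.
    forward: List[Optional[float]] = [None] * n
    last = None
    for t in range(n):
        if series[t] is not None:
            last = series[t]
        forward[t] = last

    # Backward pass: next observed value at or after each step.
    backward: List[Optional[float]] = [None] * n
    nxt = None
    for t in reversed(range(n)):
        if series[t] is not None:
            nxt = series[t]
        backward[t] = nxt

    estimates = []
    for f, b in zip(forward, backward):
        if f is not None and b is not None:
            estimates.append((f + b) / 2)
        elif f is not None:
            estimates.append(f)
        elif b is not None:
            estimates.append(b)
        else:
            raise ValueError("series has no observed values")
    return estimates


def impute_and_predict(
    series: List[Optional[float]], horizon: int
) -> Tuple[List[float], List[int]]:
    """Treat future steps as extra missing entries and fill everything at once."""
    # Prediction: append `horizon` unobserved steps after the input window.
    padded = list(series) + [None] * horizon
    # Mask is 1 where a value was observed and 0 where it must be generated.
    mask = [0 if value is None else 1 for value in padded]
    generated = bidirectional_fill(padded)
    # Keep observed values; use generated values only at masked positions.
    completed = [v if m else g for v, m, g in zip(padded, mask, generated)]
    return completed, mask


if __name__ == "__main__":
    bmi = [22.0, None, 24.0]  # hypothetical BMI readings with one gap
    completed, mask = impute_and_predict(bmi, horizon=2)
    print(completed, mask)
```

In this sketch the gap at the second step is an imputation target and the two appended steps are prediction targets, yet both are handled by the same fill routine. That mirrors the single-model property described above, and changing `horizon` at call time requires no change to the routine itself.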
Keywords: Adversarial Training; Electronic Health Record; Recurrent Neural Network.
Similar articles
- Flexible-Window Predictions on Electronic Health Records. Proc AAAI Conf Artif Intell. 2022 Feb-Mar;36(11):12510-12516. doi: 10.1609/aaai.v36i11.21520. Epub 2022 Jun 28. PMID: 36312212. Free PMC article.
- Multi-Task Deep Neural Networks for Irregularly Sampled Multivariate Clinical Time Series. Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:135-140. doi: 10.1109/ichi61247.2024.00025. Epub 2024 Aug 22. PMID: 39726987. Free PMC article.
- A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks. Comput Biol Med. 2024 Jan;168:107687. doi: 10.1016/j.compbiomed.2023.107687. Epub 2023 Nov 14. PMID: 38007974.
- Generative adversarial networks for biomedical time series forecasting and imputation. J Biomed Inform. 2022 May;129:104058. doi: 10.1016/j.jbi.2022.104058. Epub 2022 Mar 25. PMID: 35346855. Review.
- Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records. Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024. PMID: 39635227. Free PMC article. Review.
Cited by
- Multisource Heterogeneous Sensor Processing Meets Distribution Networks: Brief Review and Potential Directions. Sensors (Basel). 2025 Jul 3;25(13):4146. doi: 10.3390/s25134146. PMID: 40648401. Free PMC article. Review.
- An Extensive Data Processing Pipeline for MIMIC-IV. Proc Mach Learn Res. 2022 Nov;193:311-325. PMID: 36686986. Free PMC article.
- Bidirectional f-Divergence-Based Deep Generative Method for Imputing Missing Values in Time-Series Data. Stats (Basel). 2025 Mar;8(1):7. doi: 10.3390/stats8010007. Epub 2025 Jan 14. PMID: 39911165. Free PMC article.
- Generative AI Models in Time-Varying Biomedical Data: Scoping Review. J Med Internet Res. 2025 Mar 10;27:e59792. doi: 10.2196/59792. PMID: 40063929. Free PMC article.
- An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation. Proc Mach Learn Res. 2024 Dec;259:308-324. PMID: 40051575. Free PMC article.