Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction

Mehak Gupta¹, H Timothy Bunnell², Thao-Ly T Phan², Rahmatollah Beheshti¹

PMID: 34604866
PMCID: PMC8482531
DOI: 10.1145/3459930.3469512

Mehak Gupta et al. ACM BCB. 2021 Aug:2021:7.

Affiliations

¹ University of Delaware, Newark, Delaware, USA.
² Nemours Children's Health System, Wilmington, Delaware, USA.

Abstract

Working with electronic health records (EHRs) is challenging for several reasons: records do not have 1) similar lengths (per visit), 2) the same number of observations (per patient), or 3) complete entries. These issues hinder the performance of predictive models built on EHRs. In this paper, we address these issues by presenting a model for the combined task of imputing and predicting values in irregularly observed, varying-length EHR data with missing entries. Our proposed model (dubbed Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. In this architecture, the generator is a bidirectional recurrent network that receives the EHR data and imputes the missing values. The discriminator attempts to distinguish the actual values from the imputed values produced by the generator. Using the input data in its entirety, Bi-GAN learns to impute missing elements in between the input time steps (imputation) or outside of them (prediction). Our method has three advantages over state-of-the-art methods in the field: (a) a single model performs both the imputation and prediction tasks; (b) the model can perform predictions using time series of varying length with missing data; (c) it does not need to know the observation and prediction time windows during training and can be used for predictions with different observation and prediction window lengths, for both short- and long-term predictions. We evaluate our model on two large EHR datasets to impute and predict body mass index (BMI) values and show its superior performance in both settings.

Keywords: Adversarial Training; Electronic Health Record; Recurrent Neural Network.

Figures

**Figure 1:**
Bi-GAN architecture overview. The blue arrows show the loss calculations and the red dashed arrows show the loss back-propagation. The data vector X, along with the target vector x (shown in the blue shaded row) and its forward and backward decay vectors, is given as input to the bidirectional RNN. The RNN generates values in both the forward and backward directions (forward and backward values are not shown separately, for image clarity). The regression layer produces the final generated values $\tilde{x}$. The mask is used to obtain the imputed vector $\bar{x}$, which contains the generated values $\tilde{x}$ where values are missing (shown in red) and the observed values x where they are not (shown in black). Dashed arrows show that the $\bar{x}$ values are given as input to the next timestamp. The discriminator takes the imputed vector $\bar{x}$ and predicts the probability that each value is fake ($\tilde{x}$) or real (x). It uses the mask vector as the ground truth to calculate its loss.
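The masking step and the discriminator target described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `combine_with_mask` and `discriminator_loss`, the use of NaN to mark missing entries, the choice of binary cross-entropy as the discriminator loss, and the example BMI values are all assumptions made here for illustration.

```python
import numpy as np

def combine_with_mask(x, x_tilde, mask):
    """Build the imputed vector x_bar.

    Keeps the observed value where mask == 1 and uses the generated
    value where mask == 0. Missing entries in x are assumed to be NaN.
    """
    x = np.nan_to_num(np.asarray(x, dtype=float))  # NaN -> 0 so it drops out
    x_tilde = np.asarray(x_tilde, dtype=float)
    return mask * x + (1.0 - mask) * x_tilde

def discriminator_loss(d_prob, mask, eps=1e-7):
    """Binary cross-entropy with the mask as ground truth.

    Assumption: 1 = real (observed), 0 = fake (generated). The caption
    only states that the mask is the ground truth; BCE is our choice.
    """
    p = np.clip(np.asarray(d_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))

# Hypothetical BMI sequence with two missing time steps.
x = np.array([21.0, np.nan, 23.5, np.nan])
mask = np.array([1.0, 0.0, 1.0, 0.0])
x_tilde = np.array([20.8, 22.1, 23.9, 24.6])  # stand-in generator output

x_bar = combine_with_mask(x, x_tilde, mask)
uninformed_loss = discriminator_loss(np.full(4, 0.5), mask)
```

In this sketch, a discriminator that outputs 0.5 everywhere incurs a loss of ln 2, which is the value an adversarially trained generator would push it toward.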

**Figure 2:**
Imputation and Prediction settings for Bi-GAN.
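The difference between the two settings can be illustrated through the masks that hide values from the model. The following is a hedged sketch: the helper names `imputation_mask` and `prediction_mask`, the use of ten equally spaced time steps, and the random selection of hidden steps are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def imputation_mask(n_steps, missing_rate, rng):
    """Imputation setting: hide a random fraction of steps inside the sequence."""
    mask = np.ones(n_steps)
    n_hidden = int(round(missing_rate * n_steps))
    hidden = rng.choice(n_steps, size=n_hidden, replace=False)
    mask[hidden] = 0.0  # 0 = value withheld from the model
    return mask

def prediction_mask(n_steps, observation_window):
    """Prediction setting: observe the first steps, hide everything after them."""
    mask = np.zeros(n_steps)
    mask[:observation_window] = 1.0
    return mask

rng = np.random.default_rng(0)
imp = imputation_mask(10, 0.3, rng)   # 30% of steps hidden at random
pred = prediction_mask(10, 3)         # 3 observed steps, 7 to predict
```

Because both settings reduce to a choice of mask over the same sequence, a single trained model can in principle serve either one without being told the window lengths in advance.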

**Figure 3:**
Imputation performance comparison between Bi-GAN, BRITS-I, and MRNN at different missing rates (10%, 20%, 30%, 40%, and 50%). The graph shows the mean absolute error (MAE) with 95% CI (shown by error bars).

**Figure 4:**
Prediction performance comparison between Bi-GAN, BRITS-I and MRNN with different observation windows (2, 3, 4, 5 years) and prediction windows (8, 7, 6, 5 years), respectively. The graph shows the MAEs with 95% CI (shown by error bars).
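For readers unfamiliar with the metric in Figures 3 and 4, the sketch below computes an MAE together with a 95% confidence interval. The page does not say how the intervals were obtained, so the normal approximation over per-sample absolute errors used here is an assumption, as are the function name `mae_with_ci` and the example values.

```python
import numpy as np

def mae_with_ci(y_true, y_pred, z=1.96):
    """Return (mae, lower, upper) using a normal-approximation 95% CI.

    Assumption: the CI is mean +/- z * standard error of the absolute
    errors. The original figures may use a different procedure.
    """
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    mae = err.mean()
    half_width = z * err.std(ddof=1) / np.sqrt(err.size)
    return float(mae), float(mae - half_width), float(mae + half_width)

# Hypothetical true and predicted BMI values.
y_true = [20.0, 22.0, 24.0, 26.0]
y_pred = [21.0, 21.0, 25.0, 29.0]
mae, lower, upper = mae_with_ci(y_true, y_pred)
```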
