Representation learning for clinical time series prediction tasks in electronic health records

doi:10.1186/s12911-019-0985-7

BMC Med Inform Decis Mak. 2019 Dec 17;19(Suppl 8):259.

Tong Ruan¹, Liqi Lei¹, Yangming Zhou², Jie Zhai¹, Le Zhang¹, Ping He³, Ju Gao⁴

Affiliations

¹ School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China.
² School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China. ymzhou@ecust.edu.cn.
³ Shanghai Hospital Development Center, 2 Kangding Road, Shanghai, 200000, China.
⁴ Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, 528 Zhangheng Road, Shanghai, 201203, China.

PMID: 31842854
PMCID: PMC6916209
DOI: 10.1186/s12911-019-0985-7

Tong Ruan et al. BMC Med Inform Decis Mak. 2019.

Abstract

Background: Electronic health records (EHRs) provide opportunities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, such as temporality, high dimensionality, sparseness, noise, random error and systematic bias. In particular, although the sequential information in EHRs is very useful, traditional machine learning methods have difficulty using temporal information effectively.

Method: In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode the in-hospital records of each patient into a low-dimensional dense vector.

Results: Based on EHR data collected from Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both a mortality prediction task and a comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we apply the "Deep Feature" representation produced by our proposed RNN-DAE method to track similar patients with t-SNE, which also yields some interesting observations.

Conclusion: We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in EHR data. Our proposed RNN-DAE method is useful for both the mortality prediction task and the comorbidity prediction task.

Keywords: Electronic health records; Mortality prediction; Recurrent neural network; Representation learning.
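
The encoding step described in the Method paragraph can be illustrated with a minimal, untrained forward pass in NumPy. This is only a sketch under stated assumptions, not the authors' implementation: the toy sizes (N=12, D=4, T=3), the random weights, the noise level, the choice to feed the previous output back into the decoder, and the binary cross-entropy reconstruction loss are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gru(input_dim, hidden_dim):
    """Random, untrained GRU parameters (illustration only)."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rng.normal(0.0, 0.1, size=(hidden_dim, input_dim))
        p["U" + gate] = rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim))
        p["b" + gate] = np.zeros(hidden_dim)
    return p

def gru_step(p, x, h):
    """One standard GRU update of hidden state h given input x."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def encode(enc, xs, noise_std):
    """Corrupt each visit vector with Gaussian noise, then run the encoder GRU.
    The final hidden state is the patient vector c."""
    h = np.zeros(enc["bz"].shape[0])
    for x in xs:
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        h = gru_step(enc, noisy, h)
    return h

def decode(dec, out, c, steps):
    """Assumed decoder wiring: hidden state starts at c and the previous
    output is fed back as the next input."""
    h = c
    y = np.zeros(out["W"].shape[0])
    ys = []
    for _ in range(steps):
        h = gru_step(dec, y, h)
        y = sigmoid(out["W"] @ h + out["b"])  # per-code probability
        ys.append(y)
    return np.stack(ys)

def reconstruction_loss(xs, ys, eps=1e-9):
    """Binary cross-entropy between the clean inputs and the outputs (assumed loss)."""
    return float(-np.mean(xs * np.log(ys + eps) + (1 - xs) * np.log(1 - ys + eps)))

N, D, T = 12, 4, 3  # toy sizes; the figure captions report N=1309 and D=300
xs = (rng.random((T, N)) < 0.2).astype(float)  # T multi-hot visit vectors
enc, dec = init_gru(N, D), init_gru(N, D)
out = {"W": rng.normal(0.0, 0.1, size=(N, D)), "b": np.zeros(N)}

c = encode(enc, xs, noise_std=0.1)  # D-dimensional patient vector
ys = decode(dec, out, c, steps=T)   # reconstruction of the T visits
loss = reconstruction_loss(xs, ys)
```

Training would minimise `loss` over all patients with a gradient-based optimiser; after training, only `encode` is needed to obtain the patient vector used by downstream classifiers.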

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
An overview of the proposed representation learning approach to generate patient vectors and further applications

**Fig. 2**
Three different forms of patient representation. Here, patients may have different numbers of inpatient visits (e.g., x, y, z). The tensor representation of each patient consists of multiple N-dimensional multi-hot vectors (i.e., N=1309). The statistic-based representation is derived by computing summary statistics, which yields a single N-dimensional vector. Typically, the distributed representation is a better representation with D dimensions (i.e., D=300), where D is much lower than N. a Tensor representation of patients. b Statistic-based representation of patients. c Distributed representation of patients

**Fig. 3**
The architecture of our proposed RNN-DAE model. Time-ordered multi-hot vectors (x_t) are corrupted with Gaussian noise and then encoded by a GRU_encoder model into the patient vector (c). Given the patient vector, another GRU_decoder model decodes it so that the output (y_t) is as consistent as possible with the input (x_t)

**Fig. 4**
A diagram of the t-SNE technique for dimensionality reduction. With the help of t-SNE, D-dimensional data points are projected into 2-dimensional space. Specifically, the red points indicate patients who eventually died and the blue points represent patients who did not

**Fig. 5**
Results of patient similarity analysis based on “Deep Feature”

**Fig. 6**
Comparative results of different sampling strategies

**Fig. 7**
Comparative results of different binary classifiers

**Fig. 8**
Comparative results of patient representation vectors with different dimensions

**Fig. 9**
The effect of different training data sizes
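
The first two representations described in the Fig. 2 caption can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' code: the vocabulary size of 6, the integer code indices, and the use of per-code counts as the summary statistic are assumptions made here (the caption reports N=1309 and does not name the statistic).

```python
from typing import List

N_CODES = 6  # toy vocabulary size (assumption); the caption reports N=1309

def multi_hot(codes: List[int], n: int = N_CODES) -> List[int]:
    """Encode one visit as an n-dimensional 0/1 vector of the codes present."""
    vec = [0] * n
    for code in codes:
        vec[code] = 1
    return vec

def tensor_representation(visits: List[List[int]]) -> List[List[int]]:
    """One multi-hot vector per visit, kept in time order."""
    return [multi_hot(v) for v in visits]

def statistic_representation(tensor: List[List[int]]) -> List[int]:
    """Collapse all visits into one vector of per-code counts (assumed statistic)."""
    return [sum(column) for column in zip(*tensor)]

# Hypothetical patient with three visits; each visit lists the codes recorded.
visits = [[0, 2], [2, 5], [1, 2]]
tensor = tensor_representation(visits)
summary = statistic_representation(tensor)

# Reversing the visit order changes the tensor but not the summary, which is
# why the statistic-based form discards the temporal information.
reversed_summary = statistic_representation(tensor[::-1])
```

The distributed representation (panel c) is the learned D-dimensional vector produced by the encoder, so it cannot be computed by a fixed rule like the two above.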

Cited by

Semisupervised Calibration of Risk with Noisy Event Times (SCORNET) using electronic health record data.
Ahuja Y, Liang L, Zhou D, Huang S, Cai T. Biostatistics. 2023 Jul 14;24(3):760-775. doi: 10.1093/biostatistics/kxac003. PMID: 35166342.
Clinical relevance of deep learning models in predicting the onset timing of cancer pain exacerbation.
Bang YH, Choi YH, Park M, Shin SY, Kim SJ. Sci Rep. 2023 Jul 17;13(1):11501. doi: 10.1038/s41598-023-37742-5. PMID: 37460584.
Enhancing Patient Outcome Prediction Through Deep Learning With Sequential Diagnosis Codes From Structured Electronic Health Record Data: Systematic Review.
Hama T, Alsaleh MM, Allery F, Choi JW, Tomlinson C, Wu H, Lai A, Pontikos N, Thygesen JH. J Med Internet Res. 2025 Mar 18;27:e57358. doi: 10.2196/57358. PMID: 40100249.
Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study.
Huang Y, Wang N, Zhang Z, Liu H, Fei X, Wei L, Chen H. JMIR Med Inform. 2021 Jul 23;9(7):e19905. doi: 10.2196/19905. PMID: 34297000.
A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record.
Ahuja Y, Wen J, Hong C, Xia Z, Huang S, Cai T. Sci Rep. 2022 Oct 22;12(1):17737. doi: 10.1038/s41598-022-22585-3. PMID: 36273240.
