BMC Med Inform Decis Mak. 2019 Dec 17;19(Suppl 8):259.
doi: 10.1186/s12911-019-0985-7.

Representation learning for clinical time series prediction tasks in electronic health records

Tong Ruan et al. BMC Med Inform Decis Mak. 2019.

Abstract

Background: Electronic health records (EHRs) offer opportunities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, including temporality, high dimensionality, sparseness, noise, random error and systematic bias. In particular, traditional machine learning methods have difficulty using temporal information effectively, even though the sequential information in EHRs is very useful.

Method: In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode the in-hospital records of each patient into a low-dimensional dense vector.
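
As a rough illustration of the encoding step described above, the following Python sketch adds Gaussian noise to a toy sequence of multi-hot visit vectors, passes them through a single GRU cell, and takes the final hidden state as the patient vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, bias terms and the decoder are omitted, and every name and value here (`GRUCell`, `encode_patient`, the toy sizes, the noise level) is hypothetical. Only the Gaussian corruption and the GRU encoder are taken from the Fig. 3 caption.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Bias-free GRU cell with update (z), reset (r) and candidate (h) weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.W = {g: init_matrix(n_hidden, n_in, rng) for g in "zrh"}
        self.U = {g: init_matrix(n_hidden, n_hidden, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Blend the previous state with the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def add_gaussian_noise(x, sigma, rng):
    """Corrupt the input, as a denoising autoencoder does before encoding."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def encode_patient(visits, cell, sigma, rng):
    """Run the noisy visit sequence through the GRU; return the last hidden state."""
    h = [0.0] * cell.n_hidden
    for x in visits:
        h = cell.step(add_gaussian_noise(x, sigma, rng), h)
    return h

rng = random.Random(0)
N, D = 6, 3  # toy sizes (assumption); the paper reports N = 1309 and D = 300
visits = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 1, 0, 1]]  # two hypothetical inpatient stays
cell = GRUCell(N, D, rng)
patient_vector = encode_patient(visits, cell, sigma=0.1, rng=rng)
print(len(patient_vector))
```

In a full denoising autoencoder, a decoder would be trained jointly with this encoder to reconstruct the clean inputs from the patient vector. That training loop is left out here.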

Results: Using EHR data collected from Shuguang Hospital, affiliated with Shanghai University of Traditional Chinese Medicine, we experimentally evaluate the proposed RNN-DAE method on both a mortality prediction task and a comorbidity prediction task. Extensive experimental results show that the proposed RNN-DAE method outperforms existing methods. In addition, we apply the "Deep Feature" representation produced by RNN-DAE to track similar patients with t-SNE, which also yields some interesting observations.

Conclusion: We propose an effective unsupervised RNN-DAE method to summarize sequential patient information in EHR data. The proposed RNN-DAE method is useful for both the mortality prediction task and the comorbidity prediction task.

Keywords: Electronic health records; Mortality prediction; Recurrent neural network; Representation learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
An overview of the proposed representation learning approach to generate patient vectors and further applications
Fig. 2
Three different forms of patient representation. A patient may have a varying number of inpatient stays (e.g., x, y, z). The tensor representation of each patient consists of multiple N-dimensional multi-hot vectors (here N = 1309). The statistic-based representation is derived by computing summary statistics, yielding a single N-dimensional vector. Typically, the distributed representation is a better representation, with D dimensions (here D = 300), where D is much lower than N. a Tensor representation of patients. b Statistic-based representation of patients. c Distributed representation of patients
Fig. 3
The architecture of our proposed RNN-DAE model. The time series of multi-hot vectors (x_t) is corrupted with Gaussian noise and then encoded by a GRU encoder into the patient vector (c). Given the patient vector, a GRU decoder then decodes it so that the output (y_t) is as consistent as possible with the input (x_t)
Fig. 4
A diagram of the t-SNE technique for dimensionality reduction. With t-SNE, D-dimensional data points are projected into 2-dimensional space. Specifically, the red points indicate patients who eventually died and the blue points represent patients who did not
Fig. 5
Results of patient similarity analysis based on “Deep Feature”
Fig. 6
Comparative results of different sampling strategies
Fig. 7
Comparative results of different binary classifiers
Fig. 8
Comparative results of patient representation vectors with different dimensions
Fig. 9
The effect of different training data sizes
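
The tensor and statistic-based representations described in the Fig. 2 caption can be sketched on a toy example as follows. This is an illustrative sketch only: the vocabulary size, the event indices, and the helper `multi_hot` are hypothetical, and the per-dimension count used as the summary statistic is an assumption, since the caption does not say which statistics are computed.

```python
N_CODES = 5  # toy vocabulary size (assumption); the paper reports N = 1309

def multi_hot(codes, n=N_CODES):
    """Return an n-dimensional 0/1 vector with ones at the given event indices."""
    vec = [0] * n
    for c in codes:
        vec[c] = 1
    return vec

# A hypothetical patient with three inpatient stays, each a list of event indices.
stays = [[0, 3], [1, 3], [3, 4]]

# Tensor representation: one multi-hot vector per stay, shape (number of stays, N).
tensor_repr = [multi_hot(s) for s in stays]

# Statistic-based representation: a single N-dimensional vector.
# The summary statistic here is a per-dimension count over stays (assumption).
stat_repr = [sum(column) for column in zip(*tensor_repr)]

print(tensor_repr)
print(stat_repr)
```

The statistic-based vector has a fixed length regardless of the number of stays, but it discards their order. The tensor form keeps that order, which is what a sequence model can exploit.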
