Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review

Cao Xiao¹, Edward Choi², Jimeng Sun²

Affiliations

¹ AI for Healthcare, IBM Research, Cambridge, Massachusetts, USA.
² School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

PMID: 29893864
PMCID: PMC6188527
DOI: 10.1093/jamia/ocy068

Cao Xiao et al. J Am Med Inform Assoc. 2018 Oct 1;25(10):1419-1428.

Abstract

Objective: To conduct a systematic review of deep learning models for electronic health record (EHR) data, and illustrate various deep learning architectures for analyzing different data sources and their target applications. We also highlight ongoing research and identify open challenges in building deep learning models of EHRs.

Design/method: We searched PubMed and Google Scholar for papers on deep learning studies using EHR data published between January 1, 2010, and January 31, 2018. We summarize them according to these axes: types of analytics tasks, types of deep learning model architectures, special challenges arising from health data and tasks and their potential solutions, as well as evaluation strategies.

Results: We surveyed and analyzed multiple aspects of the 98 articles we found and identified the following analytics tasks: disease detection/classification, sequential prediction of clinical events, concept embedding, data augmentation, and EHR data privacy. We then studied how deep architectures were applied to these tasks. We also discussed some special challenges arising from modeling EHR data and reviewed a few popular approaches. Finally, we summarized how performance evaluations were conducted for each task.

Discussion: Despite early success in using deep learning for health analytics applications, a number of issues remain to be addressed. We discuss them in detail, including data and label availability, model interpretability and transparency, and ease of deployment.

Figures

**Figure 1.**
Illustration of literature search and selection procedure.

**Figure 2.**
Longitudinal EHR data are transformed into input vectors (top left), which support the different analytics tasks described in the survey (top right). The underlying deep learning models are illustrated at the bottom. (a) Feedforward neural networks use multiple fully connected layers with non-linear activations (e.g., sigmoid or rectified linear unit). (b) Recurrent neural networks can process variable-length input sequences using their recurrent connections. (c) Restricted Boltzmann machines are bipartite neural networks that consist of binary stochastic nodes. They can capture the latent representation of the input data by learning their generative probability. (d) Generative adversarial networks can generate realistic synthetic samples by training the generator and the discriminator in an adversarial game. (e) Convolutional neural networks capture local features of the input data and stack those features via a sequence of convolutions to derive global features. (f) Word2vec exploits the co-occurrence information of discrete concepts (e.g., words in text, codes in EHR data) to derive concept representations. (g) Denoising autoencoders (AE) try to reconstruct the original input from a corrupted version of it, thus learning robust representations of the input data.
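
The first two ideas in this caption, turning longitudinal records into input vectors and feeding a variable-length sequence through a recurrent network, can be illustrated with a small sketch. This is not the pipeline of any reviewed study. It assumes each visit is a set of diagnosis codes, encodes each visit as a multi-hot vector, and runs a plain tanh recurrent network with random, untrained weights. The function names (`build_vocab`, `multi_hot`, `rnn_last_state`), the toy patient records, and the hidden size are all hypothetical choices made for illustration.

```python
import math
import random

def build_vocab(patients):
    """Map each distinct medical code to a column index (sorted for determinism)."""
    codes = sorted({code for visits in patients for visit in visits for code in visit})
    return {code: i for i, code in enumerate(codes)}

def multi_hot(visit, vocab):
    """Encode one visit (a collection of codes) as a 0/1 vector over the vocabulary."""
    vec = [0.0] * len(vocab)
    for code in visit:
        vec[vocab[code]] = 1.0
    return vec

def rnn_last_state(sequence, w_in, w_rec, bias):
    """Run a plain tanh RNN over a variable-length sequence; return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        # The new state depends on the current input and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden

# Hypothetical toy records: each patient is a list of visits, each visit a set of codes.
patients = [
    [{"E11.9", "I10"}, {"I10"}, {"N18.3", "I10"}],  # three visits
    [{"J45.909"}],                                    # one visit
]
vocab = build_vocab(patients)
encoded = [[multi_hot(visit, vocab) for visit in visits] for visits in patients]

# Random, untrained weights (assumption): only the data flow is being shown.
rng = random.Random(0)
hidden_size = 3
w_in = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(hidden_size)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
bias = [0.0] * hidden_size

# Both patients map to a fixed-size state despite having different numbers of visits.
states = [rnn_last_state(seq, w_in, w_rec, bias) for seq in encoded]
```

In this sketch the same weights are reused at every visit, which is what lets one model handle patients with one visit and patients with many. A trained model would learn the weights from labeled outcomes instead of sampling them.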
