Review

Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis

Benjamin Shickel et al. IEEE J Biomed Health Inform. 2018 Sep;22(5):1589-1604. doi: 10.1109/JBHI.2017.2767063. Epub 2017 Oct 27.

Abstract

The past decade has seen an explosion in the amount of digital information stored in electronic health records (EHRs). Although EHRs were primarily designed for archiving patient information and performing administrative healthcare tasks such as billing, many researchers have found secondary uses for these records in a variety of clinical informatics applications. Over the same period, the machine learning community has seen widespread advances in the field of deep learning. In this review, we survey the current research on applying deep learning to clinical tasks based on EHR data, where we find a variety of deep learning techniques and frameworks being applied to several types of clinical applications including information extraction, representation learning, outcome prediction, phenotyping, and deidentification. We identify several limitations of current research involving topics such as model interpretability, data heterogeneity, and lack of universal benchmarks. We conclude by summarizing the state of the field and identifying avenues of future deep EHR research.

Figures

Fig. 1
Trends in the number of Google Scholar publications relating to deep EHR through August 2017. The top distribution shows overall results for “deep learning” and “electronic health records”. The bottom two distributions show these same terms in conjunction with a variety of specific application areas and technical methods. Large yearly jumps are seen for most terms beginning in 2015.
Fig. 2
Neural network with 1 input layer, 1 output layer, and 2 hidden layers.
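As a concrete counterpart to Fig. 2, the short sketch below builds the same shape of network, one input layer, two hidden layers, and one output layer, as a fully connected model. It assumes PyTorch, and all layer sizes are illustrative choices rather than values from the paper.

    import torch
    import torch.nn as nn

    # Feed-forward network mirroring Fig. 2; all sizes are illustrative assumptions.
    model = nn.Sequential(
        nn.Linear(64, 32),   # input layer -> first hidden layer
        nn.ReLU(),
        nn.Linear(32, 16),   # first hidden layer -> second hidden layer
        nn.ReLU(),
        nn.Linear(16, 1),    # second hidden layer -> single output unit
        nn.Sigmoid(),        # e.g., a binary clinical outcome
    )

    x = torch.randn(8, 64)   # batch of 8 synthetic 64-dimensional inputs
    y = model(x)             # shape: (8, 1)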
Fig. 3
The most common deep learning architectures for analyzing EHR data. Architectures differ in terms of their node types and the connection structure (e.g. fully connected versus locally connected). Below each model type is a list of selected references implementing the architecture for EHR applications. Icons based on the work of van Veen [30].
Fig. 4
Example of a convolutional neural network (CNN) for classifying images. This particular model includes two convolutional layers, each followed by a pooling/subsampling layer. The output from the second pooling layer is fed to a fully connected layer and a final output layer. [31]
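The pipeline in Fig. 4 (two convolution/pooling stages, a fully connected layer, and an output layer) can be sketched as the hypothetical PyTorch module below; the 1-channel 28x28 input, channel counts, and 10 output classes are assumptions for illustration, not details taken from [31].

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        """Two conv+pool stages, then a fully connected layer and an output layer."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=5),   # first convolutional layer
                nn.ReLU(),
                nn.MaxPool2d(2),                  # first pooling/subsampling layer
                nn.Conv2d(8, 16, kernel_size=5),  # second convolutional layer
                nn.ReLU(),
                nn.MaxPool2d(2),                  # second pooling/subsampling layer
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 4 * 4, 32),        # fully connected layer
                nn.ReLU(),
                nn.Linear(32, num_classes),       # final output layer
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    out = SmallCNN()(torch.randn(1, 1, 28, 28))   # shape: (1, 10)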
Fig. 5
Symbolic representation of an RNN (left) with the equivalent expanded representation (right) for an example input sequence of length three, three hidden units, and a single output. Each input time step is combined with the current hidden state of the RNN, which itself depends on the previous hidden state, demonstrating the memory effect of RNNs.
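A minimal sketch of the unrolled network in Fig. 5, assuming a plain Elman RNN in PyTorch with an input sequence of length three, three hidden units, and a single output read from the final hidden state:

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=1, hidden_size=3, batch_first=True)  # three hidden units
    readout = nn.Linear(3, 1)                                    # single output

    x = torch.randn(1, 3, 1)        # one sequence of length three
    outputs, h_n = rnn(x)           # each step's hidden state depends on the previous one
    y = readout(h_n.squeeze(0))     # single output from the final hidden state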
Fig. 6
Example of a stacked autoencoder with two independently trained hidden layers. In the first layer, the output is the reconstruction of the input x, and z is the lower-dimensional representation (i.e., the encoding) of x. Once the first hidden layer is trained, the embeddings z are used as input to a second autoencoder, demonstrating how autoencoders can be stacked. [33]
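The layer-wise training in Fig. 6 amounts to fitting one autoencoder, keeping its encoder, and fitting a second autoencoder on the resulting codes. The sketch below, assuming PyTorch, illustrates this with made-up dimensions and a bare-bones training loop.

    import torch
    import torch.nn as nn

    def train_autoencoder(encoder, decoder, data, epochs=10):
        """Minimize the reconstruction error ||decoder(encoder(x)) - x||^2."""
        params = list(encoder.parameters()) + list(decoder.parameters())
        opt = torch.optim.Adam(params, lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(decoder(encoder(data)), data)
            loss.backward()
            opt.step()

    x = torch.rand(256, 100)                       # synthetic input vectors

    # First autoencoder: x -> z (encoding) -> reconstruction of x
    enc1, dec1 = nn.Linear(100, 32), nn.Linear(32, 100)
    train_autoencoder(enc1, dec1, x)

    # The second autoencoder is trained on the first layer's encodings z,
    # which is what "stacking" means in Fig. 6.
    z = enc1(x).detach()
    enc2, dec2 = nn.Linear(32, 16), nn.Linear(16, 32)
    train_autoencoder(enc2, dec2, z)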
Fig. 7
EHR Information Extraction (IE) and example tasks.
Fig. 8
Illustration of how autoencoders can be used to transform extremely sparse patient vectors into a more compact representation. Since medical codes are represented as binary categorical features, raw patient vectors can have dimensions in the thousands. Training an autoencoder on these vectors produces an encoding function to transform any given vector into its distributed and dimensionality-reduced representation.
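To make the transformation in Fig. 8 concrete, the sketch below maps a sparse binary medical-code vector to a dense, lower-dimensional code with a single-hidden-layer autoencoder. The vocabulary size, encoding dimension, and reconstruction loss are assumptions for illustration only, and training is omitted.

    import torch
    import torch.nn as nn

    n_codes = 5000                     # assumed size of the medical-code vocabulary
    encoder = nn.Sequential(nn.Linear(n_codes, 128), nn.ReLU())
    decoder = nn.Sequential(nn.Linear(128, n_codes), nn.Sigmoid())

    # Sparse binary patient vector: 1 wherever a code appears in the record.
    patient = torch.zeros(1, n_codes)
    patient[0, [12, 403, 4999]] = 1.0

    recon = decoder(encoder(patient))
    loss = nn.functional.binary_cross_entropy(recon, patient)

    # After training, encoder(patient) serves as the compact distributed representation.
    dense = encoder(patient)           # shape: (1, 128)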
Fig. 9
Beaulieu-Jones and Greene's [49] autoencoder-based phenotype stratification for case (1) vs. control (0) diagnoses, illustrated with t-SNE. (A) shows clustering based on raw clinical descriptors, where there is little separable structure. (B-F) show the resulting clusters following 0-10,000 training epochs of the single-layer autoencoder. As the autoencoder is trained, clear boundaries emerge between the two labels, suggesting that the unsupervised autoencoder discovers latent structure in the raw data without any human input.
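A plot like Fig. 9 can be produced by running t-SNE on the encoded patient vectors and coloring points by case/control label. The sketch below assumes scikit-learn and matplotlib and uses random arrays purely as stand-ins for real embeddings and diagnoses.

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    # Stand-ins for autoencoder embeddings and case (1) vs. control (0) labels.
    embeddings = np.random.rand(200, 32)
    labels = np.random.randint(0, 2, size=200)

    coords = TSNE(n_components=2).fit_transform(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=labels)
    plt.title("t-SNE of autoencoder representations (case vs. control)")
    plt.show()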
Fig. 10
Example of the positive effect of sparsity constraints on model interpretability. Shown are the first hidden layer weights from Lasko et al.'s [50] autoencoder framework for phenotyping uric acid sequences, which in effect form functional element detectors.
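One common way to impose the kind of sparsity constraint discussed for Fig. 10 is to add a penalty on hidden activations to the reconstruction loss. The sketch below uses an L1 activation penalty in PyTorch as an illustrative stand-in; it is not necessarily the exact formulation used in [50], and the dimensions and penalty weight are assumed.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(50, 20), nn.Sigmoid())
    decoder = nn.Linear(20, 50)

    x = torch.rand(64, 50)             # e.g., fixed-length clinical lab sequences
    z = encoder(x)

    # Reconstruction error plus an L1 penalty on the hidden activations, which
    # pushes most units toward zero and encourages the sparse, more interpretable
    # first-layer weights shown in Fig. 10.
    sparsity_weight = 1e-3             # assumed hyperparameter
    loss = nn.functional.mse_loss(decoder(z), x) + sparsity_weight * z.abs().mean()
    loss.backward()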

References

    1. Birkhead GS, Klompas M, Shah NR. Uses of Electronic Health Records for Public Health Surveillance to Advance Public Health. Annu Rev Public Health. 2015;36:345–59. - PubMed
    2. The Office of the National Coordinator for Health Information Technology. Adoption of Electronic Health Record Systems among U.S. Non-Federal Acute Care Hospitals: 2008-2015. 2016. Available: https://dashboard.healthit.gov/evaluations/data-briefs/non-federal-acute....
    3. J E, N N. Electronic Health Record Adoption and Use among Office-based Physicians in the U.S., by State: 2015 National Electronic Health Records Survey. The Office of the National Coordinator for Health Information Technology, Tech. Rep. 2016.
    4. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary Use of EHR: Data Quality Issues and Informatics Opportunities. AMIA Joint Summits on Translational Science Proceedings. 2010;2010:1–5. Available: http://www.ncbi.nlm.nih.gov/pubmed/21347133. - PMC - PubMed
    5. Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics. 2012;13:395–405. - PubMed