KDD. 2017 Aug;2017:787-795.
doi: 10.1145/3097983.3098126.

GRAM: Graph-based Attention Model for Healthcare Representation Learning

Edward Choi et al. KDD. 2017 Aug.

Abstract

Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: (1) data insufficiency: often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results; (2) interpretation: the representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose the GRaph-based Attention Model (GRAM), which supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies. Based on the data volume and the ontology structure, GRAM represents a medical concept as a combination of its ancestors in the ontology via an attention mechanism. We compared the predictive performance (i.e., accuracy, data needs, interpretability) of GRAM to various methods, including the recurrent neural network (RNN), in two sequential diagnosis prediction tasks and one heart failure prediction task. Compared to the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely observed in the training data and a 3% improved area under the ROC curve for predicting heart failure using an order of magnitude less training data. Additionally, unlike other methods, the medical concept representations learned by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits intuitive attention behaviors by adaptively generalizing to higher-level concepts when facing data insufficiency at the lower-level concepts.
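To make the attention mechanism concrete, the following is a minimal numpy sketch of the core idea: a leaf concept's final representation is an attention-weighted combination of its own basic embedding and those of its ontology ancestors. The function and variable names (gram_concept_embedding, basic_emb, score_fn) and the dot-product compatibility score are illustrative assumptions; the paper learns the compatibility function and the basic embeddings end to end.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def gram_concept_embedding(basic_emb, leaf_idx, ancestor_idxs, score_fn):
    """Final embedding g_i of a leaf concept as an attention-weighted sum of
    its own basic embedding and those of its ontology ancestors."""
    candidates = [leaf_idx] + list(ancestor_idxs)      # leaf plus its ancestors
    e_i = basic_emb[leaf_idx]
    # Compatibility between the leaf and each candidate (self or ancestor).
    scores = np.array([score_fn(e_i, basic_emb[j]) for j in candidates])
    alpha = softmax(scores)                            # attention weights, sum to 1
    g_i = (alpha[:, None] * basic_emb[candidates]).sum(axis=0)
    return g_i, alpha

# Toy usage: 5 concepts with 4-dimensional basic embeddings; concept 0 is a
# leaf whose ancestors are concepts 3 and 4. A dot product stands in for the
# learned compatibility function.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))
g0, alpha0 = gram_concept_embedding(E, 0, [3, 4], lambda a, b: float(a @ b))
```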

Keywords: Attention Model; Electronic Health Records; Graph; Predictive Healthcare.


Figures

Figure 1:
Illustration of GRAM. Leaf nodes (solid circles) represent medical concepts in the EHR, while non-leaf nodes (dotted circles) represent more general concepts. The final representation gi of the leaf concept ci is computed by combining the basic embedding ei of ci and the embeddings eg, ec and ea of its ancestors cg, cc and ca via an attention mechanism. The final representations form the embedding matrix G for all leaf concepts. We then use G to embed the patient visit vector xt into a visit representation vt, which is fed to a neural network model to make the final prediction ŷt.
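The caption above describes the forward pass from concept embeddings to a prediction. The sketch below is a minimal, self-contained illustration of that pipeline, assuming a tanh projection for the visit representation, a plain tanh RNN over visits, and a sigmoid output (as in a binary heart failure task); the parameter names and this exact architecture are placeholders, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
num_codes, emb_dim, hid_dim = 5, 8, 16

# G stacks the final leaf-concept embeddings g_i (e.g. from the attention
# step sketched earlier); here it is random for illustration only.
G = rng.normal(size=(num_codes, emb_dim))

# Placeholder parameters of a plain tanh RNN and a sigmoid output head.
W_v = rng.normal(size=(hid_dim, emb_dim)) * 0.1
W_h = rng.normal(size=(hid_dim, hid_dim)) * 0.1
w_out = rng.normal(size=hid_dim) * 0.1

def predict(visits):
    """visits: list of multi-hot vectors x_t over the leaf medical codes."""
    h = np.zeros(hid_dim)
    for x_t in visits:
        v_t = np.tanh(G.T @ x_t)             # visit representation v_t from G and x_t
        h = np.tanh(W_v @ v_t + W_h @ h)     # recurrent state over the visit sequence
    return 1.0 / (1.0 + np.exp(-w_out @ h))  # y_hat for the final visit

# Two toy visits, each with two observed codes.
x1 = np.zeros(num_codes); x1[[0, 2]] = 1.0
x2 = np.zeros(num_codes); x2[[1, 4]] = 1.0
print(predict([x1, x2]))
```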
Figure 2:
Creating the co-occurrence matrix together with the ancestors. The n-th ancestors are the group of nodes that are n hops away from any leaf node in G. Here we exclude the root node, which would contribute just a single row (and column).
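As a concrete illustration of the construction in Figure 2, the sketch below accumulates pairwise co-occurrence counts over leaf codes augmented with their ancestors (root excluded). The function name, the ancestors_of mapping, and the idea that the counts feed a later embedding pre-training step are assumptions for illustration, not details stated here.

```python
import numpy as np

def cooccurrence_with_ancestors(visits, ancestors_of, num_nodes):
    """visits: list of sets of leaf code indices observed in one visit.
    ancestors_of: mapping from a leaf code to its ancestor node indices in
    the knowledge DAG, root excluded (as in the caption).
    Returns a symmetric co-occurrence count matrix over leaves and ancestors."""
    M = np.zeros((num_nodes, num_nodes))
    for visit in visits:
        # Augment the visit's leaf codes with all of their ancestors.
        nodes = set(visit)
        for code in visit:
            nodes.update(ancestors_of.get(code, ()))
        nodes = sorted(nodes)
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                M[a, b] += 1
                M[b, a] += 1
    return M

# Toy ontology: leaves 0-2, ancestors 3 and 4 (root already excluded).
ancestors_of = {0: [3], 1: [3], 2: [4]}
visits = [{0, 1}, {1, 2}]
M = cooccurrence_with_ancestors(visits, ancestors_of, num_nodes=5)
```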
Figure 3:
t-SNE scatterplots of medical concepts trained by GRAM+, GRAM, RNN+, RNN, RandomDAG, GloVe and Skip-gram. The color of the dots represents the highest-level disease categories and the text annotations show the detailed disease categories in the CCS multi-level hierarchy. GRAM+ and GRAM clearly exhibit interpretable embeddings that are well aligned with the medical ontology.
Figure 4:
GRAM's attention behavior during HF prediction for four representative diseases (one per column). In each figure, the leaf node represents the disease and the upper nodes are its ancestors. The size of a node shows the amount of attention it receives, which is also shown by the bar charts. The number in parentheses next to each disease is its frequency in the training data. We exclude the root of the knowledge DAG G from all figures as it did not play a significant role.
