Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Laila Rasmy et al. NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
Abstract

Deep learning (DL)-based predictive models built from electronic health records (EHRs) deliver impressive performance in many clinical tasks. However, these models often require large training cohorts to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pretraining BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework, originally developed for the text domain, to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21-6.14% on two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT performs well on tasks with small fine-tuning training sets: compared with deep learning models without Med-BERT, it can boost the AUC by more than 20% or match the AUC of a model trained on a training set ten times larger. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence-aided healthcare.
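To make the pretrain-then-fine-tune pattern described in the abstract concrete, here is a minimal PyTorch sketch. It assumes diagnosis codes have already been mapped to integer ids; the class names, layer sizes, and the pretrained-weights file are illustrative assumptions, not the authors' implementation (Med-BERT itself follows the BERT architecture and pretraining objectives).

import torch
import torch.nn as nn

class CodeEncoder(nn.Module):
    # BERT-style transformer encoder over a patient's sequence of medical codes.
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, code_ids):                   # (batch, seq_len) int ids
        return self.encoder(self.embed(code_ids))  # (batch, seq_len, d_model)

class DiseasePredictor(nn.Module):
    # Fine-tuning head: pooled patient representation -> disease probability.
    def __init__(self, encoder, d_model=128):
        super().__init__()
        self.encoder = encoder                     # weights come from pretraining
        self.head = nn.Linear(d_model, 1)

    def forward(self, code_ids):
        h = self.encoder(code_ids).mean(dim=1)     # mean-pool over the sequence
        return torch.sigmoid(self.head(h))

encoder = CodeEncoder(vocab_size=50000)
# encoder.load_state_dict(torch.load("med_bert_pretrained.pt"))  # hypothetical file
model = DiseasePredictor(encoder)
risk = model(torch.randint(0, 50000, (2, 32)))     # 2 patients, 32 codes each

The point of the two-stage setup is that the encoder's weights are learned once on the large pretraining cohort and then reused, so the small fine-tuning cohort only has to adapt them rather than learn representations from scratch.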


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Selection pipeline for the pretraining cohort from Cerner HealthFacts.
The flow proceeds from left to right; the number of patients at each step is shown in square brackets.
Fig. 2
Fig. 2. An example of structured EHR data of a hypothetical patient as it would be available from a typical EHR system (e.g., Cerner or Truven).
For this patient, four visits with dates and encounter types are organized in chronological order at the bottom. Detailed information, including demographics and medical codes with time stamps, is shown above. Note that not all information is recorded, as in real-world EHR systems.
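As a rough illustration of this visit-level layout, the sketch below models a record like the one in Fig. 2. The field names and example ICD-10 codes are hypothetical, not a schema from Cerner or Truven.

from dataclasses import dataclass
from datetime import date

@dataclass
class Visit:
    visit_date: date
    encounter_type: str      # e.g., "outpatient" or "inpatient"
    codes: list              # diagnosis codes recorded during this visit

@dataclass
class Patient:
    patient_id: str
    sex: str
    birth_year: int
    visits: list             # kept in chronological order, as in Fig. 2

record = Patient("p001", "F", 1956, visits=[
    Visit(date(2019, 3, 1), "outpatient", ["E11.9"]),         # type 2 diabetes
    Visit(date(2019, 9, 14), "inpatient", ["E11.9", "I10"]),  # adds hypertension
])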
Fig. 3
Fig. 3. Med-BERT structure.
The left part is a scheme of the overall Med-BERT architecture, the middle part details the individual components of the Med-BERT embedding layer, and the right part is an example of EHR data used in each embedding layer.
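The embedding layer in the middle of Fig. 3 can be sketched as follows, assuming it sums three learned embeddings per code: the code itself, its order within the visit ("serialization"), and the visit number. Dimensions and names are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class MedBertStyleEmbedding(nn.Module):
    # Sum of three learned embeddings, one per input feature of each code.
    def __init__(self, vocab_size, max_codes_per_visit, max_visits, d_model=128):
        super().__init__()
        self.code = nn.Embedding(vocab_size, d_model)
        self.order = nn.Embedding(max_codes_per_visit, d_model)  # position within a visit
        self.visit = nn.Embedding(max_visits, d_model)           # which visit the code is in

    def forward(self, code_ids, order_ids, visit_ids):  # each (batch, seq_len)
        return self.code(code_ids) + self.order(order_ids) + self.visit(visit_ids)

emb = MedBertStyleEmbedding(vocab_size=50000, max_codes_per_visit=64, max_visits=512)
out = emb(torch.tensor([[5, 9]]), torch.tensor([[0, 1]]), torch.tensor([[0, 0]]))
print(out.shape)   # torch.Size([1, 2, 128])

Summing rather than concatenating keeps the model dimension fixed while still letting the encoder distinguish the same code appearing in different visits or positions.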
Fig. 4
Fig. 4. Test-set prediction AUC as a function of training-set size, for each cohort, comparing methods with and without the pretrained Med-BERT layer.
Logistic regression (LR) results are included as a baseline. a Cohort: DHF-Cerner, method: GRU; b cohort: DHF-Cerner, method: bidirectional GRU; c cohort: DHF-Cerner, method: RETAIN; d cohort: PaCa-Cerner, method: GRU; e cohort: PaCa-Cerner, method: bidirectional GRU; f cohort: PaCa-Cerner, method: RETAIN; g cohort: PaCa-Truven, method: GRU; h cohort: PaCa-Truven, method: bidirectional GRU; i cohort: PaCa-Truven, method: RETAIN. Shaded regions indicate standard deviations.
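For reference, AUC values like those plotted in Fig. 4 can be computed from test-set labels and predicted probabilities with scikit-learn. The numbers below are toy placeholders, not results from the paper.

from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 1]                            # disease labels, test set
scores_finetuned = [0.20, 0.90, 0.70, 0.30, 0.80]   # model with Med-BERT layer
scores_baseline  = [0.55, 0.60, 0.50, 0.30, 0.70]   # model without it

print(roc_auc_score(y_true, scores_finetuned))      # 1.0 on this toy data
print(roc_auc_score(y_true, scores_baseline))       # ~0.83 on this toy data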
Fig. 5
Fig. 5. Example of different connections of the same code, “type 2 diabetes mellitus”, in different visits.
a The first visit; b the second visit. Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
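Per-head attention weights of the kind visualized in Figs. 5-7 can be read out of a multi-head attention layer as follows. The sketch uses a standalone PyTorch layer with toy inputs, not the trained Med-BERT model.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
x = torch.randn(1, 10, 128)       # one patient, 10 code embeddings

# average_attn_weights=False keeps a separate (query, key) matrix per head
_, attn = mha(x, x, x, average_attn_weights=False)
print(attn.shape)                 # torch.Size([1, 4, 10, 10]): head, query code, key code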
Fig. 6
Fig. 6. Example of the dependency connections in the DHF-Cerner cohort.
Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
Fig. 7
Fig. 7. Example of the dependency connections in the PaCa-Cerner cohort.
Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
