Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Laila Rasmy et al. NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
Abstract

Deep learning (DL)-based predictive models built from electronic health records (EHRs) deliver impressive performance in many clinical tasks. However, these models often require large training cohorts to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pretraining BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework, originally developed for the text domain, to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21-6.14% on two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT performs well on tasks with small fine-tuning training sets: compared with deep learning models without Med-BERT, it can boost the AUC by more than 20% or match the AUC of a model trained on a training set ten times larger. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence-aided healthcare.
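To make the pretrain-then-fine-tune pattern described in the abstract concrete, here is a minimal PyTorch sketch. It assumes diagnosis codes have already been mapped to integer ids; the class names, layer sizes, and the pretrained-weights file are illustrative assumptions, not the authors' implementation (Med-BERT itself follows the BERT architecture and pretraining objectives).

import torch
import torch.nn as nn

class CodeEncoder(nn.Module):
    # BERT-style transformer encoder over a patient's sequence of medical codes.
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, code_ids):                   # (batch, seq_len) int ids
        return self.encoder(self.embed(code_ids))  # (batch, seq_len, d_model)

class DiseasePredictor(nn.Module):
    # Fine-tuning head: pooled patient representation -> disease probability.
    def __init__(self, encoder, d_model=128):
        super().__init__()
        self.encoder = encoder                     # weights come from pretraining
        self.head = nn.Linear(d_model, 1)

    def forward(self, code_ids):
        h = self.encoder(code_ids).mean(dim=1)     # mean-pool over the sequence
        return torch.sigmoid(self.head(h))

encoder = CodeEncoder(vocab_size=50000)
# encoder.load_state_dict(torch.load("med_bert_pretrained.pt"))  # hypothetical file
model = DiseasePredictor(encoder)
risk = model(torch.randint(0, 50000, (2, 32)))     # 2 patients, 32 codes each

The point of the two-stage setup is that the encoder's weights are learned once on the large pretraining cohort and then reused, so the small fine-tuning cohort only has to adapt them rather than learn representations from scratch.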


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Selection pipeline for the pretraining cohort from Cerner HealthFacts.
The flow proceeds from left to right; the number of patients at each step is shown in square brackets.
Fig. 2
Fig. 2. An example of structured EHR data of a hypothetical patient as it would be available from a typical EHR system (e.g., Cerner or Truven).
For this patient, four visits with dates and encounter types are organized in chronological order at the bottom. Detailed information, including demographics and medical codes with time stamps, is shown above. Note that not all information is recorded, as in real-world EHR systems.
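As a rough illustration of this visit-level layout, the sketch below models a record like the one in Fig. 2. The field names and example ICD-10 codes are hypothetical, not a schema from Cerner or Truven.

from dataclasses import dataclass
from datetime import date

@dataclass
class Visit:
    visit_date: date
    encounter_type: str      # e.g., "outpatient" or "inpatient"
    codes: list              # diagnosis codes recorded during this visit

@dataclass
class Patient:
    patient_id: str
    sex: str
    birth_year: int
    visits: list             # kept in chronological order, as in Fig. 2

record = Patient("p001", "F", 1956, visits=[
    Visit(date(2019, 3, 1), "outpatient", ["E11.9"]),         # type 2 diabetes
    Visit(date(2019, 9, 14), "inpatient", ["E11.9", "I10"]),  # adds hypertension
])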
Fig. 3
Fig. 3. Med-BERT structure.
The left part is a scheme of the overall Med-BERT architecture, the middle part details the individual components of the Med-BERT embedding layer, and the right part is an example of EHR data used in each embedding layer.
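The embedding layer in the middle of Fig. 3 can be sketched as follows, assuming it sums three learned embeddings per code: the code itself, its order within the visit ("serialization"), and the visit number. Dimensions and names are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class MedBertStyleEmbedding(nn.Module):
    # Sum of three learned embeddings, one per input feature of each code.
    def __init__(self, vocab_size, max_codes_per_visit, max_visits, d_model=128):
        super().__init__()
        self.code = nn.Embedding(vocab_size, d_model)
        self.order = nn.Embedding(max_codes_per_visit, d_model)  # position within a visit
        self.visit = nn.Embedding(max_visits, d_model)           # which visit the code is in

    def forward(self, code_ids, order_ids, visit_ids):  # each (batch, seq_len)
        return self.code(code_ids) + self.order(order_ids) + self.visit(visit_ids)

emb = MedBertStyleEmbedding(vocab_size=50000, max_codes_per_visit=64, max_visits=512)
out = emb(torch.tensor([[5, 9]]), torch.tensor([[0, 1]]), torch.tensor([[0, 0]]))
print(out.shape)   # torch.Size([1, 2, 128])

Summing rather than concatenating keeps the model dimension fixed while still letting the encoder distinguish the same code appearing in different visits or positions.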
Fig. 4
Fig. 4. Test-set prediction AUC as a function of training-set size, for each cohort, comparing methods with and without the pretrained Med-BERT layer.
Logistic regression (LR) results are included as a baseline. a Cohort: DHF-Cerner, method: GRU; b cohort: DHF-Cerner, method: bidirectional GRU; c cohort: DHF-Cerner, method: RETAIN; d cohort: PaCa-Cerner, method: GRU; e cohort: PaCa-Cerner, method: bidirectional GRU; f cohort: PaCa-Cerner, method: RETAIN; g cohort: PaCa-Truven, method: GRU; h cohort: PaCa-Truven, method: bidirectional GRU; i cohort: PaCa-Truven, method: RETAIN. Shaded regions indicate standard deviations.
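For reference, AUC values like those plotted in Fig. 4 can be computed from test-set labels and predicted probabilities with scikit-learn. The numbers below are toy placeholders, not results from the paper.

from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 1]                            # disease labels, test set
scores_finetuned = [0.20, 0.90, 0.70, 0.30, 0.80]   # model with Med-BERT layer
scores_baseline  = [0.55, 0.60, 0.50, 0.30, 0.70]   # model without it

print(roc_auc_score(y_true, scores_finetuned))      # 1.0 on this toy data
print(roc_auc_score(y_true, scores_baseline))       # ~0.83 on this toy data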
Fig. 5
Fig. 5. Example of different connections of the same code, “type 2 diabetes mellitus”, in different visits.
a The first visit; b the second visit. Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
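Per-head attention weights of the kind visualized in Figs. 5-7 can be read out of a multi-head attention layer as follows. The sketch uses a standalone PyTorch layer with toy inputs, not the trained Med-BERT model.

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
x = torch.randn(1, 10, 128)       # one patient, 10 code embeddings

# average_attn_weights=False keeps a separate (query, key) matrix per head
_, attn = mha(x, x, x, average_attn_weights=False)
print(attn.shape)                 # torch.Size([1, 4, 10, 10]): head, query code, key code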
Fig. 6
Fig. 6. Example of the dependency connections in the DHF-Cerner cohort.
Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
Fig. 7
Fig. 7. Example of the dependency connections in the PaCa-Cerner cohort.
Lines connecting a code in the left panel to a code in the right panel indicate attention in the Med-BERT model. Line color indicates the individual attention head, and line intensity indicates the attention weight.
