BEHRT: Transformer for Electronic Health Records

Yikuan Li et al. Sci Rep. 2020 Apr 28;10(1):7155. doi: 10.1038/s41598-020-62922-y.

Abstract

Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and more efficient allocation of healthcare resources. The latest developments in machine learning (including deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for electronic health records (EHR), capable of simultaneously predicting the likelihood of 301 conditions in one's future visits. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking improvement of 8.0-13.2% (in terms of average precision scores for different tasks) over the existing state-of-the-art deep EHR models. In addition to its scalability and superior accuracy, BEHRT enables personalised interpretation of its predictions; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnosis, medication, measurements, and more) to further improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can be useful for future studies (i.e., transfer learning).
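As a rough illustration of the multi-label set-up described above, the sketch below shows a head that maps a pooled patient representation to independent probabilities for 301 conditions. This is a minimal stand-in, not the authors' released code; the hidden size of 288 and all layer names are assumptions made for illustration.

import torch
import torch.nn as nn

class DiseasePredictionHead(nn.Module):
    # Hypothetical head: pooled patient vector -> 301 per-disease probabilities.
    def __init__(self, hidden_size=288, num_diseases=301):
        super().__init__()
        self.pooler = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_diseases)

    def forward(self, pooled_input):
        pooled = torch.tanh(self.pooler(pooled_input))
        # Sigmoid (not softmax): each condition is predicted independently.
        return torch.sigmoid(self.classifier(pooled))

head = DiseasePredictionHead()
probs = head(torch.randn(4, 288))  # a batch of 4 encoded patients
print(probs.shape)  # torch.Size([4, 301])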


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Linkage and filtering of CPRD data. This flow chart lists the key steps of our data cleaning and linkage procedure, showing the number of patients included at each step. Starting from nearly 8 million patients, the final dataset (used for training and evaluating our models) consists of 1.6 million patients, each meeting our inclusion criteria.
Figure 2
Preparation of CPRD data for BEHRT. The figure shows an example patient's EHR sequence, consisting of 8 visits. Each visit's record can contain concepts such as diagnoses, medications, and measurements; all values are artificial and for illustration purposes only. In this study, we use only age and diagnoses. Therefore, as shown at the bottom of the figure, we take only the diagnosis and age subset of the record to form the necessary sequences; this resulting sequence is how we represent every patient's EHR in our modelling process. Note that the visits shown in purple boxes are not presented to the model because they lack diagnoses.
Figure 3
BEHRT architecture. Using the artificial data shown in Fig. 2, section (a) shows how BEHRT sees a patient's EHR. In addition to diagnosis and age, BEHRT employs an encoding for each event's position (shown as POSITION) and an encoding for the visit (shown as SEGMENT, with A and B alternating between visits). The sum of all these embeddings, shown at the bottom of (a), is the latent contextual representation of the patient's EHR at a given visit's diagnosis. Section (b) shows BEHRT's Transformer-based architecture. The model is first pre-trained on the MLM task to learn the network parameters (including the disease embeddings) that can predict the masked disease tokens. When training on the downstream tasks (i.e., T1 to T3; a detailed explanation can be found in section: Disease Prediction), the model fine-tunes the weights pre-trained in the MLM task and learns the weights of the classification layer (i.e., mapping to the pooling layer and finally to the subsequent-disease classifier).
Figure 4
Visual investigation of the disease embedding. The figure shows disease embeddings projected into two dimensions, where distance represents closeness of contextual association; the colours represent the Caliber chapters. Most associations are accepted by medical experts and preserve, among other things, the gender-based divisions in illnesses. We zoom in and profile four clusters in this plot, shown in subfigures (A-D).
Figure 5
Analysis of BEHRT's self-attention. This figure shows the EHR histories of two patients (A and B), each presented as two identical columns (ordered chronologically, going downwards) for the convenience of association analysis. The left column highlights the disease of interest, and the right column indicates the corresponding associations to that highlighted disease. The intensity of the blue in the right column represents the strength of the attention score: the stronger the intensity, the stronger the association. The attention scores are retrieved from the attention component of the last layer of the BEHRT network.
Figure 6
Disease-wise precision analysis. Each circle in these graphs represents a disease; its colour and size denote the Caliber chapter and prevalence, respectively. The x- and y-axes show APS and AUROC, respectively, so the further right and higher a disease sits, the better BEHRT predicts its occurrence in the next 6 months. Subplot (A) illustrates the full results, and subplots (B and C) illustrate the best and worst regions of the plot in terms of BEHRT's performance.
