A foundational vision transformer improves diagnostic performance for electrocardiograms

Akhil Vaid et al. NPJ Digit Med. 2023 Jun 6;6(1):108. doi: 10.1038/s41746-023-00840-9.
Abstract

The electrocardiogram (ECG) is a ubiquitous diagnostic modality. Convolutional neural networks (CNNs) applied to ECG analysis require large sample sizes, and transfer learning approaches for biomedical problems may result in suboptimal performance when pre-training is done on natural images. We leveraged masked image modeling to create a vision-based transformer model, HeartBEiT, for electrocardiogram waveform analysis. We pre-trained this model on 8.5 million ECGs and then compared its performance against standard CNN architectures for diagnosis of hypertrophic cardiomyopathy, low left ventricular ejection fraction, and ST elevation myocardial infarction, using differing training sample sizes and independent validation datasets. We find that HeartBEiT has significantly higher performance at lower sample sizes compared to other models. We also find that HeartBEiT improves explainability of diagnosis by highlighting biologically relevant regions of the ECG, whereas standard CNNs produce more diffuse attributions. Domain-specific pre-trained transformer models may exceed the classification performance of models trained on natural images, especially in very low data regimes. The combination of the architecture and such pre-training allows for more accurate, granular explainability of model predictions.
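The fine-tuning setup the abstract describes, a pre-trained encoder with a classification head attached for each diagnostic task, can be sketched as follows. The `encoder` stub, the 768-dimensional embedding (the width of a ViT-Base backbone), and the head sizes are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(ecg_image):
    """Stand-in for the pre-trained HeartBEiT encoder (hypothetical):
    maps an ECG image to a fixed-size embedding. A real fine-tuning run
    would use the pre-trained transformer weights instead."""
    return ecg_image.reshape(-1)[:768] * 0.01  # 768-d embedding (ViT-Base width)

def mlp_head(embedding, w1, b1, w2, b2):
    """Two-layer perceptron classification head on top of the encoder."""
    hidden = np.maximum(0.0, embedding @ w1 + b1)   # ReLU
    logit = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))             # probability of the diagnosis

# Randomly initialized head weights; these are what fine-tuning would learn.
w1 = rng.normal(scale=0.02, size=(768, 64))
b1 = np.zeros(64)
w2 = rng.normal(scale=0.02, size=64)
b2 = 0.0

prob = mlp_head(encoder(np.ones((224, 224))), w1, b1, w2, b2)
```

Because only the head is task-specific, the same pre-trained encoder can be reused for each diagnosis (low ejection fraction, hypertrophic cardiomyopathy, STEMI), which is what makes the low-sample-size regime tractable.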


Conflict of interest statement

Dr. Nadkarni reports consultancy agreements with AstraZeneca, BioVie, GLG Consulting, Pensieve Health, Reata, Renalytix, Siemens Healthineers, and Variant Bio; research funding from Goldfinch Bio and Renalytix; honoraria from AstraZeneca, BioVie, Lexicon, Daiichi Sankyo, Menarini Health and Reata; patents or royalties with Renalytix; owns equity and stock options in Pensieve Health and Renalytix as a scientific cofounder; owns equity in Verici Dx; has received financial compensation as a scientific board member and advisor to Renalytix; serves on the advisory board of Neurona Health; and serves in an advisory or leadership role for Pensieve Health and Renalytix. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.

Figures

Fig. 1
Fig. 1. Modeling workflow.
Pre-training of the HeartBEiT model. Each original ECG (1) is partitioned into a 14 × 14 grid of patches (2), each 16 × 16 pixels. These patches are tokenized, and some of them are masked (3). The DALL-E model (4) acts as the tokenizer, converting the image into discrete tokens (5) that are then used in the masked image modeling process (6). This pre-trains the HeartBEiT model’s attention modules (7); the model may then be used for downstream fine-tuning and inference (8, 9) after addition of a multi-layer perceptron classification head (10).
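The patch-partitioning and masking steps of this workflow can be sketched in a few lines. The 224 × 224 input size (which yields the 14 × 14 grid of 16 × 16 patches) and the 40% mask ratio are assumptions for illustration; the DALL-E tokenization and reconstruction steps are omitted:

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split a square grayscale image into non-overlapping patches.

    For a 224 x 224 ECG rendering this yields the 14 x 14 grid of
    16 x 16 pixel patches described in the Fig. 1 workflow.
    """
    h, w = image.shape
    grid = h // patch_size
    patches = image.reshape(grid, patch_size, grid, patch_size)
    # Reorder axes so each row is one complete patch.
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch_size, patch_size)

def random_mask(num_patches, mask_ratio=0.4, seed=0):
    """Pick the patch indices to hide; the model must predict the
    visual tokens of these masked patches during pre-training."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    return rng.choice(num_patches, size=num_masked, replace=False)

ecg_image = np.zeros((224, 224), dtype=np.float32)  # placeholder ECG rendering
patches = patchify(ecg_image)           # shape (196, 16, 16)
masked_idx = random_mask(len(patches))  # indices to reconstruct
```

The masked-patch prediction objective is what lets the model learn ECG-specific structure without any diagnostic labels, which is why the labeled fine-tuning sets can be small.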
Fig. 2
Fig. 2. Left ventricular ejection fraction ≤ 40% classification on ECGs.
a Internal testing performance (4 Mount Sinai facilities). b Internal testing performance difference. c External validation performance (Morningside patients). d External validation performance difference. Red dashed line in (b) and (d) indicates HeartBEiT performance.
Fig. 3
Fig. 3. Hypertrophic cardiomyopathy classification on ECGs.
a Internal testing performance (4 Mount Sinai facilities). b Internal testing performance difference. c External validation performance (Morningside patients). d External validation performance difference. Red dashed line in (b) and (d) indicates HeartBEiT performance.
Fig. 4
Fig. 4. STEMI detection on ECGs (PTB-XL database).
a Internal testing performance. b Internal testing performance difference. Dashed red line in (b) indicates HeartBEiT performance.
Fig. 5
Fig. 5. Saliency mapping for STEMI detection at 1% training data.
a ViT-B/16. b EfficientNet-B4. c ResNet-152. d HeartBEiT. HeartBEiT localizes to the ST segments. Other models are more diffuse in highlighting features of importance and may be less useful clinically.
