Multicenter Study
PLoS One. 2024 Aug 14;19(8):e0307978. doi: 10.1371/journal.pone.0307978. eCollection 2024.

Applying masked autoencoder-based self-supervised learning for high-capability vision transformers of electrocardiographies

Shinnosuke Sawano et al. PLoS One.

Abstract

The generalization of deep neural network algorithms to a broader population is an important challenge in the medical field. We aimed to apply self-supervised learning using masked autoencoders (MAEs) to improve the performance of a 12-lead electrocardiography (ECG) analysis model trained on limited ECG data. We pretrained Vision Transformer (ViT) models by reconstructing masked ECG data with an MAE. We fine-tuned this MAE-based ECG pretrained model on ECG-echocardiography data from The University of Tokyo Hospital (UTokyo) for the detection of left ventricular systolic dysfunction (LVSD), and then evaluated it on multi-center external validation data from seven institutions, using the area under the receiver operating characteristic curve (AUROC) for assessment. We included 38,245 ECG-echocardiography pairs from UTokyo and 229,439 pairs from all institutions. The performances of MAE-based ECG models pretrained using ECG data from UTokyo were significantly higher than those of other deep neural network models across all external validation cohorts (AUROC, 0.913-0.962 for LVSD, p < 0.001). Moreover, the performance of the MAE-based ECG analysis model improved with model capacity and the amount of training data. Additionally, the MAE-based ECG analysis model maintained high performance on the ECG benchmark dataset (PTB-XL). Our proposed method developed high-performance MAE-based ECG analysis models using limited ECG data.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Flow chart for splitting the datasets.
Flow chart showing how the datasets used for model training and validation were created. LVSD, left ventricular systolic dysfunction; MAE38K, Vision Transformers pretrained on ECG data from UTokyo using a masked autoencoder; Large-dataset, electrocardiography and echocardiography paired dataset from three institutions (UTokyo, Mitsui, and Asahi); MAE130K, Vision Transformers pretrained on ECG data from three institutions (UTokyo, Mitsui, and Asahi).
Fig 2
Fig 2. Network architecture of MAE for 12-lead ECGs.
This figure shows the network architecture of MAE-based self-supervised learning for 12-lead ECGs; ViT-Huge is used as the example. We treated the original ECG data from each lead as a 1×5000 matrix of ECG voltages. The input ECG data was divided into 1×250 patches, so the voltage data from each lead was converted into 20 patch sequences, yielding 240 patch sequences for 12-lead ECGs. These patches were randomly masked, and only the unmasked patches (60 patch sequences) were input to the MAE encoder, for which we used a ViT-Huge encoder. The ViT-Huge encoder then output 60 encoded patch sequences with 1280-dimensional feature vectors. The input to the MAE decoder was the full set of patches, consisting of encoded patches and masked patches. The proposed MAE reconstructs the input by predicting the voltage values for each masked patch of the 12-lead ECGs. Each element of the decoder's output is a vector of voltage values representing one patch; the last layer of the decoder is a linear projection whose number of output channels equals the number of inputs. The loss function computes the mean squared error between the reconstructed and original 12-lead ECGs as the reconstruction loss; as in the original MAE, we compute the loss only on masked patches. This procedure yields ViT-Huge encoders for 12-lead ECGs with high performance on downstream tasks. Other implementation details followed those of a previous study [15]. Although we also use ViT-Large and ViT-Base in this study, the primary model is ViT-Huge; because the structure of the MAE does not change with the size of the ViT model, Fig 2 is presented using ViT-Huge as an example. MAE, masked autoencoder; ECG, electrocardiography; ViT, Vision Transformer.
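The patching and masking arithmetic in the Fig 2 caption (12 leads × 5000 samples → 240 patches of length 250, of which 60 remain visible) can be sketched in a few lines. This is a minimal NumPy illustration of the data flow only, not the authors' implementation; the random ECG array, the 75% mask ratio inferred from 60/240 visible patches, and the zero-filled stand-in for the decoder output are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 12-lead ECG: 12 leads x 5000 samples, matching the
# 1x5000-per-lead layout described in the caption (not real patient data).
ecg = rng.standard_normal((12, 5000))

PATCH_LEN = 250  # each patch covers 1 x 250 voltage samples
# 12 leads x 20 patches per lead -> 240 patch sequences in total
patches = ecg.reshape(12, 5000 // PATCH_LEN, PATCH_LEN).reshape(-1, PATCH_LEN)

MASK_RATIO = 0.75  # inferred: 60 of 240 patches stay visible
n_visible = int(patches.shape[0] * (1 - MASK_RATIO))
perm = rng.permutation(patches.shape[0])
visible_idx, masked_idx = perm[:n_visible], perm[n_visible:]

# Only the visible patches would be fed to the ViT encoder; the decoder
# predicts voltages for every patch. Here a zero array stands in for the
# decoder output, and the MSE loss is computed on masked patches only,
# as in the original MAE.
reconstructed = np.zeros_like(patches)
loss = np.mean((reconstructed[masked_idx] - patches[masked_idx]) ** 2)
```

In a real model, `visible_idx` selects the encoder inputs and learned mask tokens fill the masked positions before the decoder; the loss line shows why unmasked patches contribute no gradient.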
Fig 3
Fig 3. Example of the reconstruction process in II and V5 lead.
(A) Original ECGs; (B) masked ECGs; and (C) reconstructed ECGs.
Fig 4
Fig 4. Model performance values used to detect LVSD from 12-lead ECGs on the internal test dataset and external validation cohorts.
The bars indicate the AUROC for LVSD detection of the models on the internal test dataset and the validation cohorts of Mitsui, Asahi, Sakakibara, Jichi, TokyoBay, JR, and NTT. LVSD, left ventricular systolic dysfunction; AUROC, area under the receiver operating characteristic curve; ViT-Huge38K, Vision Transformer Huge pretrained on ECG data from UTokyo using a masked autoencoder; ViT-Large38K, Vision Transformer Large pretrained on ECG data from UTokyo using a masked autoencoder; ViT-Base38K, Vision Transformer Base pretrained on ECG data from UTokyo using a masked autoencoder; Baseline-CNN, two-dimensional convolutional neural network; ViT-IN1K, Vision Transformer pretrained on ImageNet-1K using a masked autoencoder.
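For reference, the AUROC reported in Fig 4 is a ranking statistic over model scores and binary labels. The paper does not specify an implementation (in practice `sklearn.metrics.roc_auc_score` is the usual choice); below is a minimal rank-based (Mann-Whitney U) sketch assuming no tied scores, with toy labels and scores that are purely illustrative.

```python
import numpy as np

def auroc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUROC; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    pos = y_true == 1
    n_pos, n_neg = int(pos.sum()), int((~pos).sum())
    # Probability a random positive outranks a random negative
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy illustration (hypothetical labels/scores, not the study's data):
labels = np.array([0, 0, 1, 1])            # 1 = LVSD present
scores = np.array([0.1, 0.4, 0.35, 0.8])   # model output probabilities
```

Calling `auroc(labels, scores)` on this toy example counts how many positive/negative pairs are correctly ordered; an AUROC of 0.5 is chance and 1.0 is perfect ranking, which is why values of 0.913-0.962 indicate strong discrimination.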

References

    1. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 2018;15(11):e1002683. doi: 10.1371/journal.pmed.1002683
    2. Goto S, Solanki D, John JE, Yagi R, Homilius M, Ichihara G, et al. Multinational Federated Learning Approach to Train ECG and Echocardiogram Models for Hypertrophic Cardiomyopathy Detection. Circulation. 2022;146(10):755-69. doi: 10.1161/CIRCULATIONAHA.121.058696
    3. Kagiyama N, Piccirilli M, Yanamala N, Shrestha S, Farjo PD, Casaclang-Verzosa G, et al. Machine Learning Assessment of Left Ventricular Diastolic Function Based on Electrocardiographic Features. J Am Coll Cardiol. 2020;76(8):930-41. doi: 10.1016/j.jacc.2020.06.061
    4. Tuncer T, Dogan S, Pławiak P, Acharya UR. Automated arrhythmia detection using novel hexadecimal local pattern and multilevel wavelet transform with ECG signals. Knowledge-Based Systems. 2019;186:104923.
    5. Subasi A, Dogan S, Tuncer T. A novel automated tower graph based ECG signal classification method with hexadecimal local adaptive binary pattern and deep learning. Journal of Ambient Intelligence and Humanized Computing. 2023;14(2):711-25.
