2026 Feb 25;5(2):e0001222.
doi: 10.1371/journal.pdig.0001222. eCollection 2026 Feb.

AI-ECG classification for Brugada syndrome: A study of machine learning techniques to optimise for limited datasets


Keenan Saleh et al. PLOS Digit Health.

Abstract

Deep neural networks can classify ECGs with high accuracy when training data is abundant. Rare conditions like Brugada syndrome, an inherited arrhythmia syndrome predisposing to sudden death, pose challenges because data scarcity hinders model training. We evaluated multiple machine learning (ML) approaches to optimise a Brugada ECG classification model using limited training data. The baseline model was trained on a dataset comprising 176 Brugada, 176 right bundle branch block (RBBB) and 352 normal ECGs from Zhongshan Hospital (Zhongshan-baseline dataset), framed as a binary classification task to distinguish Brugada from non-Brugada ECGs. A 25%-75% train-test split was used to exacerbate data scarcity. To enhance training, we incorporated three additional datasets: (i) a different, labelled ECG dataset from Zhongshan Hospital including normal and RBBB ECGs (Zhongshan-pretrain), (ii) an unlabelled ECG dataset from Hammersmith Hospital including Brugada and non-Brugada ECGs (Imperial), and (iii) an open-access labelled ECG dataset (PTB-XL). Three strategies were tested: (1) supervised pretraining, (2) self-supervised pretraining with data augmentation, and (3) oversampling using SMOTE (synthetic minority oversampling technique). Each model was evaluated on the unseen internal test set and an external Brugada mimic dataset. The models were re-trained using an 80%-20% train-test split as a secondary analysis. The baseline model achieved 92.2% accuracy, F1-score 0.837, and area under the receiver operating characteristic curve (AUC) 0.962. Supervised pretraining significantly improved performance when training data was scarce, with the best model, pretrained on the Zhongshan-pretrain dataset, boosting accuracy (+3.2%), F1-score (+0.071) and AUC (+0.019), with consistent cross-validation performance. Self-supervised pretraining produced smaller and more variable gains, although select models better mitigated false positives on the Brugada mimic dataset. SMOTE oversampling showed inconsistent effects on performance. Incorporating pretraining and oversampling may facilitate the development of more accurate AI-ECG models for rare diseases when training data is limited but provides diminishing returns when adequate labelled data is available.
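The abstract reports accuracy, F1-score and AUC for a binary Brugada vs non-Brugada classifier. As a minimal sketch of how those three metrics are computed from labels and predicted probabilities (not the authors' evaluation code; the function name and 0.5 threshold are illustrative):

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, F1 and AUC for a binary (Brugada vs non-Brugada) classifier.

    y_true:  array of 0/1 labels; y_score: predicted probabilities.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # AUC via the Mann-Whitney U statistic: the probability that a random
    # positive is scored above a random negative (ties count half).
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))
    return accuracy, f1, auc
```

Unlike accuracy and F1, the AUC is threshold-free, which is why it is the usual headline metric when class balance is artificial, as in the 176/176/352 design here.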


Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: JH is a shareholder in MyCardium AI Limited. EB receives funding to do AI-ECG work with Anumana Inc. All other authors have declared no competing interests exist.

Figures

Fig 1
Fig 1. Network architecture.
Network architecture featuring a 1D convolutional neural network with DenseNet-style blocks.
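The defining feature of a DenseNet-style block is that each convolutional layer receives the concatenation of all earlier feature maps, so channels grow by a fixed "growth rate" per layer. A minimal numpy sketch of that connectivity for 1D (ECG) signals — purely illustrative, not the paper's architecture (layer counts, kernel sizes and activations are assumptions):

```python
import numpy as np

def conv1d_same(x, w):
    """Naive 1D convolution with 'same' padding.

    x: (channels_in, length); w: (channels_out, channels_in, kernel).
    """
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    length = x.shape[1]
    out = np.zeros((c_out, length))
    for o in range(c_out):
        for i in range(c_in):
            for t in range(length):
                out[o, t] += np.dot(xp[i, t:t + k], w[o, i])
    return out

def dense_block(x, weights):
    """DenseNet-style block: every layer sees the concatenation of all
    previous feature maps, so the channel count grows layer by layer."""
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=0)    # dense connectivity
        out = np.maximum(conv1d_same(inp, w), 0)  # conv + ReLU
        features.append(out)
    return np.concatenate(features, axis=0)
```

With a 12-lead input and two layers of growth rate 4, the block's output has 12 + 4 + 4 = 20 channels; this feature reuse is what makes dense blocks parameter-efficient on small datasets.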
Fig 2
Fig 2. Transfer learning pipeline.
Overview of the supervised pretraining and fine-tuning workflow. A convolutional neural network is first pretrained on a large ECG dataset to learn general-purpose feature representations (Step 1). The pretrained feature extractor weights are then transferred to a new task-specific model, while the classification head is reinitialised (Step 2). During fine-tuning, a predefined proportion of the transferred feature extractor layers is frozen to preserve learned representations, while the remaining layers and the classification head are optimised using the Brugada syndrome training dataset (Steps 3–4). This strategy enables knowledge transfer from large-scale ECG data while limiting overfitting in data-scarce settings.
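The weight-transfer and layer-freezing steps of Fig 2 can be sketched as bookkeeping over a list of parameter arrays. This is an illustrative outline only (`transfer_weights`, its arguments and the head initialisation scale are hypothetical names, not the authors' code):

```python
import numpy as np

def transfer_weights(pretrained, n_freeze, head_shape, rng):
    """Sketch of the Fig 2 fine-tuning setup.

    pretrained: feature-extractor weight arrays from pretraining (Step 1).
    n_freeze:   number of early layers to freeze (Step 3).
    head_shape: shape of the reinitialised classification head (Step 2).
    Returns (params, trainable): frozen layers keep their pretrained
    weights but are excluded from gradient updates during fine-tuning.
    """
    params = [w.copy() for w in pretrained]          # transferred extractor
    params.append(rng.standard_normal(head_shape) * 0.01)  # fresh head
    trainable = [i >= n_freeze for i in range(len(pretrained))] + [True]
    return params, trainable
```

Freezing the early layers preserves the general-purpose ECG features learned at scale, while the later layers and the new head adapt to the small Brugada training set — the overfitting-limiting trade-off described in the caption.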
Fig 3
Fig 3. Pair contrastive learning pipeline.
Self-supervised contrastive pretraining framework (SimCLR) applied to ECG data. An original ECG is transformed into two augmented views using stochastic signal perturbations. Both augmented signals are passed through a shared feature extractor f(·) to produce latent representations, which are further mapped by a projection head g(·) into a contrastive embedding space. The training objective minimises the distance between embeddings derived from different augmentations of the same ECG (positive pairs), while maximising the distance between embeddings from different ECGs within the batch (negative pairs). After pretraining, the projection head is discarded and the feature extractor is used for downstream Brugada classification.
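The standard SimCLR objective for the positive/negative pairing described above is the NT-Xent (normalised temperature-scaled cross-entropy) loss. A compact numpy sketch, assuming row i of `z1` and `z2` are the two views of the same ECG (the temperature value is illustrative):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss as used by SimCLR (sketch).

    z1, z2: (batch, dim) projection-head embeddings of the two augmented
    views; row i of z1 and row i of z2 form a positive pair.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = z.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # index of each sample's positive partner in the concatenated batch
    pos = np.concatenate([np.arange(n // 2) + n // 2, np.arange(n // 2)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(n), pos] - logsumexp)
    return loss.mean()
```

The loss falls when positive pairs are close and all other batch members are far, which is exactly the pull/push behaviour the caption describes.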
Fig 4
Fig 4. ECG augmentation techniques.
Examples of ECG augmentations used during self-supervised contrastive pretraining. Representative 12-lead ECG beat shown in its original form (left), after addition of Gaussian noise (centre), and after addition of baseline drift (right). Each column displays the same cardiac beat across all 12 leads. Augmentations were designed to introduce realistic signal variability while preserving diagnostically relevant morphology. These transformed views were used to generate positive pairs during contrastive pretraining, encouraging the model to learn invariant ECG representations robust to noise and baseline shifts.
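The two augmentations in Fig 4 — additive Gaussian noise and baseline drift — are simple to express on a (leads, samples) array. A hedged sketch (noise level, drift frequency and sampling rate are assumed values, not the paper's settings):

```python
import numpy as np

def add_gaussian_noise(ecg, sigma=0.02, rng=None):
    """ecg: (leads, samples) array. Adds zero-mean Gaussian noise."""
    rng = rng or np.random.default_rng()
    return ecg + rng.normal(0.0, sigma, ecg.shape)

def add_baseline_drift(ecg, amplitude=0.1, freq_hz=0.3, fs=500, rng=None):
    """Adds a slow sinusoid with random phase to every lead (baseline wander)."""
    rng = rng or np.random.default_rng()
    t = np.arange(ecg.shape[1]) / fs
    drift = amplitude * np.sin(2 * np.pi * freq_hz * t
                               + rng.uniform(0, 2 * np.pi))
    return ecg + drift  # broadcast across all leads

def make_positive_pair(ecg, rng):
    """Two independently perturbed views of one ECG (a contrastive pair)."""
    return (add_baseline_drift(add_gaussian_noise(ecg, rng=rng), rng=rng),
            add_baseline_drift(add_gaussian_noise(ecg, rng=rng), rng=rng))
```

Because both perturbations are additive and low-amplitude, the diagnostically relevant morphology (e.g. the V1/V2 ST segment in Brugada) is preserved, which is the invariance the pretraining is meant to exploit.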
Fig 5
Fig 5. SMOTE-generated Brugada ECGs.
Three example SMOTE-generated Brugada ECGs, with associated J-point amplitude and β angle in lead V1.
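SMOTE synthesises minority-class samples by interpolating between a real sample and one of its k nearest neighbours in feature space. A minimal numpy sketch of the core idea, applied to flattened ECGs (the function and its defaults are illustrative, not the study's implementation):

```python
import numpy as np

def smote(X, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch.

    X: (n_minority, n_features) flattened minority-class ECGs.
    Each synthetic sample lies on the line segment between a random
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    # pairwise Euclidean distances, self-distance excluded
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]

    out = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)
        b = neighbours[a, rng.integers(min(k, n - 1))]
        lam = rng.uniform()                 # interpolation factor in [0, 1)
        out[i] = X[a] + lam * (X[b] - X[a])
    return out
```

Because every synthetic ECG is a convex combination of two real minority samples, the generated signals stay inside the minority class's feature envelope — which is also why, as the abstract notes, SMOTE adds little once real labelled data is adequate.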
Fig 6
Fig 6. SHAP-based lead-level analysis for Brugada syndrome classification.
(A) Mean absolute SHAP value per ECG lead across all test samples with Brugada syndrome, reflecting average attribution magnitude. Error bars represent one standard deviation. (B) Frequency with which each lead appeared among the top three most impactful leads, as determined by SHAP attribution, across Brugada cases.
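Both panels of Fig 6 are aggregations of a per-sample SHAP attribution tensor. A sketch of that aggregation, assuming attributions shaped (samples, leads, timepoints); the helper name and summation over time are assumptions, not the authors' exact post-processing:

```python
import numpy as np

def lead_importance(shap_values, lead_names, top_k=3):
    """Aggregate SHAP attributions into the two Fig 6 summaries.

    shap_values: (samples, leads, timepoints) attribution tensor.
    Returns (mean |SHAP| per lead, count of appearances in each
    sample's top_k most impactful leads).
    """
    per_lead = np.abs(shap_values).sum(axis=2)      # (samples, leads)
    mean_abs = per_lead.mean(axis=0)                # panel A
    top = np.argsort(-per_lead, axis=1)[:, :top_k]  # panel B ranking
    counts = {name: int(np.sum(top == i))
              for i, name in enumerate(lead_names)}
    return dict(zip(lead_names, mean_abs)), counts
```

For Brugada syndrome one would expect the right precordial leads (V1-V2), where the type 1 pattern appears, to dominate both summaries.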
