AI-ECG classification for Brugada syndrome: A study of machine learning techniques to optimise for limited datasets
- PMID: 41739842
- PMCID: PMC12935214
- DOI: 10.1371/journal.pdig.0001222
Abstract
Deep neural networks can classify ECGs with high accuracy when training data is abundant. Rare conditions like Brugada syndrome, an inherited arrhythmia syndrome predisposing to sudden death, pose challenges due to data scarcity hindering model training. We evaluated multiple machine learning (ML) approaches to optimise a Brugada ECG classification model using limited training data. The baseline model was trained on a dataset comprising 176 Brugada, 176 right bundle branch block (RBBB) and 352 normal ECGs from Zhongshan Hospital (Zhongshan-baseline dataset), framed as a binary classification task to distinguish Brugada from non-Brugada ECGs. A 25%-75% train-test split was used to exacerbate data scarcity. To enhance training, we incorporated three additional datasets: (i) a different, labelled ECG dataset from Zhongshan Hospital including normal and RBBB ECGs (Zhongshan-pretrain), (ii) an unlabelled ECG dataset from Hammersmith Hospital including Brugada and non-Brugada ECGs (Imperial), (iii) an open-access labelled ECG dataset (PTB-XL). Three strategies were tested: (1) supervised pretraining, (2) self-supervised pretraining with data augmentation, and (3) oversampling using SMOTE (synthetic minority oversampling technique). Each model was evaluated on the unseen internal test set and an external Brugada mimic dataset. The models were re-trained using an 80%-20% train-test split as a secondary analysis. The baseline model achieved 92.2% accuracy, F1-score 0.837, and area under the Receiver Operating Characteristic curve (AUC) 0.962. Supervised pretraining significantly improved performance when training data was scarce, with the best model pretrained on the Zhongshan-pretrain dataset boosting accuracy (+3.2%), F1-score (+0.071) and AUC + 0.019), with consistent cross-validation performance. Self-supervised pretraining produced smaller and more variable gains, although select models better mitigated against false positives on the Brugada mimic dataset. 
SMOTE oversampling showed inconsistent effects on performance. Incorporating pretraining and oversampling may facilitate the development of more accurate AI-ECG models for rare diseases when training data is limited but provides diminishing returns when adequate labelled data is available.
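The core idea of the SMOTE oversampling strategy tested above is to synthesise new minority-class samples by interpolating between a real minority sample and one of its nearest minority-class neighbours. The sketch below illustrates this on generic feature vectors; the study itself trained deep networks on ECGs, and the function name, dimensions, and random data here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating
    between each minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE; Chawla et al., 2002)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Pairwise distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each sample as its own neighbour
    nn = np.argsort(d, axis=1)[:, :min(k, n - 1)]  # k nearest neighbours
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)                    # pick a random minority sample
        b = nn[a, rng.integers(nn.shape[1])]   # and one of its neighbours
        gap = rng.random()                     # interpolation factor in [0, 1]
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Hypothetical use: balance 176 minority feature vectors against 528 majority
# samples by synthesising 352 extra minority samples.
X_minority = np.random.default_rng(0).normal(size=(176, 12))
X_new = smote_oversample(X_minority, n_synthetic=352, k=5, rng=0)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE never extrapolates outside the minority class's feature range, which is also why its benefit can be inconsistent: it densifies existing regions rather than adding genuinely new variation.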
Copyright: © 2026 Saleh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
I have read the journal’s policy and the authors of this manuscript have the following competing interests: JH is a shareholder in MyCardium AI Limited. EB receives funding to do AI-ECG work with Anumana Inc. All other authors have declared no competing interests exist.