Temporal convolutional transformer for EEG based motor imagery decoding

Hamdi Altaheri et al. Sci Rep. 2025 Sep 26;15(1):32959. doi: 10.1038/s41598-025-16219-7.

Abstract

Brain-computer interfaces (BCIs) based on motor imagery (MI) offer a transformative pathway for rehabilitation, communication, and control by translating imagined movements into actionable commands. However, accurately decoding motor imagery from electroencephalography (EEG) signals remains a significant challenge in BCI research. In this paper, we propose TCFormer, a temporal convolutional Transformer designed to improve the performance of EEG-based motor imagery decoding. TCFormer integrates a multi-kernel convolutional neural network (MK-CNN) for spatial-temporal feature extraction with a Transformer encoder enhanced by grouped query attention to capture global contextual dependencies. A temporal convolutional network (TCN) head follows, utilizing dilated causal convolutions to enable the model to learn long-range temporal patterns and generate final class predictions. The architecture is evaluated on three benchmark motor imagery and motor execution EEG datasets: BCIC IV-2a, BCIC IV-2b, and HGD, achieving average accuracies of 84.79%, 87.71%, and 96.27%, respectively, outperforming current methods. These results demonstrate the effectiveness of the integrated design in addressing the inherent complexity of EEG signals. The code is publicly available at https://github.com/altaheri/TCFormer.
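
The three-stage pipeline described above (multi-kernel CNN, Transformer encoder, TCN head) can be summarized in a minimal PyTorch-style sketch. Module sizes, kernel lengths, and the use of the standard nn.TransformerEncoder in place of the GQA/RoPE encoder are illustrative assumptions, not the released implementation; the authors' code is available at the repository linked above.

```python
# Hypothetical end-to-end sketch: multi-kernel CNN -> Transformer encoder
# -> TCN head -> classifier.  Hyperparameters are placeholders; the
# standard nn.TransformerEncoder stands in for the GQA/RoPE encoder.
import torch
import torch.nn as nn


class TCFormerSketch(nn.Module):
    def __init__(self, n_channels=22, n_classes=4, d_model=32):
        super().__init__()
        # Stage 1: temporal filtering + depth-wise spatial filtering
        # (simplified to a single temporal kernel; the paper uses several).
        self.conv = nn.Sequential(
            nn.Conv2d(1, d_model, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(d_model),
            nn.Conv2d(d_model, d_model, (n_channels, 1), groups=d_model, bias=False),
            nn.BatchNorm2d(d_model),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),                      # temporal down-sampling
        )
        # Stage 2: standard Transformer encoder as a stand-in for GQA + RoPE.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Stage 3: causal dilated convolutions standing in for the TCN head.
        self.tcn = nn.Sequential(
            nn.ConstantPad1d((3, 0), 0.0),             # left-pad -> causal, dilation 1
            nn.Conv1d(d_model, d_model, 4, dilation=1), nn.ELU(),
            nn.ConstantPad1d((6, 0), 0.0),             # left-pad -> causal, dilation 2
            nn.Conv1d(d_model, d_model, 4, dilation=2), nn.ELU(),
        )
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                              # x: (batch, 1, channels, time)
        f = self.conv(x).squeeze(2)                    # (batch, d_model, T')
        f = self.encoder(f.transpose(1, 2))            # (batch, T', d_model)
        f = self.tcn(f.transpose(1, 2))                # (batch, d_model, T')
        return self.classifier(f[:, :, -1])            # last time step -> logits


logits = TCFormerSketch()(torch.randn(2, 1, 22, 1000))
print(logits.shape)                                    # torch.Size([2, 4])
```
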

Keywords: Brain signal decoding; Convolutional neural network; Electroencephalography (EEG); Grouped query attention; Motor imagery classification; Temporal convolutional network; Transformers.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
TCFormer architecture, comprising a convolutional module, a Transformer encoder, and a TCN, followed by a classifier. The convolutional module performs multi-kernel temporal filtering and spatial (depth-wise) filtering to extract multi-scale EEG features. The Transformer encoder, composed of stacked layers, applies grouped-query attention (GQA) and feed-forward (FF) sublayers to model global dependencies, with rotary positional embeddings (RoPE) providing temporal context. The TCN module uses causal dilated convolutions to capture local temporal patterns and sequential dependencies that the Transformer does not capture, and produces the final class logits. Conv convolution, BN batch norm, ELU exponential linear unit.
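
A hedged sketch of the multi-kernel convolutional module in the caption: parallel temporal convolutions with different kernel lengths capture EEG rhythms at several time scales, followed by a depth-wise spatial convolution across electrodes. Kernel sizes, filter counts, and the pooling factor are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative multi-kernel convolutional block: parallel temporal
# convolutions at several kernel lengths, then depth-wise spatial
# filtering over electrodes.  All hyperparameters are assumptions.
import torch
import torch.nn as nn


class MultiKernelConvSketch(nn.Module):
    def __init__(self, n_channels=22, filters_per_branch=8,
                 kernel_sizes=(16, 32, 64)):
        super().__init__()
        # One temporal-filtering branch per kernel size; "same" padding
        # keeps the time axis aligned so branches can be concatenated.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, filters_per_branch, (1, k), padding="same", bias=False),
                nn.BatchNorm2d(filters_per_branch),
            )
            for k in kernel_sizes
        )
        total = filters_per_branch * len(kernel_sizes)
        # Depth-wise spatial filtering: one spatial filter per temporal
        # feature map, collapsing the electrode dimension.
        self.spatial = nn.Sequential(
            nn.Conv2d(total, total, (n_channels, 1), groups=total, bias=False),
            nn.BatchNorm2d(total),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
        )

    def forward(self, x):                        # x: (batch, 1, channels, time)
        x = torch.cat([b(x) for b in self.branches], dim=1)
        return self.spatial(x).squeeze(2)        # (batch, total, time // 8)


feats = MultiKernelConvSketch()(torch.randn(2, 1, 22, 1000))
print(feats.shape)                               # torch.Size([2, 24, 125])
```
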
Fig. 2
Feature-extraction pipeline: raw EEG is first processed by a multi-kernel convolutional block to capture diverse temporal scales, then passed through a Transformer encoder with grouped-query attention to model global dependencies, yielding the encoded feature tensor that feeds into the TCN classification head.
Fig. 3
Grouped squeeze-and-excitation (SE) attention in the multi-kernel convolution block. The input feature map is globally average-pooled along the temporal dimension to produce a per-channel descriptor. This descriptor is passed through two 1 × 1 grouped convolutions with a channel-reduction ratio, followed by ReLU and sigmoid activations, yielding one attention weight per group. These weights, each in the range [0, 1], are broadcast across the channels of their corresponding group and used to rescale the original input. The reweighted features are then added to the residual path to produce the final output.
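
The caption's recipe (temporal average pooling, two grouped 1 × 1 convolutions with ReLU and sigmoid, one weight per group, residual addition) can be sketched as follows; the group count and reduction ratio are illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch of grouped squeeze-and-excitation (SE) attention.
# G (groups) and r (reduction) are illustrative assumptions.
import torch
import torch.nn as nn


class GroupedSESketch(nn.Module):
    def __init__(self, channels=24, groups=3, reduction=4):
        super().__init__()
        self.per_group = channels // groups
        self.excite = nn.Sequential(
            # Squeeze each group of channels by the reduction ratio.
            nn.Conv1d(channels, channels // reduction, 1, groups=groups, bias=False),
            nn.ReLU(inplace=True),
            # Excite: one scalar attention weight per group, in [0, 1].
            nn.Conv1d(channels // reduction, groups, 1, groups=groups, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):                               # x: (batch, channels, time)
        desc = x.mean(dim=-1, keepdim=True)             # global average pool over time
        w = self.excite(desc)                           # (batch, groups, 1)
        w = w.repeat_interleave(self.per_group, dim=1)  # broadcast within each group
        return x + x * w                                # rescale + residual path


out = GroupedSESketch()(torch.randn(2, 24, 125))
print(out.shape)                                        # torch.Size([2, 24, 125])
```
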
Fig. 4
Comparison of attention mechanisms. (Left) multi-head attention (MHA): each query head has its own key- and value-projection matrices. (Middle) multi-query attention (MQA): all heads share a single key-value pair, minimizing memory cost. (Right) grouped-query attention (GQA): query heads are divided into groups, with each group sharing a key-value pair, providing a trade-off between the expressiveness of MHA and the efficiency of MQA. GQA reduces to MQA when the number of key-value groups is one, and becomes equivalent to MHA when the number of groups equals the number of query heads.
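
A minimal grouped-query attention sketch showing how query heads share a smaller set of key/value heads; setting n_kv_heads to 1 recovers MQA, and setting it equal to n_heads recovers MHA. Dimensions are illustrative and RoPE is omitted for brevity.

```python
# Minimal grouped-query attention (GQA): n_heads query heads share
# n_kv_heads key/value heads (1 = MQA, n_heads = MHA).
import torch
import torch.nn as nn


class GQASketch(nn.Module):
    def __init__(self, d_model=32, n_heads=4, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.h, self.g = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.d_head, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.o_proj = nn.Linear(n_heads * self.d_head, d_model, bias=False)

    def forward(self, x):                                   # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)
        # Each key/value head serves h // g query heads.
        k = k.repeat_interleave(self.h // self.g, dim=1)
        v = v.repeat_interleave(self.h // self.g, dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)


y = GQASketch()(torch.randn(2, 125, 32))
print(y.shape)                                              # torch.Size([2, 125, 32])
```
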
Fig. 5
Grouped-query attention (GQA).
Fig. 6
The temporal convolutional network.
Fig. 7
Temporal convolutional network (TCN) head: the feature sequence generated by the MK-CNN and Transformer encoder (as shown in Fig. 2) is processed through two residual TCN blocks. Each block employs dilated causal convolutions with dilation rates of 1 and 2, respectively, resulting in a total receptive field of 19 time steps. The output at the final time step is passed to a convolutional classification layer to produce the motor imagery (MI) class logits.
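
A sketch of the TCN head under stated assumptions: two residual blocks, each with two dilated causal convolutions of kernel size 4 and dilations 1 and 2, which reproduces the stated receptive field of 19 time steps (RF = 1 + 2(k - 1)(d1 + d2) = 1 + 2*3*3 = 19). The kernel size and the two-convolution block layout are assumptions chosen to be consistent with that receptive field (the caption's kernel size did not survive extraction); a pointwise linear classifier stands in for the convolutional classification layer.

```python
# TCN classification head sketch: two residual blocks of dilated causal
# convolutions, then a classifier on the last time step.  Kernel size 4
# and two convolutions per block are assumptions matching RF = 19.
import torch
import torch.nn as nn


def causal_conv(channels, kernel_size, dilation):
    # Left-only padding keeps the convolution causal (no future leakage).
    return nn.Sequential(
        nn.ConstantPad1d(((kernel_size - 1) * dilation, 0), 0.0),
        nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
        nn.BatchNorm1d(channels),
        nn.ELU(),
    )


class TCNHeadSketch(nn.Module):
    def __init__(self, channels=32, n_classes=4, kernel_size=4):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(causal_conv(channels, kernel_size, d),
                          causal_conv(channels, kernel_size, d))
            for d in (1, 2)                      # dilation doubles per block
        ])
        self.classifier = nn.Linear(channels, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        for block in self.blocks:
            x = x + block(x)                     # residual connection per block
        return self.classifier(x[:, :, -1])      # last time step -> class logits


logits = TCNHeadSketch()(torch.randn(2, 32, 125))
print(logits.shape)                              # torch.Size([2, 4])
```
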
Fig. 8
Subject-wise test accuracy (%) for three datasets under both within-subject and cross-subject evaluation modes. Panels display results for: (a) BCIC IV-2a (within-subject), (b) BCIC IV-2a (cross-subject), (c) BCIC IV-2b (within-subject), (d) BCIC IV-2b (cross-subject), (e) HGD (within-subject), and (f) HGD (cross-subject). Five models are compared—EEGNet, EEGConformer, CTNet, ATCNet, and the proposed TCFormer. Each bar represents the mean accuracy across multiple runs: five runs for BCIC datasets and three runs for HGD. The black error bars represent the standard deviation across these runs for each subject, reflecting variability in model performance. The tables below each panel summarize the mean accuracy, standard deviation, and average rank across subjects. TCFormer consistently achieves the highest mean accuracy and the lowest average rank, demonstrating superior performance across all datasets.
Fig. 9
Average confusion matrices for five models—EEGNet, EEGConformer, CTNet, ATCNet, and the proposed TCFormer—on three EEG motor-imagery datasets: BCIC IV-2a (four‐class MI: feet, left hand, right hand, tongue), BCIC IV-2b (two‐class MI: left hand vs. right hand), and HGD (four‐class movement: feet, left hand, rest, right hand). The matrices correspond to the best-performing run for each model, averaged across all subjects. Darker diagonal entries indicate higher per‐class accuracy and overall model performance.
