Temporal convolutional transformer for EEG based motor imagery decoding

Hamdi Altaheri et al. Sci Rep. 2025 Sep 26;15(1):32959. doi: 10.1038/s41598-025-16219-7.

Abstract

Brain-computer interfaces (BCIs) based on motor imagery (MI) offer a transformative pathway for rehabilitation, communication, and control by translating imagined movements into actionable commands. However, accurately decoding motor imagery from electroencephalography (EEG) signals remains a significant challenge in BCI research. In this paper, we propose TCFormer, a temporal convolutional Transformer designed to improve the performance of EEG-based motor imagery decoding. TCFormer integrates a multi-kernel convolutional neural network (MK-CNN) for spatial-temporal feature extraction with a Transformer encoder enhanced by grouped query attention to capture global contextual dependencies. A temporal convolutional network (TCN) head follows, utilizing dilated causal convolutions to enable the model to learn long-range temporal patterns and generate final class predictions. The architecture is evaluated on three benchmark motor imagery and motor execution EEG datasets: BCIC IV-2a, BCIC IV-2b, and HGD, achieving average accuracies of 84.79%, 87.71%, and 96.27%, respectively, outperforming current methods. These results demonstrate the effectiveness of the integrated design in addressing the inherent complexity of EEG signals. The code is publicly available at https://github.com/altaheri/TCFormer.
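
The three-stage pipeline described above (multi-kernel CNN, Transformer encoder, TCN head) can be summarized in a minimal PyTorch-style sketch. Module sizes, kernel lengths, and the use of the standard nn.TransformerEncoder in place of the GQA/RoPE encoder are illustrative assumptions, not the released implementation; the authors' code is available at the repository linked above.

```python
# Hypothetical end-to-end sketch: multi-kernel CNN -> Transformer encoder
# -> TCN head -> classifier.  Hyperparameters are placeholders; the
# standard nn.TransformerEncoder stands in for the GQA/RoPE encoder.
import torch
import torch.nn as nn


class TCFormerSketch(nn.Module):
    def __init__(self, n_channels=22, n_classes=4, d_model=32):
        super().__init__()
        # Stage 1: temporal filtering + depth-wise spatial filtering
        # (simplified to a single temporal kernel; the paper uses several).
        self.conv = nn.Sequential(
            nn.Conv2d(1, d_model, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(d_model),
            nn.Conv2d(d_model, d_model, (n_channels, 1), groups=d_model, bias=False),
            nn.BatchNorm2d(d_model),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),                      # temporal down-sampling
        )
        # Stage 2: standard Transformer encoder as a stand-in for GQA + RoPE.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Stage 3: causal dilated convolutions standing in for the TCN head.
        self.tcn = nn.Sequential(
            nn.ConstantPad1d((3, 0), 0.0),             # left-pad -> causal, dilation 1
            nn.Conv1d(d_model, d_model, 4, dilation=1), nn.ELU(),
            nn.ConstantPad1d((6, 0), 0.0),             # left-pad -> causal, dilation 2
            nn.Conv1d(d_model, d_model, 4, dilation=2), nn.ELU(),
        )
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                              # x: (batch, 1, channels, time)
        f = self.conv(x).squeeze(2)                    # (batch, d_model, T')
        f = self.encoder(f.transpose(1, 2))            # (batch, T', d_model)
        f = self.tcn(f.transpose(1, 2))                # (batch, d_model, T')
        return self.classifier(f[:, :, -1])            # last time step -> logits


logits = TCFormerSketch()(torch.randn(2, 1, 22, 1000))
print(logits.shape)                                    # torch.Size([2, 4])
```
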

Keywords: Brain signal decoding; Convolutional neural network; Electroencephalography (EEG); Grouped query attention; Motor imagery classification; Temporal convolutional network; Transformers.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
TCFormer architecture, comprising a convolutional module, a Transformer encoder, and a TCN, followed by a classifier. The convolutional module performs multi-kernel temporal filtering and spatial (depth-wise) filtering to extract multi-scale EEG features. The Transformer encoder, composed of stacked layers, applies grouped-query attention (GQA) and feed-forward (FF) sublayers to model global dependencies, with rotary positional embeddings (RoPE) providing temporal context. The TCN module uses causal dilated convolutions to capture local temporal patterns and sequential dependencies that the Transformer does not capture, and produces the final class logits. Conv convolution, BN batch norm, ELU exponential linear unit.
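
A hedged sketch of the multi-kernel convolutional module in the caption: parallel temporal convolutions with different kernel lengths capture EEG rhythms at several time scales, followed by a depth-wise spatial convolution across electrodes. Kernel sizes, filter counts, and the pooling factor are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative multi-kernel convolutional block: parallel temporal
# convolutions at several kernel lengths, then depth-wise spatial
# filtering over electrodes.  All hyperparameters are assumptions.
import torch
import torch.nn as nn


class MultiKernelConvSketch(nn.Module):
    def __init__(self, n_channels=22, filters_per_branch=8,
                 kernel_sizes=(16, 32, 64)):
        super().__init__()
        # One temporal-filtering branch per kernel size; "same" padding
        # keeps the time axis aligned so branches can be concatenated.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, filters_per_branch, (1, k), padding="same", bias=False),
                nn.BatchNorm2d(filters_per_branch),
            )
            for k in kernel_sizes
        )
        total = filters_per_branch * len(kernel_sizes)
        # Depth-wise spatial filtering: one spatial filter per temporal
        # feature map, collapsing the electrode dimension.
        self.spatial = nn.Sequential(
            nn.Conv2d(total, total, (n_channels, 1), groups=total, bias=False),
            nn.BatchNorm2d(total),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
        )

    def forward(self, x):                        # x: (batch, 1, channels, time)
        x = torch.cat([b(x) for b in self.branches], dim=1)
        return self.spatial(x).squeeze(2)        # (batch, total, time // 8)


feats = MultiKernelConvSketch()(torch.randn(2, 1, 22, 1000))
print(feats.shape)                               # torch.Size([2, 24, 125])
```
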
Fig. 2
Feature-extraction pipeline: raw EEG is first processed by a multi-kernel convolutional block to capture diverse temporal scales, then passed through a Transformer encoder with grouped-query attention to model global dependencies, yielding the encoded feature tensor that feeds into the TCN classification head.
Fig. 3
Grouped squeeze-and-excitation (SE) attention in the multi-kernel convolution block. The input feature map is globally average-pooled along the temporal dimension to produce a per-channel descriptor. This descriptor is passed through two 1 × 1 grouped convolutions with a channel-reduction ratio, followed by ReLU and sigmoid activations, yielding one attention weight per group. These weights, each in the range [0, 1], are broadcast across the channels of their corresponding group and used to rescale the original input. The reweighted features are then added to the residual path to produce the final output.
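
The caption's recipe (temporal average pooling, two grouped 1 × 1 convolutions with ReLU and sigmoid, one weight per group, residual addition) can be sketched as follows; the group count and reduction ratio are illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch of grouped squeeze-and-excitation (SE) attention.
# G (groups) and r (reduction) are illustrative assumptions.
import torch
import torch.nn as nn


class GroupedSESketch(nn.Module):
    def __init__(self, channels=24, groups=3, reduction=4):
        super().__init__()
        self.per_group = channels // groups
        self.excite = nn.Sequential(
            # Squeeze each group of channels by the reduction ratio.
            nn.Conv1d(channels, channels // reduction, 1, groups=groups, bias=False),
            nn.ReLU(inplace=True),
            # Excite: one scalar attention weight per group, in [0, 1].
            nn.Conv1d(channels // reduction, groups, 1, groups=groups, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):                               # x: (batch, channels, time)
        desc = x.mean(dim=-1, keepdim=True)             # global average pool over time
        w = self.excite(desc)                           # (batch, groups, 1)
        w = w.repeat_interleave(self.per_group, dim=1)  # broadcast within each group
        return x + x * w                                # rescale + residual path


out = GroupedSESketch()(torch.randn(2, 24, 125))
print(out.shape)                                        # torch.Size([2, 24, 125])
```
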
Fig. 4
Comparison of attention mechanisms. (Left) multi-head attention (MHA): each query head has its own key- and value-projection matrices. (Middle) multi-query attention (MQA): all heads share a single key-value pair, minimizing memory cost. (Right) grouped-query attention (GQA): query heads are divided into groups, with each group sharing a key-value pair, providing a trade-off between the expressiveness of MHA and the efficiency of MQA. GQA reduces to MQA when the number of key-value groups is one, and becomes equivalent to MHA when the number of groups equals the number of query heads.
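
A minimal grouped-query attention sketch showing how query heads share a smaller set of key/value heads; setting n_kv_heads to 1 recovers MQA, and setting it equal to n_heads recovers MHA. Dimensions are illustrative and RoPE is omitted for brevity.

```python
# Minimal grouped-query attention (GQA): n_heads query heads share
# n_kv_heads key/value heads (1 = MQA, n_heads = MHA).
import torch
import torch.nn as nn


class GQASketch(nn.Module):
    def __init__(self, d_model=32, n_heads=4, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.h, self.g = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.d_head, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)
        self.o_proj = nn.Linear(n_heads * self.d_head, d_model, bias=False)

    def forward(self, x):                                   # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)
        # Each key/value head serves h // g query heads.
        k = k.repeat_interleave(self.h // self.g, dim=1)
        v = v.repeat_interleave(self.h // self.g, dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)


y = GQASketch()(torch.randn(2, 125, 32))
print(y.shape)                                              # torch.Size([2, 125, 32])
```
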
Fig. 5
Grouped-query attention (GQA).
Fig. 6
The temporal convolutional network.
Fig. 7
Temporal convolutional network (TCN) head: the feature sequence generated by the MK-CNN and Transformer encoder (as shown in Fig. 2) is processed through two residual TCN blocks. Each block employs dilated causal convolutions with dilation rates of 1 and 2, respectively, resulting in a total receptive field of 19 time steps. The output at the final time step is passed to a convolutional classification layer to produce the motor imagery (MI) class logits.
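
A sketch of the TCN head under stated assumptions: two residual blocks, each with two dilated causal convolutions of kernel size 4 and dilations 1 and 2, which reproduces the stated receptive field of 19 time steps (RF = 1 + 2(k - 1)(d1 + d2) = 1 + 2*3*3 = 19). The kernel size and the two-convolution block layout are assumptions chosen to be consistent with that receptive field (the caption's kernel size did not survive extraction); a pointwise linear classifier stands in for the convolutional classification layer.

```python
# TCN classification head sketch: two residual blocks of dilated causal
# convolutions, then a classifier on the last time step.  Kernel size 4
# and two convolutions per block are assumptions matching RF = 19.
import torch
import torch.nn as nn


def causal_conv(channels, kernel_size, dilation):
    # Left-only padding keeps the convolution causal (no future leakage).
    return nn.Sequential(
        nn.ConstantPad1d(((kernel_size - 1) * dilation, 0), 0.0),
        nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
        nn.BatchNorm1d(channels),
        nn.ELU(),
    )


class TCNHeadSketch(nn.Module):
    def __init__(self, channels=32, n_classes=4, kernel_size=4):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(causal_conv(channels, kernel_size, d),
                          causal_conv(channels, kernel_size, d))
            for d in (1, 2)                      # dilation doubles per block
        ])
        self.classifier = nn.Linear(channels, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        for block in self.blocks:
            x = x + block(x)                     # residual connection per block
        return self.classifier(x[:, :, -1])      # last time step -> class logits


logits = TCNHeadSketch()(torch.randn(2, 32, 125))
print(logits.shape)                              # torch.Size([2, 4])
```
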
Fig. 8
Subject-wise test accuracy (%) for three datasets under both within-subject and cross-subject evaluation modes. Panels display results for: (a) BCIC IV-2a (within-subject), (b) BCIC IV-2a (cross-subject), (c) BCIC IV-2b (within-subject), (d) BCIC IV-2b (cross-subject), (e) HGD (within-subject), and (f) HGD (cross-subject). Five models are compared—EEGNet, EEGConformer, CTNet, ATCNet, and the proposed TCFormer. Each bar represents the mean accuracy across multiple runs: five runs for BCIC datasets and three runs for HGD. The black error bars represent the standard deviation across these runs for each subject, reflecting variability in model performance. The tables below each panel summarize the mean accuracy, standard deviation, and average rank across subjects. TCFormer consistently achieves the highest mean accuracy and the lowest average rank, demonstrating superior performance across all datasets.
Fig. 9
Average confusion matrices for five models—EEGNet, EEGConformer, CTNet, ATCNet, and the proposed TCFormer—on three EEG motor-imagery datasets: BCIC IV-2a (four‐class MI: feet, left hand, right hand, tongue), BCIC IV-2b (two‐class MI: left hand vs. right hand), and HGD (four‐class movement: feet, left hand, rest, right hand). The matrices correspond to the best-performing run for each model, averaged across all subjects. Darker diagonal entries indicate higher per‐class accuracy and overall model performance.
