Sensors (Basel). 2025 Aug 5;25(15):4819. doi: 10.3390/s25154819.

MB-MSTFNet: A Multi-Band Spatio-Temporal Attention Network for EEG Sensor-Based Emotion Recognition

Cheng Fang et al. Sensors (Basel). 2025.

Abstract

Emotion analysis based on electroencephalogram (EEG) sensors is pivotal for human-machine interaction, yet it faces key challenges in fusing spatio-temporal features and in integrating information across frequency bands and brain regions from multi-channel sensor signals. This paper proposes MB-MSTFNet, a novel framework for EEG emotion recognition. The model constructs a 3D tensor to encode band-space-time correlations of the sensor data, explicitly modeling frequency-domain dynamics and the spatial distribution of EEG sensors across brain regions. A multi-scale CNN-Inception module extracts hierarchical spatial features via diverse convolutional kernels and pooling operations, capturing localized sensor activations and global brain-network interactions. Bidirectional GRUs (BiGRUs) model temporal dependencies in the sensor time series and capture long-range dynamic patterns. Multi-head self-attention highlights critical time windows and brain regions by assigning adaptive weights to relevant sensor channels, suppressing noise from non-contributory electrodes. Experiments on the DEAP dataset, which contains multi-channel EEG sensor recordings, show that MB-MSTFNet achieves 96.80 ± 0.92% valence accuracy and 98.02 ± 0.76% arousal accuracy on binary classification tasks, and 92.85 ± 1.45% accuracy on four-class classification. Ablation studies confirm that feature fusion, bidirectional temporal modeling, and multi-scale mechanisms significantly enhance performance by improving feature complementarity. This sensor-driven framework advances affective computing by integrating the spatio-temporal dynamics and multi-band interactions of EEG sensor signals, enabling efficient real-time emotion recognition.

Keywords: Inception module; bidirectional gated recurrent unit (BiGRU); convolutional neural network (CNN); electroencephalograph (EEG); emotion signal recognition; multi-head attention (MHA).
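To make the pipeline described in the abstract concrete, the sketch below outlines one plausible PyTorch realization of the stages it names: a band-space-time tensor input, a multi-scale CNN-Inception block, a BiGRU, and multi-head self-attention feeding a classifier. The class names, layer widths, 9x9 electrode grid, four frequency bands, and 60 time windows are illustrative assumptions and do not reproduce the authors' published configuration.

# Hypothetical sketch of the MB-MSTFNet pipeline described in the abstract (PyTorch).
# Shapes, channel counts, and hyperparameters are assumptions, not the published setup.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Multi-scale spatial feature extraction with parallel kernel sizes and pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):  # x: (B, bands, H, W)
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

class MBMSTFNetSketch(nn.Module):
    def __init__(self, n_bands=4, grid_h=9, grid_w=9, hidden=128, n_heads=4, n_classes=2):
        super().__init__()
        self.inception = InceptionBlock(n_bands, 16)          # 4 branches * 16 = 64 maps
        self.proj = nn.Linear(64 * grid_h * grid_w, hidden)   # flatten spatial maps per step
        self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, n_heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):  # x: (B, T, bands, H, W), a band-space-time tensor
        B, T = x.shape[:2]
        s = self.inception(x.flatten(0, 1))       # multi-scale spatial features per window
        s = self.proj(s.flatten(1)).view(B, T, -1)
        h, _ = self.bigru(s)                       # bidirectional temporal modeling
        a, _ = self.attn(h, h, h)                  # re-weight informative time windows
        return self.head(a.mean(dim=1))            # pooled logits for valence/arousal classes

# Example: batch of 8 trials, 60 time windows, 4 bands on an assumed 9x9 electrode grid
logits = MBMSTFNetSketch()(torch.randn(8, 60, 4, 9, 9))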


Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. The framework of the proposed EEG-based emotion recognition model MB-MSTFNet.
Figure 2. DEAP dataset electrode 2D mapping.
Figure 3. Three-dimensional tensor for EEG spatial-frequency representation.
Figure 4. CNN-Inception for multi-scale spatial extraction.
Figure 5. BiGRU bidirectional temporal modeling.
Figure 6. Multi-head attention for key emotional feature capture.
Figure 7. Emotion recognition accuracy in 32 subjects.
Figure 8. Distribution of DE, PSD, and combined features in valence–arousal space.
Figure 9. Multi-band EEG feature topographic maps for group average.
Figure 10. Cross-band EEG feature maps of Inception multi-scale branches for group average.
Figure 11. Differential temporal correlation signatures of GRUs for low and high arousal recognition.


