Bioengineering (Basel). 2024 Oct 3;11(10):997. doi: 10.3390/bioengineering11100997.

Emotion Recognition Using EEG Signals and Audiovisual Features with Contrastive Learning

Ju-Hwan Lee et al. Bioengineering (Basel). 2024.

Abstract

Multimodal emotion recognition has emerged as a promising approach to capture the complex nature of human emotions by integrating information from various sources such as physiological signals, visual behavioral cues, and audio-visual content. However, current methods often struggle to process redundant or conflicting information across modalities effectively and may overlook implicit inter-modal correlations. To address these challenges, this paper presents a novel multimodal emotion recognition framework that integrates audio-visual features with viewers' EEG data to enhance emotion classification accuracy. The proposed approach employs modality-specific encoders to extract spatiotemporal features, which are then aligned through contrastive learning to capture inter-modal relationships. Additionally, cross-modal attention mechanisms are incorporated for effective feature fusion across modalities. The framework, comprising pre-training, fine-tuning, and testing phases, is evaluated on multiple datasets of emotional responses. The experimental results demonstrate that the proposed multimodal approach, which combines audio-visual features with EEG data, is highly effective in recognizing emotions, highlighting its potential for advancing emotion recognition systems.
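
As a rough illustration only, the following PyTorch-style sketch mirrors the three-phase workflow described above: modality embeddings are aligned with a contrastive objective during pre-training, fused for emotion classification during fine-tuning, and used to produce predictions at test time. All module names, feature dimensions, and the simple concatenation-based fusion are placeholder assumptions, not the authors' implementation (the paper fuses modalities with cross-modal attention, sketched separately under Figure 4).

```python
# Hypothetical sketch of the three-phase workflow described in the abstract.
# Encoder classes, dimensions, and the concatenation-based fusion are
# illustrative placeholders, not the authors' actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 256  # assumed shared embedding size


class ModalityEncoder(nn.Module):
    """Stand-in for a modality-specific encoder (ViT / VGGish / Conformer in the paper)."""
    def __init__(self, in_dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, EMB_DIM), nn.ReLU(),
                                  nn.Linear(EMB_DIM, EMB_DIM))

    def forward(self, x):                      # x: (batch, in_dim) pre-extracted features
        return F.normalize(self.proj(x), dim=-1)


def pairwise_contrastive(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE-style loss aligning two modalities of the same samples."""
    logits = z_a @ z_b.t() / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Phase 1 (pre-training): align video, audio, and EEG embeddings contrastively.
video_enc, audio_enc, eeg_enc = ModalityEncoder(768), ModalityEncoder(128), ModalityEncoder(310)
v, a, e = torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 310)   # toy feature batches
zv, za, ze = video_enc(v), audio_enc(a), eeg_enc(e)
pretrain_loss = (pairwise_contrastive(zv, za) + pairwise_contrastive(zv, ze)
                 + pairwise_contrastive(za, ze))

# Phase 2 (fine-tuning): a classifier over the fused embeddings would be trained here;
# plain concatenation stands in for the paper's cross-modal attention fusion.
classifier = nn.Linear(3 * EMB_DIM, 4)        # 4 emotion classes assumed for illustration
logits = classifier(torch.cat([zv, za, ze], dim=-1))

# Phase 3 (testing): predictions come from the same forward pass on held-out data.
pred = logits.argmax(dim=-1)
```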

Keywords: contrastive learning; cross-attention mechanism; emotion recognition; multimodal learning.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Diagram of the proposed multimodal framework, which integrates video, audio, and EEG data for emotion recognition. The framework consists of three main phases: pre-training, fine-tuning, and testing. In the pre-training phase, modality-specific encoders extract features, which are fused into a combined embedding using contrastive learning. In the fine-tuning phase, cross-modal attention captures interactions between modalities and the model is trained for emotion recognition. Finally, the testing phase is used to obtain the predictions.
Figure 2. Illustration of the modality-specific encoders for extracting spatiotemporal features from video, audio, and EEG data. Each modality is preprocessed before being fed to its corresponding encoder: ViT for video, VGGish for audio, and Conformer for EEG. The encoded features are then processed by a Residual-TCN.
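
A minimal sketch of the kind of residual temporal convolution block that could follow the modality-specific encoders; the tensor layout, channel size, and kernel width here are assumptions for illustration, not the paper's Residual-TCN configuration.

```python
# Hypothetical Residual-TCN block applied to encoded feature sequences.
# ViT / VGGish / Conformer outputs are assumed to be shaped (batch, time, dim).
import torch
import torch.nn as nn


class ResidualTCNBlock(nn.Module):
    def __init__(self, dim, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2           # keeps the sequence length unchanged
        self.conv1 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.act = nn.ReLU()
        self.norm = nn.BatchNorm1d(dim)

    def forward(self, x):                                  # x: (batch, time, dim)
        h = x.transpose(1, 2)                              # -> (batch, dim, time) for Conv1d
        h = self.act(self.norm(self.conv1(h)))
        h = self.conv2(h)
        return h.transpose(1, 2) + x                       # residual connection, back to (batch, time, dim)


# Example: a Conformer-style EEG feature sequence of 32 steps, 256 channels (assumed sizes).
eeg_feats = torch.randn(8, 32, 256)
out = ResidualTCNBlock(256)(eeg_feats)                     # same shape: (8, 32, 256)
```
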
Figure 3. MERCL consists of three components: (1) AMCL learns class-specific relationships within the same modality; (2) EMCL aligns representations across different modalities within the same sample; (3) finally, SMCL minimizes modality gaps by aligning representations of different modalities within the same sample.
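
For illustration, a hedged sketch of a label-aware (supervised) contrastive term of the kind the within-modality component (AMCL) describes, pulling together embeddings of the same emotion class; the function name, temperature, and masking details are assumptions rather than the paper's exact MERCL formulation. The cross-modal terms would instead contrast embeddings of different modalities from the same sample, following the pairwise loss sketched after the abstract.

```python
# Hypothetical supervised (label-aware) contrastive term within one modality.
# Names, temperature, and reduction are illustrative assumptions.
import torch
import torch.nn.functional as F


def within_modality_supcon(z, labels, tau=0.1):
    """Pull together embeddings that share an emotion label within a single modality."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                                   # (batch, batch) cosine similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))         # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]) & ~mask_self  # same-class pairs are positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos_counts
    return loss[pos.sum(1) > 0].mean()                      # average over anchors with positives


# Example: 8 embeddings from one modality with 3 emotion labels (toy values).
z = torch.randn(8, 256)
labels = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
print(within_modality_supcon(z, labels))
```
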
Figure 4. Illustration of the CMA module between modalities α and β.
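
A minimal sketch of a cross-modal attention block of this kind, where modality α provides the queries and modality β the keys and values; the embedding size, head count, and residual/normalization placement are illustrative assumptions, not the paper's CMA configuration.

```python
# Hypothetical cross-modal attention (CMA) block between two modalities:
# queries come from modality alpha, keys/values from modality beta.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, alpha, beta):
        # alpha, beta: (batch, time, dim) feature sequences from two different modalities
        attended, _ = self.attn(query=alpha, key=beta, value=beta)
        return self.norm(alpha + attended)                  # residual connection on the query stream


# Example: attend video features (alpha) over EEG features (beta), toy shapes.
video = torch.randn(8, 16, 256)
eeg = torch.randn(8, 32, 256)
fused = CrossModalAttention()(video, eeg)                   # output follows the query shape: (8, 16, 256)
```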
