Bioengineering (Basel). 2024 Oct 3;11(10):997. doi: 10.3390/bioengineering11100997.

Emotion Recognition Using EEG Signals and Audiovisual Features with Contrastive Learning

Ju-Hwan Lee et al. Bioengineering (Basel). 2024.

Abstract

Multimodal emotion recognition has emerged as a promising approach to capture the complex nature of human emotions by integrating information from various sources such as physiological signals, visual behavioral cues, and audio-visual content. However, current methods often struggle to process redundant or conflicting information across modalities effectively and may overlook implicit inter-modal correlations. To address these challenges, this paper presents a novel multimodal emotion recognition framework that integrates audio-visual features with viewers' EEG data to enhance emotion classification accuracy. The proposed approach employs modality-specific encoders to extract spatiotemporal features, which are then aligned through contrastive learning to capture inter-modal relationships. Additionally, cross-modal attention mechanisms are incorporated for effective feature fusion across modalities. The framework, comprising pre-training, fine-tuning, and testing phases, is evaluated on multiple datasets of emotional responses. The experimental results demonstrate that the proposed multimodal approach, which combines audio-visual features with EEG data, is highly effective in recognizing emotions, highlighting its potential for advancing emotion recognition systems.
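
As a rough illustration only, the following PyTorch-style sketch mirrors the three-phase workflow described above: modality embeddings are aligned with a contrastive objective during pre-training, fused for emotion classification during fine-tuning, and used to produce predictions at test time. All module names, feature dimensions, and the simple concatenation-based fusion are placeholder assumptions, not the authors' implementation (the paper fuses modalities with cross-modal attention, sketched separately under Figure 4).

```python
# Hypothetical sketch of the three-phase workflow described in the abstract.
# Encoder classes, dimensions, and the concatenation-based fusion are
# illustrative placeholders, not the authors' actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 256  # assumed shared embedding size


class ModalityEncoder(nn.Module):
    """Stand-in for a modality-specific encoder (ViT / VGGish / Conformer in the paper)."""
    def __init__(self, in_dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, EMB_DIM), nn.ReLU(),
                                  nn.Linear(EMB_DIM, EMB_DIM))

    def forward(self, x):                      # x: (batch, in_dim) pre-extracted features
        return F.normalize(self.proj(x), dim=-1)


def pairwise_contrastive(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE-style loss aligning two modalities of the same samples."""
    logits = z_a @ z_b.t() / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Phase 1 (pre-training): align video, audio, and EEG embeddings contrastively.
video_enc, audio_enc, eeg_enc = ModalityEncoder(768), ModalityEncoder(128), ModalityEncoder(310)
v, a, e = torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 310)   # toy feature batches
zv, za, ze = video_enc(v), audio_enc(a), eeg_enc(e)
pretrain_loss = (pairwise_contrastive(zv, za) + pairwise_contrastive(zv, ze)
                 + pairwise_contrastive(za, ze))

# Phase 2 (fine-tuning): a classifier over the fused embeddings would be trained here;
# plain concatenation stands in for the paper's cross-modal attention fusion.
classifier = nn.Linear(3 * EMB_DIM, 4)        # 4 emotion classes assumed for illustration
logits = classifier(torch.cat([zv, za, ze], dim=-1))

# Phase 3 (testing): predictions come from the same forward pass on held-out data.
pred = logits.argmax(dim=-1)
```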

Keywords: contrastive learning; cross-attention mechanism; emotion recognition; multimodal learning.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Diagram of the proposed multimodal framework, which integrates video, audio, and EEG data for emotion recognition. The framework consists of three main phases: pre-training, fine-tuning, and testing. In the pre-training phase, modality-specific encoders extract features, which are fused into a combined embedding using contrastive learning. In the fine-tuning phase, cross-modal attention captures interactions between modalities and the model is trained for emotion recognition. Finally, the testing phase is used to obtain the predictions.
Figure 2. Illustration of the modality-specific encoders for extracting spatiotemporal features from video, audio, and EEG data. Each modality is preprocessed before being fed to its corresponding encoder: ViT for video, VGGish for audio, and Conformer for EEG. The encoded features are then processed by a Residual-TCN.
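
A minimal sketch of the kind of residual temporal convolution block that could follow the modality-specific encoders; the tensor layout, channel size, and kernel width here are assumptions for illustration, not the paper's Residual-TCN configuration.

```python
# Hypothetical Residual-TCN block applied to encoded feature sequences.
# ViT / VGGish / Conformer outputs are assumed to be shaped (batch, time, dim).
import torch
import torch.nn as nn


class ResidualTCNBlock(nn.Module):
    def __init__(self, dim, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2           # keeps the sequence length unchanged
        self.conv1 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.act = nn.ReLU()
        self.norm = nn.BatchNorm1d(dim)

    def forward(self, x):                                  # x: (batch, time, dim)
        h = x.transpose(1, 2)                              # -> (batch, dim, time) for Conv1d
        h = self.act(self.norm(self.conv1(h)))
        h = self.conv2(h)
        return h.transpose(1, 2) + x                       # residual connection, back to (batch, time, dim)


# Example: a Conformer-style EEG feature sequence of 32 steps, 256 channels (assumed sizes).
eeg_feats = torch.randn(8, 32, 256)
out = ResidualTCNBlock(256)(eeg_feats)                     # same shape: (8, 32, 256)
```
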
Figure 3. MERCL consists of three components: (1) AMCL learns class-specific relationships within the same modality; (2) EMCL aligns representations across different modalities within the same sample; (3) finally, SMCL minimizes modality gaps by aligning representations of different modalities within the same sample.
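
For illustration, a hedged sketch of a label-aware (supervised) contrastive term of the kind the within-modality component (AMCL) describes, pulling together embeddings of the same emotion class; the function name, temperature, and masking details are assumptions rather than the paper's exact MERCL formulation. The cross-modal terms would instead contrast embeddings of different modalities from the same sample, following the pairwise loss sketched after the abstract.

```python
# Hypothetical supervised (label-aware) contrastive term within one modality.
# Names, temperature, and reduction are illustrative assumptions.
import torch
import torch.nn.functional as F


def within_modality_supcon(z, labels, tau=0.1):
    """Pull together embeddings that share an emotion label within a single modality."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                                   # (batch, batch) cosine similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))         # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]) & ~mask_self  # same-class pairs are positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos_counts
    return loss[pos.sum(1) > 0].mean()                      # average over anchors with positives


# Example: 8 embeddings from one modality with 3 emotion labels (toy values).
z = torch.randn(8, 256)
labels = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
print(within_modality_supcon(z, labels))
```
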
Figure 4. Illustration of the CMA module between modalities α and β.
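
A minimal sketch of a cross-modal attention block of this kind, where modality α provides the queries and modality β the keys and values; the embedding size, head count, and residual/normalization placement are illustrative assumptions, not the paper's CMA configuration.

```python
# Hypothetical cross-modal attention (CMA) block between two modalities:
# queries come from modality alpha, keys/values from modality beta.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, alpha, beta):
        # alpha, beta: (batch, time, dim) feature sequences from two different modalities
        attended, _ = self.attn(query=alpha, key=beta, value=beta)
        return self.norm(alpha + attended)                  # residual connection on the query stream


# Example: attend video features (alpha) over EEG features (beta), toy shapes.
video = torch.randn(8, 16, 256)
eeg = torch.randn(8, 32, 256)
fused = CrossModalAttention()(video, eeg)                   # output follows the query shape: (8, 16, 256)
```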
