A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions
- PMID: 40710231
- PMCID: PMC12292624
- DOI: 10.3390/biomimetics10070418
Abstract
This paper presents a comprehensive review of multimodal emotion recognition (MER), a process that integrates multiple data modalities, such as speech, visual, and textual signals, to identify human emotions. Grounded in biomimetics, the survey frames MER as a bio-inspired sensing paradigm that emulates the way humans seamlessly fuse multisensory cues to communicate affect, thereby transferring principles from living systems to engineered solutions. By leveraging several modalities, MER systems offer a richer and more robust analysis of emotional states than unimodal approaches. The review covers the general structure of MER systems, feature extraction techniques, and multimodal information fusion strategies, highlighting key advancements and milestones. It also addresses research challenges and open issues in MER, including lightweight models, cross-corpus generalizability, and the incorporation of additional modalities. The paper concludes by discussing future directions aimed at improving the accuracy, explainability, and practicality of MER systems for real-world applications.
Keywords: emotion analysis; feature extraction; information fusion; multimodal emotion recognition.
Conflict of interest statement
The authors declare no conflicts of interest.