Review

A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions

You Wu et al. Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418.

Abstract

This paper presents a comprehensive review of multimodal emotion recognition (MER), the process of integrating multiple data modalities, such as speech, visual, and textual signals, to identify human emotions. Grounded in biomimetics, the survey frames MER as a bio-inspired sensing paradigm that emulates the way humans seamlessly fuse multisensory cues to communicate affect, thereby transferring principles from living systems to engineered solutions. By leveraging multiple modalities, MER systems offer a richer and more robust analysis of emotional states than unimodal approaches. The review covers the general structure of MER systems, feature extraction techniques, and multimodal information fusion strategies, highlighting key advancements and milestones. It also addresses open research challenges in MER, including lightweight models, cross-corpus generalizability, and the incorporation of additional modalities. The paper concludes by discussing future directions aimed at improving the accuracy, explainability, and practicality of MER systems for real-world applications.
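To make the fusion idea concrete, below is a minimal sketch of feature-level (concatenation) fusion across speech, visual, and text embeddings, written in PyTorch. All module names, feature dimensions, and the seven-class output are illustrative assumptions, not the architecture of any system surveyed in the review.

import torch
import torch.nn as nn

class FeatureFusionMER(nn.Module):
    """Feature-level (concatenation) fusion over three modality embeddings."""
    def __init__(self, speech_dim=128, visual_dim=256, text_dim=768, n_emotions=7):
        super().__init__()
        # Per-modality projections; stand-ins for real feature extractors
        # (e.g., an acoustic encoder, a face encoder, a text encoder).
        self.speech_proj = nn.Linear(speech_dim, 64)
        self.visual_proj = nn.Linear(visual_dim, 64)
        self.text_proj = nn.Linear(text_dim, 64)
        # Shared classifier over the concatenated (fused) representation.
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(64 * 3, n_emotions))

    def forward(self, speech, visual, text):
        # Fuse by concatenating the projected modality features.
        fused = torch.cat(
            [self.speech_proj(speech), self.visual_proj(visual), self.text_proj(text)],
            dim=-1,
        )
        return self.classifier(fused)

# Usage with random stand-in features for a batch of 4 samples.
model = FeatureFusionMER()
logits = model(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 7])

Decision-level (late) fusion, by contrast, would classify each modality separately and combine the per-modality predictions, for example by averaging their logits.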

Keywords: emotion analysis; feature extraction; information fusion; multimodal emotion recognition.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. PRISMA flow diagram.
Figure 2. Record counts at each PRISMA stage.
Figure 3. Distribution of modality types in 103 MER studies.
Figure 4. Annual trend of modality adoption, 2011–2025.
Figure 5. Word cloud of modality keyword frequencies, 2011–2025.
Figure 6. Prevalence of modalities by lead author country.
Figure 7. The workflow of the MER system.


