Sensors (Basel). 2025 Aug 14;25(16):5066. doi: 10.3390/s25165066.

Multi-Scale Temporal Fusion Network for Real-Time Multimodal Emotion Recognition in IoT Environments


Sungwook Yoon et al. Sensors (Basel). 2025.

Abstract

This paper introduces EmotionTFN (Emotion-Multi-Scale Temporal Fusion Network), a novel hierarchical temporal fusion architecture that addresses key challenges in IoT emotion recognition by processing diverse sensor data while maintaining accuracy across multiple temporal scales. The architecture integrates physiological signals (EEG, PPG, and GSR), visual, and audio data using hierarchical temporal attention across short-term (0.5-2 s), medium-term (2-10 s), and long-term (10-60 s) windows. Edge computing optimizations, including model compression, quantization, and adaptive sampling, enable deployment on resource-constrained devices. Extensive experiments on MELD, DEAP, and G-REx datasets demonstrate 94.2% accuracy on discrete emotion classification and 0.087 mean absolute error on dimensional prediction, outperforming the best baseline (87.4%). The system maintains sub-200 ms latency on IoT hardware while achieving a 40% improvement in energy efficiency. Real-world deployment validation over four weeks achieved 97.2% uptime and user satisfaction scores of 4.1/5.0 while ensuring privacy through local processing.
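The abstract describes hierarchical temporal attention over short-, medium-, and long-term windows of fused sensor embeddings. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the class name, embedding size, and window lengths (chosen here to roughly correspond to the 0.5-2 s, 2-10 s, and 10-60 s scales at an assumed feature rate) are all hypothetical.

```python
# Minimal sketch of multi-scale temporal attention fusion.
# Assumptions (not from the paper): 128-dim per-step embeddings already fused
# across modalities, window sizes of 8/32/128 time steps, mean pooling per scale.
import torch
import torch.nn as nn


class MultiScaleTemporalFusion(nn.Module):
    """Self-attention within short/medium/long windows, then fuse the summaries."""

    def __init__(self, d_model=128, n_heads=4, window_sizes=(8, 32, 128)):
        super().__init__()
        self.window_sizes = window_sizes
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in window_sizes
        )
        self.fuse = nn.Linear(d_model * len(window_sizes), d_model)

    def forward(self, x):
        # x: (batch, time, d_model) sequence of per-step sensor embeddings
        scale_summaries = []
        for win, attn in zip(self.window_sizes, self.attn):
            seg = x[:, -min(win, x.size(1)):, :]     # most recent window at this scale
            out, _ = attn(seg, seg, seg)             # self-attention within the window
            scale_summaries.append(out.mean(dim=1))  # pool to one vector per scale
        return self.fuse(torch.cat(scale_summaries, dim=-1))


if __name__ == "__main__":
    # Example: 60 s of features at an assumed ~2 Hz -> 120 time steps
    model = MultiScaleTemporalFusion()
    feats = torch.randn(4, 120, 128)
    print(model(feats).shape)  # torch.Size([4, 128])
```

The fused vector would then feed a classification head (discrete emotions) or a regression head (valence/arousal); the paper's edge optimizations such as quantization and adaptive sampling are not reflected in this sketch.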

Keywords: Internet of Things; edge computing; emotion recognition; multimodal fusion; real-time processing; temporal attention.


Conflict of interest statement

The author declares no conflicts of interest.

Figures

Figure 1. EmotionTFN system architecture overview.


