Sensors (Basel). 2020 Sep 17;20(18):5328. doi: 10.3390/s20185328.

FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network

Clarence Tan et al.

Abstract

Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state-of-the-art deep learning methods and combined physiological signals, such as the electrocardiogram (ECG), electroencephalogram (EEG), and skin temperature, along with facial expressions, voice, and posture, to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data, which is essentially the nature of the data encountered in the emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs to solve the emotion recognition problem with a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture, to classify emotional valence, and we evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consist of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, on which other deep learning methods have relied to reach this level of accuracy. In conclusion, we have demonstrated that SNNs can be successfully used for solving the emotion recognition problem with multimodal data, and we also provide directions for future research utilizing SNNs for affective computing. In addition to the good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way and requires only one-pass training, which makes it suitable for practical and online applications. These features are not manifested in other methods for this problem.
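
To make the evaluation protocol above concrete, the following is a minimal sketch, not the authors' code (the paper trains a NeuCube SNN, not a scikit-learn model), of feature-level fusion combined with Leave-One-Subject-Out cross-validation. The synthetic feature arrays, subject IDs, and the stand-in logistic-regression classifier are illustrative assumptions.

    # Minimal sketch of feature-level fusion + LOSO cross-validation.
    # Synthetic arrays and the stand-in classifier are illustrative
    # assumptions; the paper itself trains a NeuCube SNN.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(0)
    n_trials = 120
    facial = rng.normal(size=(n_trials, 6))        # e.g., landmark-derived features
    peripheral = rng.normal(size=(n_trials, 5))    # e.g., ECG, skin conductance, ...
    subjects = rng.integers(0, 24, size=n_trials)  # subject ID for each trial
    y = rng.integers(0, 2, size=n_trials)          # binary valence labels

    # Feature-level fusion: concatenate the per-modality vectors per trial.
    X = np.hstack([facial, peripheral])

    # LOSO: each fold holds out all trials of exactly one subject.
    accs = []
    for train, test in LeaveOneGroupOut().split(X, y, groups=subjects):
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))
    print(f"LOSO mean accuracy: {np.mean(accs):.3f}")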

Keywords: Evolving Spiking Neural Networks (eSNNs); NeuCube; Spatio-temporal data; facial emotion recognition; multimodal data.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Example of face detection in MAHNOB-HCI, showing the feature points tracked along the video.
Figure 2. Facial landmark detection.
Figure 3. Facial features.
Figure 4. Elicited signal features in the last 30 seconds of video.
Figure 5. Boxplot of features in the MAHNOB-HCI dataset for the valence emotional dimension.
Figure 6. Proposed method for emotion valence classification using NeuCube.
Figure 7. Encoding continuous feature values into spikes across five neurons (see the population-encoding sketch after this list).
Figure 8. Input neuron locations for facial and peripheral feature classification. For each feature, n1 denotes the neuron coding the lowest values and n5 the highest. Note that there are three layers of input neurons in the cube, located at z = 30 (facial), z = 0 (peripheral), and z = −30 (facial).
Figure 9. Leaky integrate-and-fire model (LIFM) neuron. Small circles at the neuron inputs represent connection weights. Note that input 1 has a larger weight and therefore produces a larger effect on the postsynaptic potential (PSP). (A minimal LIF simulation is sketched after this list.)
Figure 10. Hebbian learning rule: connection change (synaptic modification) versus the difference between post- and pre-synaptic spike times. (A minimal sketch of such a rule follows after this list.)
Figure 11. Example of neuron activity patterns when NeuCube is trained separately on data of each class (low and high valence).
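
Figure 7 refers to encoding a continuous feature value into the activity of five input neurons. Below is a minimal sketch of one common scheme from the eSNN literature, population encoding with overlapping Gaussian receptive fields; the field centers, width parameter, and value range are illustrative assumptions, not parameters taken from the paper.

    # Sketch: encode a continuous feature value into five neurons using
    # overlapping Gaussian receptive fields (common in eSNN encoding).
    # Centers, width, and the value range are illustrative assumptions.
    import numpy as np

    def population_encode(x, v_min=0.0, v_max=1.0, n_neurons=5, beta=1.5):
        """Return each neuron's response in [0, 1] for feature value x."""
        centers = v_min + (np.arange(n_neurons) + 0.5) * (v_max - v_min) / n_neurons
        width = (v_max - v_min) / (beta * n_neurons)
        return np.exp(-0.5 * ((x - centers) / width) ** 2)

    resp = population_encode(0.63)
    spike_times = 1.0 - resp  # stronger response -> earlier spike
    print(np.round(resp, 3))
    print(np.round(spike_times, 3))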
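
Figure 9 illustrates the LIF neuron: weighted input spikes raise the postsynaptic potential, which leaks over time and triggers an output spike when it crosses a threshold. A minimal discrete-time simulation is sketched below; the time constant, threshold, weights, and input spike trains are illustrative assumptions.

    # Sketch: discrete-time leaky integrate-and-fire (LIF) neuron.
    # Time constant, threshold, weights, and inputs are illustrative assumptions.
    import numpy as np

    tau, threshold, dt = 10.0, 1.0, 1.0   # ms, arbitrary units, ms
    decay = np.exp(-dt / tau)             # per-step leak factor

    rng = np.random.default_rng(1)
    weights = np.array([0.6, 0.3])        # input 1 carries the larger weight
    inputs = rng.random((100, 2)) < 0.1   # two sparse binary spike trains

    v, out_spikes = 0.0, []
    for t, spikes in enumerate(inputs):
        v = v * decay + weights @ spikes  # leak, then integrate weighted input
        if v >= threshold:                # threshold crossing: fire and reset
            out_spikes.append(t)
            v = 0.0
    print("output spike times (ms):", out_spikes)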
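
Figure 10 plots a Hebbian weight change against the post- minus pre-synaptic spike-time difference, i.e., a spike-timing-dependent plasticity (STDP) curve. A minimal sketch of the standard exponential STDP window is given below; the amplitudes and time constants are illustrative assumptions, not parameters from the paper.

    # Sketch: exponential STDP window, giving the weight change dw as a
    # function of dt = t_post - t_pre. Amplitudes and time constants are
    # illustrative assumptions.
    import numpy as np

    def stdp(dt, a_plus=0.05, a_minus=0.055, tau_plus=20.0, tau_minus=20.0):
        """Potentiate when the pre-synaptic spike precedes the post one (dt > 0)."""
        if dt > 0:
            return a_plus * np.exp(-dt / tau_plus)  # LTP branch
        return -a_minus * np.exp(dt / tau_minus)    # LTD branch

    for d in (-20.0, -5.0, 5.0, 20.0):
        print(f"dt = {d:+6.1f} ms -> dw = {stdp(d):+.4f}")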
