Speech emotion classification using attention based network and regularized feature selection
- PMID: 37491423
- PMCID: PMC10368662
- DOI: 10.1038/s41598-023-38868-2
Abstract
Speech emotion classification (SEC) has attracted considerable attention within the research community in recent times. Its vital role in Human-Computer Interaction (HCI) and affective computing cannot be overemphasized. Many primitive algorithmic solutions and deep neural network (DNN) models have been proposed for efficient recognition of emotion from speech; however, the suitability of these methods for accurately classifying emotion from speech with a multi-lingual background, along with other factors that impede efficient classification, still demands critical consideration. This study proposed an attention-based network with a pre-trained convolutional neural network and a regularized neighbourhood component analysis (RNCA) feature selection technique for improved classification of speech emotion. The attention model has proven successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. Our proposed model (Attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement that outperforms state-of-the-art SEC approaches. Our model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
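The pipeline the abstract describes (deep features, then feature selection, then a classical classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for the attention-based DCNN features, scikit-learn's `NeighborhoodComponentsAnalysis` stands in for the paper's regularized NCA (sklearn's NCA has no explicit regularization term), and all dataset sizes here are arbitrary except the 7 emotion classes, which match TESS.

```python
# Hypothetical sketch of the abstract's pipeline:
#   (1) deep features  -> random data stands in for attention-based DCNN output
#   (2) RNCA selection -> sklearn's (unregularized) NCA used as a stand-in
#   (3) classification -> Random Forest, the best classifier in the study
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n_samples, n_features, n_emotions = 400, 64, 7   # TESS labels 7 emotions
X = rng.normal(size=(n_samples, n_features))     # stand-in for DCNN features
y = rng.integers(0, n_emotions, size=n_samples)  # stand-in emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    # project features into a lower-dimensional, class-discriminative space
    ("nca", NeighborhoodComponentsAnalysis(n_components=16, random_state=0)),
    # classify the reduced features
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipe.fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)
print(f"accuracy on random stand-in data: {acc:.2f}")
```

On real DCNN features the NCA step learns a projection that pulls same-emotion samples together, which is what makes the downstream Random Forest more accurate; on this random stand-in data the accuracy is of course near chance.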
© 2023. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network. Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008. PMID: 33113907.
- A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme. PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. PMID: 31415592.
- Effect on speech emotion classification of a feature selection approach using a convolutional neural network. PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. PMID: 34805511.
- Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest. PLoS One. 2023 Nov 21;18(11):e0291500. doi: 10.1371/journal.pone.0291500. PMID: 37988352.
- Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives. Front Neurorobot. 2021 Nov 29;15:784514. doi: 10.3389/fnbot.2021.784514. PMID: 34912204. Review.
Cited by
- Advanced differential evolution for gender-aware English speech emotion recognition. Sci Rep. 2024 Jul 31;14(1):17696. doi: 10.1038/s41598-024-68864-z. PMID: 39085418.
- A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions. Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418. PMID: 40710231. Review.
- IndoWaveSentiment: Indonesian audio dataset for emotion classification. Data Brief. 2024 Nov 16;57:111138. doi: 10.1016/j.dib.2024.111138. PMID: 39687377.
- Heterogeneous fusion of biometric and deep physiological features for accurate porcine cough recognition. PLoS One. 2024 Feb 1;19(2):e0297655. doi: 10.1371/journal.pone.0297655. PMID: 38300934.
- Integrated visual transformer and flash attention for lip-to-speech generation GAN. Sci Rep. 2024 Feb 24;14(1):4525. doi: 10.1038/s41598-024-55248-6. PMID: 38402265.