Speech emotion classification using attention based network and regularized feature selection
- PMID: 37491423
- PMCID: PMC10368662
- DOI: 10.1038/s41598-023-38868-2
Abstract
Speech emotion classification (SEC) has attracted considerable attention within the research community in recent times. Its vital role in Human-Computer Interaction (HCI) and affective computing cannot be overemphasized. Many primitive algorithmic solutions and deep neural network (DNN) models have been proposed for efficient recognition of emotion from speech; however, the suitability of these methods for accurately classifying emotion from speech with a multi-lingual background, along with other factors that impede efficient classification, still demands critical consideration. This study proposed an attention-based network with a pre-trained convolutional neural network and a regularized neighbourhood component analysis (RNCA) feature selection technique for improved classification of speech emotion. The attention model has proven successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. Our proposed model (Attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement that outperforms state-of-the-art SEC approaches. Our model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
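The pipeline the abstract describes (deep features, then feature selection, then a classical classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for the attention-based DCNN features, scikit-learn's `NeighborhoodComponentsAnalysis` stands in for the paper's regularized NCA (sklearn's NCA has no explicit regularization term), and all dataset sizes here are arbitrary except the 7 emotion classes, which match TESS.

```python
# Hypothetical sketch of the abstract's pipeline:
#   (1) deep features  -> random data stands in for attention-based DCNN output
#   (2) RNCA selection -> sklearn's (unregularized) NCA used as a stand-in
#   (3) classification -> Random Forest, the best classifier in the study
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n_samples, n_features, n_emotions = 400, 64, 7   # TESS labels 7 emotions
X = rng.normal(size=(n_samples, n_features))     # stand-in for DCNN features
y = rng.integers(0, n_emotions, size=n_samples)  # stand-in emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    # project features into a lower-dimensional, class-discriminative space
    ("nca", NeighborhoodComponentsAnalysis(n_components=16, random_state=0)),
    # classify the reduced features
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipe.fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)
print(f"accuracy on random stand-in data: {acc:.2f}")
```

On real DCNN features the NCA step learns a projection that pulls same-emotion samples together, which is what makes the downstream Random Forest more accurate; on this random stand-in data the accuracy is of course near chance.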
© 2023. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network. Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008. PMID: 33113907.
- A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme. PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. PMID: 31415592.
- Effect on speech emotion classification of a feature selection approach using a convolutional neural network. PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. PMID: 34805511.
- Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest. PLoS One. 2023 Nov 21;18(11):e0291500. doi: 10.1371/journal.pone.0291500. PMID: 37988352.
- Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives. Front Neurorobot. 2021 Nov 29;15:784514. doi: 10.3389/fnbot.2021.784514. PMID: 34912204. Review.
Cited by
- Advanced differential evolution for gender-aware English speech emotion recognition. Sci Rep. 2024 Jul 31;14(1):17696. doi: 10.1038/s41598-024-68864-z. PMID: 39085418.
- A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions. Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418. PMID: 40710231. Review.
- IndoWaveSentiment: Indonesian audio dataset for emotion classification. Data Brief. 2024 Nov 16;57:111138. doi: 10.1016/j.dib.2024.111138. PMID: 39687377.
- Heterogeneous fusion of biometric and deep physiological features for accurate porcine cough recognition. PLoS One. 2024 Feb 1;19(2):e0297655. doi: 10.1371/journal.pone.0297655. PMID: 38300934.
- Integrated visual transformer and flash attention for lip-to-speech generation GAN. Sci Rep. 2024 Feb 24;14(1):4525. doi: 10.1038/s41598-024-55248-6. PMID: 38402265.