Effect on speech emotion classification of a feature selection approach using a convolutional neural network
- PMID: 34805511
- PMCID: PMC8576551
- DOI: 10.7717/peerj-cs.766
Effect on speech emotion classification of a feature selection approach using a convolutional neural network
Abstract
Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For EMO-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
Keywords: Convolutional neural network; Data augmentation; Feature extraction; Feature selection; Mel-spectrogram; Speech emotion recognition.
© 2021 Amjad et al.
Conflict of interest statement
The authors declare that they have no competing interests.
Figures









Similar articles
-
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008. Sensors (Basel). 2020. PMID: 33113907 Free PMC article.
-
A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech.Sensors (Basel). 2022 Oct 6;22(19):7561. doi: 10.3390/s22197561. Sensors (Basel). 2022. PMID: 36236658 Free PMC article.
-
Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.Front Physiol. 2021 Mar 2;12:643202. doi: 10.3389/fphys.2021.643202. eCollection 2021. Front Physiol. 2021. PMID: 33737889 Free PMC article.
-
Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8. Multimed Tools Appl. 2022. PMID: 35431609 Free PMC article.
-
Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives.Front Neurorobot. 2021 Nov 29;15:784514. doi: 10.3389/fnbot.2021.784514. eCollection 2021. Front Neurorobot. 2021. PMID: 34912204 Free PMC article. Review.
Cited by
-
Research on the Filtering and Classification Method of Interactive Music Education Resources Based on Neural Network.Comput Intell Neurosci. 2022 Aug 17;2022:5764148. doi: 10.1155/2022/5764148. eCollection 2022. Comput Intell Neurosci. 2022. Retraction in: Comput Intell Neurosci. 2023 Oct 18;2023:9819152. doi: 10.1155/2023/9819152. PMID: 36035856 Free PMC article. Retracted.
-
Migraine headache (MH) classification using machine learning methods with data augmentation.Sci Rep. 2024 Mar 2;14(1):5180. doi: 10.1038/s41598-024-55874-0. Sci Rep. 2024. PMID: 38431729 Free PMC article.
-
Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition.PeerJ Comput Sci. 2022 Aug 3;8:e1053. doi: 10.7717/peerj-cs.1053. eCollection 2022. PeerJ Comput Sci. 2022. PMID: 36091976 Free PMC article.
-
Multi-class sentiment analysis of urdu text using multilingual BERT.Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9. Sci Rep. 2022. PMID: 35361890 Free PMC article.
-
Emotion recognition for human-computer interaction using high-level descriptors.Sci Rep. 2024 May 27;14(1):12122. doi: 10.1038/s41598-024-59294-y. Sci Rep. 2024. PMID: 38802373 Free PMC article.
References
-
- Abdel-Hamid O, Mohamed A-R, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2014;22(10):1533–1545. doi: 10.1109/TASLP.2014.2339736. - DOI
-
- Alonso JB, Cabrera J, Medina M, Travieso CM. New approach in quantification of emotional intensity from the speech signal: emotional temperature. Expert Systems with Applications. 2015;42(24):9554–9564. doi: 10.1016/j.eswa.2015.07.062. - DOI
-
- Alreshidi A, Ullah M. Facial emotion recognition using hybrid features. Informatics. 2020;7(1):6. doi: 10.3390/informatics7010006. - DOI
-
- Anagnostopoulos C-N, Iliou T, Giannoukos I. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review. 2015;43(2):155–177. doi: 10.1007/s10462-012-9368-5. - DOI
-
- Badshah AM, Ahmad J, Rahim N, Baik SW. Speech emotion recognition from spectrograms with deep convolutional neural network. 2017 International Conference on Platform Technology and Service (PlatCon); 2017. pp. 1–5.
LinkOut - more resources
Full Text Sources
Miscellaneous