Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Ammar Amjad et al. PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.

Abstract

Speech emotion recognition (SER) is a challenging task because it is not clear which features are effective for classification. Emotion-related features are typically extracted from speech signals, and handcrafted features are the most common choice for identifying emotion from audio. However, such features are not sufficient to correctly identify the speaker's emotional state. This work investigates the advantages of a deep convolutional neural network (DCNN): a pretrained network is used to extract features from speech emotion databases, and a feature selection (FS) approach is then applied to find the most discriminative features for SER. For classification, we compare random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN) classifiers on seven emotions. All experiments are performed on four publicly available databases. With feature selection, our method obtains speaker-dependent (SD) accuracies of 92.02%, 88.77%, 93.61%, and 77.23% on Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively. Furthermore, compared with current handcrafted-feature-based SER methods, the proposed method achieves the best results for speaker-independent (SI) SER. On Emo-DB, all classifiers attain an accuracy above 80% with or without feature selection.
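The abstract describes a three-stage pipeline: pretrained-DCNN feature extraction, feature selection, and classical classification. Below is a minimal sketch of that flow; the paper's actual FS algorithm and classifier hyperparameters are not given here, so SelectKBest with an ANOVA F-test and an RBF-kernel SVM are assumed stand-ins.

```python
# Minimal sketch of the described pipeline: DCNN features -> feature
# selection -> classical classifier. X stands in for features already
# extracted by a pretrained CNN (one row per utterance); y stands in for
# the seven emotion labels. FS method and SVM settings are assumptions,
# not the paper's reported configuration.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4096))   # placeholder for pooled DCNN features
y = rng.integers(0, 7, size=500)   # placeholder for 7 emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=512)),  # keep the k most discriminative features
    ("svm", SVC(kernel="rbf", C=10.0)),
])
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```

The same pipeline accepts any of the compared classifiers (RF, DT, MLP, KNN) by swapping the final step.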

Keywords: Convolutional neural network; Data augmentation; Feature extraction; Feature selection; Mel-spectrogram; Speech emotion recognition.
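The keywords point to mel-spectrogram inputs and data augmentation. A minimal sketch of producing such inputs follows, using librosa; the parameter values (n_mels, hop length, noise level) are illustrative assumptions, since the abstract does not state the paper's actual settings.

```python
# Illustrative log-mel spectrogram extraction plus a simple noise-injection
# augmentation. All parameter values are assumptions, not the paper's
# reported settings.
import librosa
import numpy as np

def mel_spectrogram(path, sr=16000, n_mels=128, hop_length=512):
    y, sr = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                       hop_length=hop_length)
    return librosa.power_to_db(S, ref=np.max)  # log-mel in dB

def add_noise(y, noise_level=0.005, seed=0):
    # One common augmentation: additive Gaussian noise on the waveform.
    rng = np.random.default_rng(seed)
    return y + noise_level * rng.normal(size=y.shape)
```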


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. The structure of our proposed model for audio emotion recognition.
Figure 2. The general architecture of AlexNet. Convolutional-layer parameters are denoted "Conv-[kernel size]-[stride size]-[number of channels]"; max-pooling-layer parameters are denoted "Maxpool-[kernel size]-[stride size]". (See the feature-extraction sketch after this list.)
Figure 3. Confusion matrix obtained by the SVM on the Emo-DB database for the SD experiment.
Figure 4. Confusion matrix obtained by the SVM on the SAVEE database for the SD experiment.
Figure 5. Confusion matrix obtained by the SVM on the RAVDESS database for the SD experiment.
Figure 6. Confusion matrix obtained by the MLP on the IEMOCAP database for the SD experiment.
Figure 7. Confusion matrix obtained by the SVM on the RAVDESS database for the SI experiment.
Figure 8. Confusion matrix obtained by the MLP on the RAVDESS database for the SI experiment.
Figure 9. Confusion matrix obtained by the SVM on the IEMOCAP database for the SI experiment.
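Figure 2 identifies AlexNet as the pretrained backbone. Below is a minimal sketch of using torchvision's ImageNet-pretrained AlexNet as a fixed feature extractor; feeding spectrograms as 3x224x224 images and tapping the penultimate fully connected layer (4096-d) are assumptions, since the abstract does not state these details.

```python
# Sketch: pretrained AlexNet as a fixed feature extractor for spectrogram
# "images". The input size and the tapped layer are assumptions, not the
# paper's stated setup.
import torch
import torchvision.models as models
from torchvision.models import AlexNet_Weights

model = models.alexnet(weights=AlexNet_Weights.IMAGENET1K_V1)
model.eval()
# Drop the final classification layer so the head outputs 4096-d features.
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])

with torch.no_grad():
    batch = torch.randn(8, 3, 224, 224)  # stand-in spectrogram batch
    feats = model(batch)                 # shape: (8, 4096)
print(feats.shape)
```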
