Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition

Ammar Amjad¹, Lal Khan¹, Hsien-Tsung Chang¹ ² ³ ⁴

Affiliations

¹ Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan.
² Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan, Taiwan.
³ Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
⁴ Artificial Intelligence Research Center, Chang Gung University, Taoyuan, Taiwan.

PMID: 36091976
PMCID: PMC9454772
DOI: 10.7717/peerj-cs.1053

Ammar Amjad et al. PeerJ Comput Sci. 2022 Aug 3;8:e1053. doi: 10.7717/peerj-cs.1053. eCollection 2022.

Abstract

Speech emotion recognition (SER) systems have evolved into an important method for recognizing a person in several applications, including e-commerce, everyday interactions, law enforcement, and forensics. The efficiency of an SER system depends on the length of the audio samples used for testing and training. Although the various models suggested in this study obtained relatively high accuracy, SER efficiency is not yet optimal because the limited database leads to overfitting and skewed samples. The proposed approach therefore presents a data augmentation method that shifts the pitch, uses multiple window sizes, stretches the time, and adds white noise to the original audio. In addition, a deep model is evaluated to generate a new paradigm for SER. In the proposed system, data augmentation increased the limited amount of data available from the Pakistani racial speaker speech dataset. A seven-layer framework was employed because it provided the best accuracy compared with other multilayer approaches; seven-layer methods are also used in existing works to achieve very high accuracy. The suggested system achieved 97.32% accuracy with a 0.032% loss at a 75%:25% splitting ratio. In addition, more than 500 augmented data samples were added. The results show that deep neural networks with data augmentation can enhance SER performance on the Pakistani racial speech dataset.

Keywords: Data augmentation; Deep neural network; Multiple window size; Speaker recognition.
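
The augmentation steps named in the abstract (added white noise, a change in playback speed, and framing with multiple window sizes) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function names, the 20 dB signal-to-noise ratio, the 0.9 and 1.1 rates, and the window lengths are all assumptions chosen for illustration. The `resample` helper is a naive linear-interpolation speed change, so it alters pitch and duration together; a true time stretch or pitch shift that changes only one of them would need a phase vocoder or a similar method.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return the signal plus Gaussian white noise at roughly snr_db (assumed value)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def resample(signal, rate):
    """Naive speed change by linear interpolation (hypothetical helper).

    rate > 1 shortens the clip and raises the pitch; rate < 1 lengthens it
    and lowers the pitch. Duration and pitch change together here.
    """
    last = len(signal) - 1
    n_out = max(1, int(len(signal) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


def frame(signal, window, hop):
    """Split a signal into overlapping frames of `window` samples."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, hop)]


# Synthetic one-second 220 Hz tone standing in for a speech clip (assumption).
rng = random.Random(0)
sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]

# Each original clip yields several augmented copies.
augmented = {
    "noisy": add_white_noise(tone, snr_db=20, rng=rng),
    "slow": resample(tone, 0.9),
    "fast": resample(tone, 1.1),
}

# Multiple window sizes: the same clip framed at two window lengths, 50% overlap.
frames_by_window = {w: frame(tone, w, w // 2) for w in (200, 400)}
```

Each augmented copy would then pass through the same feature extraction as the original clip, which is how a small dataset can be enlarged before training.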

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Structure of a deep neural network.**

**Figure 2. Structure of proposed approach.**

**Figure 3. Block diagram of the computation steps of MFCC.**

**Figure 4. Structure of proposed approach.**

**Figure 5. Proposed model performance on training dataset.**

**Figure 6. Proposed model performance on testing dataset.**
