Sensors (Basel). 2015 Jan 14;15(1):1458-78. doi: 10.3390/s150101458.

Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition

Kun-Ching Wang
Abstract

The classification of emotional speech is widely studied in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions carry different intensity values in different frequency bands. From the perspective of human visual perception, the texture properties of a multi-resolution emotional speech spectrogram should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis gives a clearer discrimination between emotions than uniform-resolution texture analysis. To provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based features, inspired by human visual perception of the spectrogram image, provide significant classification performance for real-life emotion recognition in speech.
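The paper's full MRTII pipeline is not reproduced here, but the core idea the abstract describes — treating the spectrogram as an image and extracting statistics from wavelet sub-bands at several resolutions as texture features — can be sketched as follows. This is a minimal illustrative sketch using NumPy only; the frame sizes, the Haar basis, and the mean/std sub-band statistics are assumptions for illustration, not the paper's exact MRTII formulation or its AAD step.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (freq x time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def haar_decompose(image):
    """One-level 2-D Haar decomposition -> (LL, LH, HL, HH) sub-bands."""
    # Crop to even dimensions so 2x2 blocks tile the image exactly.
    img = image[:image.shape[0] // 2 * 2, :image.shape[1] // 2 * 2]
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4   # approximation
    lh = (a + b - c - d) / 4   # horizontal detail
    hl = (a - b + c - d) / 4   # vertical detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

def texture_features(spec, levels=2):
    """Mean/std of each detail sub-band at each level as texture descriptors."""
    feats = []
    ll = spec
    for _ in range(levels):
        ll, lh, hl, hh = haar_decompose(ll)
        for band in (lh, hl, hh):
            feats += [band.mean(), band.std()]
    feats += [ll.mean(), ll.std()]  # coarsest approximation band
    return np.array(feats)

# Toy "speech" signal: a 1-second chirp at an 8 kHz sample rate.
t = np.linspace(0, 1, 8000)
sig = np.sin(2 * np.pi * (200 + 300 * t) * t)
spec = spectrogram(sig)
f = texture_features(spec, levels=2)
# 2 levels * 3 detail sub-bands * 2 stats + 2 final-LL stats = 14 features
print(f.shape)
```

In the paper's multi-resolution setting, deeper decomposition levels expose texture at coarser time-frequency scales, which is what lets different emotions (whose energy concentrates in different frequency bands) be discriminated better than with a single uniform resolution.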


Figures

Figure 1. The flowchart for deriving the proposed MRTII-based feature extraction approach.
Figure 2. Spectrogram image decomposition: (a) one-level; (b) two-level.
Figure 3. The first preferred channel in the 4-level tree-structured wavelet transform domain for three types of emotion: Anger, Fear and Neutral.
Figure 4. Comparison of recognition rates using different feature extraction methods.

