Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition
- PMID: 25594590
- PMCID: PMC4327087
- DOI: 10.3390/s150101458
Abstract
The classification of emotional speech is widely studied in research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis of the speech spectrogram, and is used to characterize and classify different emotions in a speech signal. The motivation is that emotions carry different intensities in different frequency bands. From the standpoint of human visual perception, the texture properties of the emotional speech spectrogram at multiple resolutions should form an effective feature set for emotion classification, and multi-resolution texture analysis discriminates between emotions more clearly than uniform-resolution analysis. To achieve high discrimination accuracy, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied before MRTII-based feature extraction. Because real-life speech often contains blended emotions, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across databases in different languages. Experimental results show that the proposed MRTII features, inspired by human visual perception of the spectrogram image, provide significant gains for real-life emotion recognition in speech.
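The abstract describes two components: an acoustic activity detector that selects which frames enter feature extraction, and texture statistics computed from the spectrogram at several resolutions. The following is a minimal Python sketch of those two ideas only. The energy threshold, the 2x2 averaging used as the multi-resolution decomposition, and the gradient-based texture statistics are illustrative assumptions, not the paper's actual AAD or MRTII algorithms.

```python
# Minimal sketch: energy-based activity detection plus multi-resolution
# spectrogram texture statistics. All thresholds and statistics here are
# hypothetical stand-ins for the paper's AAD and MRTII methods.
import numpy as np
from scipy.signal import spectrogram

def detect_active_frames(x, fs, frame_len=0.025, hop=0.010, threshold_db=-40.0):
    """Flag frames whose short-time energy lies within threshold_db of the
    loudest frame (a crude stand-in for the paper's AAD)."""
    n = int(frame_len * fs)
    h = int(hop * fs)
    frames = np.lib.stride_tricks.sliding_window_view(x, n)[::h]
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > (energy_db.max() + threshold_db)

def texture_stats(img):
    """Simple texture descriptors of a 2-D log-spectrogram: mean, variance,
    and mean absolute horizontal/vertical gradients (coarse proxies for the
    smoothness and contrast of the texture)."""
    gx = np.abs(np.diff(img, axis=1)).mean()
    gy = np.abs(np.diff(img, axis=0)).mean()
    return np.array([img.mean(), img.var(), gx, gy])

def mrtii_like_features(x, fs, levels=3):
    """Compute texture statistics of the log-spectrogram at several
    resolutions via repeated 2x2 averaging, approximating a
    multi-resolution decomposition."""
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    img = np.log(S + 1e-12)
    feats = []
    for _ in range(levels):
        feats.append(texture_stats(img))
        # Crop to even dimensions, then downsample by 2 in both axes.
        img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
        img = 0.25 * (img[::2, ::2] + img[1::2, ::2]
                      + img[::2, 1::2] + img[1::2, 1::2])
    return np.concatenate(feats)

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)  # one second of noise as a placeholder signal
    active = detect_active_frames(x, fs)
    print("active frame ratio:", active.mean())
    print("feature vector shape:", mrtii_like_features(x, fs).shape)
```

In a full pipeline the AAD mask would be used to discard silent or non-speech regions before the spectrogram texture features are computed and passed to a classifier.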