Sensors (Basel). 2015 Jan 14;15(1):1458-78. doi: 10.3390/s150101458.

Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition

Kun-Ching Wang
Abstract

The classification of emotional speech is widely studied in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions carry different intensity values in different frequency bands. From the perspective of human visual perception, the texture properties of a multi-resolution emotional speech spectrogram should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis gives a clearer discrimination between emotions than uniform-resolution texture analysis. To provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based features, inspired by human visual perception of the spectrogram image, provide significant classification performance for real-life emotion recognition in speech.
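The paper's full MRTII pipeline is not reproduced here, but the core idea the abstract describes — treating the spectrogram as an image and extracting statistics from wavelet sub-bands at several resolutions as texture features — can be sketched as follows. This is a minimal illustrative sketch using NumPy only; the frame sizes, the Haar basis, and the mean/std sub-band statistics are assumptions for illustration, not the paper's exact MRTII formulation or its AAD step.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (freq x time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def haar_decompose(image):
    """One-level 2-D Haar decomposition -> (LL, LH, HL, HH) sub-bands."""
    # Crop to even dimensions so 2x2 blocks tile the image exactly.
    img = image[:image.shape[0] // 2 * 2, :image.shape[1] // 2 * 2]
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4   # approximation
    lh = (a + b - c - d) / 4   # horizontal detail
    hl = (a - b + c - d) / 4   # vertical detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

def texture_features(spec, levels=2):
    """Mean/std of each detail sub-band at each level as texture descriptors."""
    feats = []
    ll = spec
    for _ in range(levels):
        ll, lh, hl, hh = haar_decompose(ll)
        for band in (lh, hl, hh):
            feats += [band.mean(), band.std()]
    feats += [ll.mean(), ll.std()]  # coarsest approximation band
    return np.array(feats)

# Toy "speech" signal: a 1-second chirp at an 8 kHz sample rate.
t = np.linspace(0, 1, 8000)
sig = np.sin(2 * np.pi * (200 + 300 * t) * t)
spec = spectrogram(sig)
f = texture_features(spec, levels=2)
# 2 levels * 3 detail sub-bands * 2 stats + 2 final-LL stats = 14 features
print(f.shape)
```

In the paper's multi-resolution setting, deeper decomposition levels expose texture at coarser time-frequency scales, which is what lets different emotions (whose energy concentrates in different frequency bands) be discriminated better than with a single uniform resolution.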


Figures

Figure 1. The flowchart for deriving the proposed MRTII-based feature extraction approach.
Figure 2. Spectrogram image decomposition: (a) one-level; (b) two-level.
Figure 3. The first preferred channel in the 4-level tree-structured wavelet transform domain for three types of emotion: Anger, Fear and Neutral.
Figure 4. Comparison of recognition rates using different feature extraction methods.

