Speech Commun. 2010 Jul;52(7-8):613-625. doi: 10.1016/j.specom.2010.02.010

Class-Level Spectral Features for Emotion Recognition



Dmitri Bitouk et al.

Abstract

The most common approaches to automatic emotion recognition rely on utterance-level prosodic features. Recent studies have shown that utterance-level statistics of segmental spectral features also contain rich information about expressivity and emotion. In our work we introduce a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed over three phoneme type classes of interest in the utterance: stressed vowels, unstressed vowels, and consonants. We investigate the performance of our features in the task of speaker-independent emotion recognition using two publicly available datasets. Our experimental results clearly indicate that both the richer set of spectral features and the differentiation between phoneme type classes are beneficial for the task. Classification accuracies are consistently higher for our features than for prosodic or utterance-level spectral features. Combining our phoneme class features with prosodic features leads to further improvement. Given the large number of class-level spectral features, we expected feature selection to improve results even further, but none of several selection methods led to clear gains. Further analyses reveal that spectral features computed from consonant regions of the utterance contain more information about emotion than either stressed or unstressed vowel features. We also explore how emotion recognition accuracy depends on utterance length. We show that, while there is no significant dependence for utterance-level prosodic features, the accuracy of emotion recognition using class-level spectral features increases with utterance length.
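To make the feature-extraction idea concrete, the sketch below pools frame-level MFCC statistics over the three phoneme classes. It is a minimal illustration under stated assumptions, not the authors' implementation: phone segments with stress marks are assumed to come from a forced aligner, librosa is used for MFCC extraction, and only per-class means and standard deviations are computed, since the abstract does not enumerate the full statistic set.

import numpy as np
import librosa

# ARPAbet vowel symbols; anything else is treated as a consonant.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def phoneme_class(phone, stressed):
    """Map a phone label plus a stress flag to one of the three classes."""
    if phone in VOWELS:
        return "stressed_vowel" if stressed else "unstressed_vowel"
    return "consonant"

def class_level_mfcc_features(wav_path, segments, n_mfcc=13, hop_sec=0.010):
    """Compute class-level spectral features for one utterance.

    segments: list of (phone, stressed, start_sec, end_sec) tuples from a
    forced aligner (hypothetical input format).
    Returns a fixed-length vector of per-class MFCC means and std devs.
    """
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                hop_length=int(hop_sec * sr))  # shape: (n_mfcc, n_frames)
    pooled = {"stressed_vowel": [], "unstressed_vowel": [], "consonant": []}
    for phone, stressed, start, end in segments:
        lo, hi = int(start / hop_sec), int(end / hop_sec)
        pooled[phoneme_class(phone, stressed)].append(mfcc[:, lo:hi])
    feats = []
    for cls in ("stressed_vowel", "unstressed_vowel", "consonant"):
        if pooled[cls]:
            frames = np.concatenate(pooled[cls], axis=1)
            feats.extend(frames.mean(axis=1))
            feats.extend(frames.std(axis=1))
        else:
            feats.extend(np.zeros(2 * n_mfcc))  # class absent in this utterance
    return np.asarray(feats)

A standard classifier (for example, an SVM) would then be trained on these fixed-length vectors, optionally concatenated with utterance-level prosodic features; the classifier choice and the mean/std pooling are assumptions made for this sketch.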


Figures

Figure 1. We computed four types of features by varying the feature type (prosodic or spectral) and the region of the utterance over which they are computed (utterance-level or class-level).
Figure 2. Dependence of emotion recognition accuracy on utterance length.
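The length analysis summarized in Figure 2 can be approximated by binning held-out test utterances by duration and computing accuracy within each bin. The sketch below is illustrative, not the paper's exact protocol; the quantile-based bin edges and the input arrays are assumptions.

import numpy as np

def accuracy_by_length(durations, y_true, y_pred, n_bins=5):
    """Bin test utterances by duration and report accuracy within each bin.

    durations, y_true, y_pred: per-utterance arrays of equal length
    (hypothetical outputs of a trained emotion classifier).
    Returns a list of (bin_start, bin_end, accuracy) tuples.
    """
    durations = np.asarray(durations)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    edges = np.quantile(durations, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.digitize(durations, edges[1:-1]), 0, n_bins - 1)
    results = []
    for b in range(n_bins):
        mask = bin_idx == b
        acc = float((y_true[mask] == y_pred[mask]).mean()) if mask.any() else float("nan")
        results.append((edges[b], edges[b + 1], acc))
    return results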
