Vocal tract shaping of emotional speech

Jangwon Kim et al. Comput Speech Lang. 2020 Nov;64:101100. doi: 10.1016/j.csl.2020.101100. Epub 2020 Apr 16.

Abstract

Emotional speech production has previously been studied using fleshpoint tracking data in speaker-specific experimental setups. The present study introduces a real-time magnetic resonance imaging database of emotional speech production from 10 speakers and presents articulatory analyses of emotional expression in speech based on that database. Midsagittal vocal tract parameters (midsagittal distances and vocal tract length) were parameterized on a two-dimensional grid-line system using image segmentation software. The principal feature analysis technique was applied to the grid-line system to locate the major movement regions. Results reveal both speaker-dependent and speaker-independent variation patterns. For example, sad speech, a low-arousal emotion, tends to show a smaller opening for low vowels than the high-arousal emotions, and this pattern holds more consistently in the front cavity than in other regions of the vocal tract. Happiness shows a significantly shorter vocal tract length than anger and sadness in most speakers. Further details of speaker-dependent and speaker-independent articulatory variation in emotional expression, and their implications, are described.

Keywords: MR image segmentation; USC-EMO-MRI corpus; emotional speech production; vocal tract shaping.
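The principal feature analysis (PFA) step lends itself to a short illustration. The sketch below follows the standard PFA recipe (cluster the PCA loading vectors of the original features, then keep one representative feature per cluster), applied to a hypothetical frame-by-grid-line distance matrix X; the function name, component count, and feature count are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of principal feature analysis (PFA), assuming X is an
# (n_frames, n_gridlines) matrix of midsagittal distances, one column per
# grid line. Names and parameter values are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def principal_feature_analysis(X, n_components=5, n_features=7, seed=0):
    """Select the grid lines whose movements dominate the data."""
    pca = PCA(n_components=n_components)
    pca.fit(X)  # PCA centers the data internally
    # Each row of `loadings` describes how one grid line participates
    # in the leading principal components.
    loadings = pca.components_.T            # (n_gridlines, n_components)
    km = KMeans(n_clusters=n_features, n_init=10, random_state=seed)
    labels = km.fit_predict(loadings)
    selected = []
    for k in range(n_features):
        members = np.flatnonzero(labels == k)
        # Keep the grid line closest to its cluster centroid.
        d = np.linalg.norm(loadings[members] - km.cluster_centers_[k], axis=1)
        selected.append(int(members[np.argmin(d)]))
    return sorted(selected)
```

Under these assumptions, the returned indices play the role of the principal feature locations of the kind overlaid on the MR images in Figure 2.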


Conflict of interest statement

Declaration of interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1: Example results of the parameterization process for a magnetic resonance image (speaker M1). The x and y axes in (a)-(g) indicate the pixel index.
Figure 2: Principal features overlaid on an MR image for each speaker. Cyan lines are the principal features; green numbers are the principal feature indices.
Figure 3: Time series of the third principal feature before (a) and after (b) temporal alignment of an utterance to a reference (neutral) utterance of speaker M1. (A minimal alignment sketch follows the figure list.)
Figure 4: Averaged time series of the first and seventh principal features for each emotion of speaker M1. The averaged time series were temporally aligned. Utterances of Sentence 6, “nine one five two six nine five one six two”, are used.
Figure 5: 0.1 quantiles of the principal features for anger, happiness, and sadness relative to neutrality, for each speaker.
Figure 6: 0.9 quantiles of the principal features for anger, happiness, and sadness relative to neutrality, for each speaker. (The relative-quantile computation for Figures 5 and 6 is sketched after the figure list.)
Figure 7: Boxplots of vocal tract length for each emotion.

