2014 Jan 22:2014:627189.
doi: 10.1155/2014/627189. eCollection 2014.

Voice quality modelling for expressive speech synthesis

Carlos Monzo et al. ScientificWorldJournal.

Abstract

This paper presents perceptual experiments carried out to validate a methodology for transforming speech from a neutral style into several expressive styles by modelling voice quality (VoQ) parameters alongside the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ for enhancing expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters, which were extracted from an expressive speech corpus. Perception test results indicated that the resulting expressive speech styles improved when VoQ modelling was used along with prosodic characteristics.
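To make the idea of modifying parameters in a harmonic plus noise representation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a frame can be approximated as a sum of cosines at multiples of F0 plus white Gaussian noise, which is a strong simplification of a full HNM. The function names, the scaling factors, and the default sample rate and frame length are all hypothetical choices made for illustration.

```python
import math
import random


def transform_parameters(f0_hz, harmonic_amps, noise_gain,
                         f0_scale, energy_scale, noise_scale):
    """Scale frame parameters toward a target style (hypothetical factors).

    f0_scale changes pitch, energy_scale changes harmonic amplitudes,
    and noise_scale changes the level of the noise component.
    """
    new_f0 = f0_hz * f0_scale
    new_amps = [a * energy_scale for a in harmonic_amps]
    new_noise = noise_gain * noise_scale
    return new_f0, new_amps, new_noise


def synthesize_hnm_frame(f0_hz, harmonic_amps, noise_gain,
                         sample_rate=16000, n_samples=320, seed=0):
    """Synthesize one frame as harmonics of F0 plus white noise.

    Simplification: the noise is full-band and stationary, and all
    harmonics start at zero phase. Harmonics at or above the Nyquist
    frequency are skipped to avoid aliasing.
    """
    rng = random.Random(seed)
    nyquist = sample_rate / 2.0
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        harmonic_part = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            freq = k * f0_hz
            if freq < nyquist:
                harmonic_part += amp * math.cos(2.0 * math.pi * freq * t)
        noise_part = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic_part + noise_part)
    return frame


if __name__ == "__main__":
    # Placeholder "neutral" frame parameters, not values from the paper.
    f0, amps, noise = transform_parameters(
        120.0, [1.0, 0.5, 0.25], 0.01,
        f0_scale=1.5, energy_scale=2.0, noise_scale=0.5,
    )
    frame = synthesize_hnm_frame(f0, amps, noise)
    print(len(frame), f0)
```

In this sketch the transformation acts only on the parameters, and the waveform is regenerated afterwards, which mirrors the general analysis, modification, and resynthesis pattern the abstract describes.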

Figures

Figure 1. A block diagram for the proposed expressive speech styles transformation methodology.
Figure 2. Quality MOS test results for the configurations of “Natural,” “ResHNM,” “HNMPro,” “HNMProJiSh,” and “HNMProVoQ.”
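
A mean opinion score (MOS) is the arithmetic mean of listener ratings for a given configuration. The sketch below shows one common way to summarise such ratings with a normal-approximation 95% confidence interval. The function name, the configuration labels, and the ratings are placeholders invented for illustration; they are not data or procedures from the paper.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses the sample standard deviation and a normal approximation,
    which is an assumption of this sketch. With fewer than two
    ratings the half-width is reported as 0.0.
    """
    score = mean(ratings)
    if len(ratings) < 2:
        return score, 0.0
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return score, half_width


if __name__ == "__main__":
    # Placeholder ratings on a 1-5 scale for two hypothetical configurations.
    ratings_by_config = {
        "config_a": [5, 4, 4, 5],
        "config_b": [3, 4, 3, 3],
    }
    for name, ratings in ratings_by_config.items():
        score, half_width = mos_with_ci(ratings)
        print(f"{name}: MOS = {score:.2f} +/- {half_width:.2f}")
```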

