Voice quality modelling for expressive speech synthesis

Carlos Monzo¹, Ignasi Iriondo², Joan Claudi Socoró²

Affiliations

¹ Computer Science, Multimedia and Telecommunication Studies, Universitat Oberta de Catalunya (UOC), Rambla del Poblenou 156, 08018 Barcelona, Spain.
² Grup de Recerca en Tecnologies Mèdia (GTM), Universitat Ramon Llull, La Salle, Quatre Camins 2, 08022 Barcelona, Spain.

PMID: 24587738
PMCID: PMC3920859
DOI: 10.1155/2014/627189


Carlos Monzo et al. ScientificWorldJournal. 2014 Jan 22;2014:627189. doi: 10.1155/2014/627189. eCollection 2014.


Abstract

This paper presents the perceptual experiments carried out to validate a methodology for transforming speech from a neutral style into a number of expressive styles by modelling voice quality (VoQ) parameters along with the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ for enhancing expressive synthetic speech in terms of both speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters extracted from an expressive speech corpus. Perception test results indicated that modelling VoQ along with the prosodic characteristics improved the resulting expressive speech styles.
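The abstract does not list which VoQ parameters were modelled. Jitter (cycle-to-cycle variation of the pitch period) and shimmer (cycle-to-cycle variation of the peak amplitude) are two commonly used ones, and are plausibly what "JiSh" in the configuration names of Figure 2 refers to, though that reading is an assumption. The following minimal sketch computes the standard "local" variants of both from per-cycle measurements; the function names and input values are hypothetical and only illustrate the definitions.

```python
import numpy as np


def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period (dimensionless)."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)


def local_shimmer(amplitudes):
    """Local shimmer: mean absolute difference between the peak
    amplitudes of consecutive cycles, divided by the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)


# Hypothetical per-cycle measurements, for illustration only.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]  # pitch periods in seconds
peak_amps = [0.80, 0.84, 0.78, 0.82]          # linear peak amplitudes
print(f"jitter  = {100 * local_jitter(periods_s):.2f} %")
print(f"shimmer = {100 * local_shimmer(peak_amps):.2f} %")
```

In practice the periods and amplitudes would come from a pitch-mark or glottal-cycle detector run on the corpus; that step is outside this sketch.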


Figures

**Figure 1**
A block diagram for the proposed expressive speech styles transformation methodology.
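Figure 1 summarises the transformation methodology, in which an HNM is used to modify the extracted parameters. To illustrate the underlying idea, here is a minimal sketch assuming a simplified HNM with a fixed maximum voiced frequency: it synthesises one frame as a sum of harmonics of F0 plus high-pass noise and applies a hypothetical prosodic modification (an F0 scale factor and an energy gain). The function names, parameter values, and the simplification of keeping harmonic amplitudes tied to harmonic index are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hnm_frame(f0, harm_amps, fs=16000, n=400, mvf=4000.0,
              noise_gain=0.01, seed=0):
    """Synthesise one frame of a simplified harmonic plus noise model.

    Harmonic part: cosines at k * f0 (k = 1, 2, ...) below the maximum
    voiced frequency `mvf`. Noise part: white noise high-passed above
    `mvf` by zeroing its low-frequency FFT bins.
    """
    t = np.arange(n) / fs
    harmonic = np.zeros(n)
    for k, amp in enumerate(harm_amps, start=1):
        if k * f0 >= mvf:  # harmonics above the MVF belong to the noise band
            break
        harmonic += amp * np.cos(2 * np.pi * k * f0 * t)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[freqs < mvf] = 0.0  # keep only the unvoiced (upper) band
    noise = noise_gain * np.fft.irfft(spectrum, n=n)
    return harmonic + noise


def transform_frame(f0, harm_amps, f0_scale, energy_gain, **kwargs):
    """Hypothetical neutral-to-expressive step: scale F0 and overall energy.

    Simplification: amplitudes stay attached to harmonic index, so the
    spectral envelope is not preserved; a real system would resample the
    amplitudes from an envelope estimate at the new harmonic frequencies.
    """
    return energy_gain * hnm_frame(f0 * f0_scale, harm_amps, **kwargs)


expressive = transform_frame(120.0, [1.0, 0.6, 0.3], f0_scale=1.3, energy_gain=1.5)
```

Duration modification, the third prosodic parameter, would act at the level of frame timing (repeating or dropping analysis frames) rather than inside a single frame, so it is not shown here.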

**Figure 2**
Quality MOS test results for the configurations of “Natural,” “ResHNM,” “HNMPro,” “HNMProJiSh,” and “HNMProVoQ.”
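Figure 2 reports quality MOS results per configuration. The sketch below shows how such a summary is usually computed: the mean of the listeners' ratings and a normal-approximation 95% confidence half-width. The 1 to 5 rating scale, the confidence-interval method, and the ratings themselves are assumptions for illustration, not the paper's test protocol or data.

```python
import numpy as np


def mos_with_ci(scores, z=1.96):
    """Mean opinion score and a normal-approximation confidence
    half-width (z = 1.96 gives roughly 95%)."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    half_width = z * scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, half_width


# Hypothetical ratings on a 1-5 scale (NOT data from the paper).
ratings = {
    "config_A": [3, 3, 4, 2, 3, 3],
    "config_B": [4, 3, 4, 3, 4, 4],
}
for name, r in ratings.items():
    m, h = mos_with_ci(r)
    print(f"{name}: MOS = {m:.2f} +/- {h:.2f}")
```

With few listeners per condition, a Student t quantile in place of 1.96 gives a more conservative interval.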


