Cross-Speaker Training and Adaptation for Electromyography-to-Speech Conversion
- PMID: 40039250
- DOI: 10.1109/EMBC53108.2024.10781707
Abstract
Surface Electromyography (EMG) signals of articulatory muscles can be used to synthesize acoustic speech with Electromyography-to-Speech (ETS) models. Recent models have improved synthesis quality by combining training data from multiple recordings of single speakers. In this work, we evaluated whether using recordings of multiple speakers also increases performance and whether cross-speaker models can be adapted to unseen speakers with limited data. We recorded the EMG-Vox corpus, which consists of EMG and audio signals from four speakers with five sessions each. We compared cross-speaker models with single-speaker counterparts and conducted adaptation experiments. On average, cross-speaker models achieved significantly better performance than single-speaker models. Experiments with balanced data indicated that this improvement stemmed from the larger training set. Speaker adaptation from cross-speaker models yielded higher synthesis quality than training from scratch and was at least on par with session adaptation for most speakers. To the best of our knowledge, this is the first work to report that cross-speaker ETS models yield better results than single-speaker models.