Development of sEMG sensors and algorithms for silent speech recognition

Geoffrey S Meltzner¹, James T Heaton, Yunbin Deng, Gianluca De Luca, Serge H Roy, Joshua C Kline

PMID: 29855428
PMCID: PMC6168082
DOI: 10.1088/1741-2552/aac965

Geoffrey S Meltzner et al. J Neural Eng. 2018 Aug;15(4):046031. doi: 10.1088/1741-2552/aac965. Epub 2018 Jun 1.

Affiliation

¹ VocaliD, Inc. 50 Leonard St, Belmont, MA 02478, United States of America.

Abstract

Objective: Speech is among the most natural forms of human communication, thereby offering an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR (including degradation in the presence of ambient noise, limited privacy and poor accessibility for those with significant speech disorders) have motivated the need for alternative non-acoustic modalities of subvocal or silent speech recognition (SSR).

Approach: We have developed a new system of face- and neck-worn sensors and signal processing algorithms that are capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from muscles of the face and neck that are involved in the production of speech. The algorithms were strategically developed by evolving speech recognition models: first for recognizing isolated words by extracting speech-related features from sEMG signals, then for recognizing sequences of words from patterns of sEMG signals using grammar models, and finally for recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG signal measurements from small articulator muscles of the face and neck.

Main results: We tested the system of sensors and algorithms during a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field.

Significance: These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications including: persons with speech impairments following a laryngectomy; military personnel requiring hands-free covert communication; or the consumer in need of privacy while speaking on a mobile phone in public.
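The word error rate (WER) quoted in the main results can be illustrated with a short sketch. The abstract does not say how the authors scored their output, so the following assumes the conventional definition: the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference, with the recognition rate taken as 1 minus WER, which matches the 8.9% and 91.1% pairing above. The function name and the example phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumed (conventional) WER definition; not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is dropped.
wer = word_error_rate("turn right at the light", "turn right at light")
recognition_rate = 1.0 - wer
```

Under this definition, insertions can push WER above 100%, so the recognition rate computed as 1 minus WER is best read as a summary figure rather than a strict fraction of correct words.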

Figures

**Figure 1.**
An example of the sEMG-based speech activity detection operating on Channel 8 produced by Subject 4 saying the word ‘right’. The figure shows (A) the raw sEMG signal, (B) the sEMG RMS, (C) the filtered sEMG RMS using a 40 ms window with 20 ms overlap, and (D) the absolute value of the derivative of the filtered sEMG RMS, all marked in black. The gray step function in each subplot marks the region in which the algorithm detected speech activity.

**Figure 2.**
Recognition of isolated words using different sEMG-based features. Each ‘x’ represents the average ± standard deviation of the word error rate (WER) obtained using different combinations of the specified feature with the other features.

**Figure 3.**
Word error rates (WER) from a relatively small vocabulary of word sequences in Corpus 2 plotted for different grammar models incorporated into the word-based recognition models.

**Figure 4.**
Word error rates (WER) from the relatively small vocabulary of word sequences in Corpus 2 plotted as a function of the number of Gaussian mixtures per hidden Markov model (HMM) state within the phoneme recognition models.

**Figure 5.**
Word error rates (WER) from the relatively small vocabulary of word sequences in Corpus 2 plotted as a function of the dimensions of the sEMG feature set used in the phoneme recognition models after HLDA feature reduction.

**Figure 6.**
(A) The word error rate (WER) as a function of the best combination for each sensor subset. (B) Depiction of the final 8-sensor system placed on a subject. (C) Rendering of the prototype sEMG facial sensor array Trigno™ Quattro (Delsys, Inc).
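The signal chain named in the Figure 1 caption (RMS envelope, 40 ms window with 20 ms overlap, absolute derivative) can be sketched roughly as follows. This is one possible reading of the caption, not the authors' implementation: it assumes the RMS is computed over sliding 40 ms frames advanced by 20 ms, and it flags activity with a simple fixed threshold on the envelope. The caption does not state the decision rule, so `detect_activity` and its `threshold` parameter are hypothetical, as are the sampling rate and the synthetic test signal.

```python
import math


def windowed_rms(signal, fs_hz, window_ms=40.0, overlap_ms=20.0):
    """RMS envelope over sliding frames (40 ms window, 20 ms overlap by default)."""
    win = int(round(fs_hz * window_ms / 1000.0))
    hop = win - int(round(fs_hz * overlap_ms / 1000.0))
    if win <= 0 or hop <= 0:
        raise ValueError("window must be positive and longer than the overlap")
    envelope = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        envelope.append(math.sqrt(sum(x * x for x in frame) / win))
    return envelope


def abs_derivative(values):
    """Absolute first difference between consecutive envelope values."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def detect_activity(envelope, threshold):
    """Hypothetical decision rule: a frame is active when its RMS exceeds threshold."""
    return [value > threshold for value in envelope]


# Synthetic stand-in for one sEMG channel sampled at an assumed 1000 Hz:
# 300 ms of low-level baseline, a 400 ms burst, then 300 ms of baseline.
fs = 1000
amplitudes = [0.01] * 300 + [1.0] * 400 + [0.01] * 300
demo_signal = [a if n % 2 == 0 else -a for n, a in enumerate(amplitudes)]

demo_envelope = windowed_rms(demo_signal, fs)
demo_slope = abs_derivative(demo_envelope)
demo_active = detect_activity(demo_envelope, threshold=0.5)
```

In this sketch the derivative is computed but not used in the decision; a detector that combines the envelope and its slope, as the figure panels suggest, would need thresholds and logic that the caption does not specify.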

Grants and funding

R44 DC014870/DC/NIDCD NIH HHS/United States
