J Neural Eng. 2018 Aug;15(4):046031.
doi: 10.1088/1741-2552/aac965. Epub 2018 Jun 1.

Development of sEMG sensors and algorithms for silent speech recognition

Geoffrey S Meltzner et al. J Neural Eng. 2018 Aug.

Abstract

Objective: Speech is among the most natural forms of human communication, thereby offering an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR, including degradation in the presence of ambient noise, limited privacy, and poor accessibility for those with significant speech disorders, have motivated the need for alternative non-acoustic modalities of subvocal or silent speech recognition (SSR).

Approach: We have developed a new system of face- and neck-worn sensors and signal processing algorithms that are capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from muscles of the face and neck that are involved in the production of speech. The algorithms were strategically developed by evolving speech recognition models: first for recognizing isolated words by extracting speech-related features from sEMG signals, then for recognizing sequences of words from patterns of sEMG signals using grammar models, and finally for recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG signal measurements from small articulator muscles of the face and neck.

Main results: We tested the system of sensors and algorithms during a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field.
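
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the recognized phrase and the reference phrase, divided by the number of reference words; the recognition rate quoted above is its complement (100% - 8.9% = 91.1%). The following is a minimal sketch of that standard definition, not the authors' scoring code; the function name `word_error_rate` and the example phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization and exact string matching,
    which may differ from the scoring used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is dropped.
wer = word_error_rate("turn right at the light", "turn right at light")
recognition_rate = 1.0 - wer
```

In this sketch a single deleted word in a five-word phrase yields a WER of 0.2. Corpus-level figures are usually obtained by summing the edit distances over all phrases and dividing by the total number of reference words, rather than by averaging per-phrase rates.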

Significance: These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications, including use by persons with speech impairments following a laryngectomy, by military personnel requiring hands-free covert communication, or by consumers in need of privacy while speaking on a mobile phone in public.

Figures

Figure 1.
An example of the sEMG-based speech activity detection operating on Channel 8 produced by Subject 4 saying the word ‘right’. The figure shows (A) the raw sEMG signal, (B) the sEMG RMS, (C) the filtered sEMG RMS using a 40 ms window with 20 ms overlap, and (D) the absolute value of the derivative of the filtered sEMG RMS, all marked in black. The gray step function in each subplot marks the region at which the algorithm detected speech activity.
Figure 2.
Recognition of isolated words using different sEMG-based features. Each ‘x’ represents the average ± standard deviation of the word error rate (WER) obtained using different combinations of the specified feature with the other features.
Figure 3.
Word error rates (WER) from the relatively small vocabulary of word sequences in Corpus 2 plotted for different grammar models incorporated into the word-based recognition models.
Figure 4.
Word error rates (WER) from the relatively small vocabulary of word sequences in Corpus 2 plotted as a function of the number of Gaussian mixtures per hidden Markov model (HMM) state within the phoneme recognition models.
Figure 5.
Word error rates (WER) from the relatively small vocabulary of word sequences in Corpus 2 plotted as a function of the dimensions of the sEMG feature set used in the phoneme recognition models after HLDA feature reduction.
Figure 6.
(A) The word error rate (WER) as a function of the best combination for each sensor subset. (B) Depiction of the final 8-sensor system placed on a subject. (C) Rendering of the prototype sEMG facial sensor array Trigno Quattro (Delsys, Inc.).
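
The processing chain in the Figure 1 caption (raw sEMG, RMS, a 40 ms window with 20 ms overlap, absolute derivative, detected speech region) can be illustrated with a rough sketch. Several parts are assumptions rather than the authors' method: the RMS and the windowed filtering are collapsed into a single windowed-RMS step, the decision rule is a plain fixed threshold on the RMS, and the sampling rate, threshold value, and all function names are hypothetical.

```python
import math


def windowed_rms(signal, fs, window_ms=40.0, overlap_ms=20.0):
    """RMS of the signal over sliding windows (40 ms long, 20 ms overlap)."""
    win = int(round(fs * window_ms / 1000.0))
    hop = win - int(round(fs * overlap_ms / 1000.0))
    if win <= 0 or hop <= 0:
        raise ValueError("window must be longer than the overlap")
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        frames.append(math.sqrt(sum(x * x for x in frame) / win))
    return frames


def abs_derivative(values):
    """Absolute first difference between consecutive frames."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def detect_activity(rms, threshold):
    """Assumed decision rule: a frame is active when its RMS exceeds the threshold."""
    return [value > threshold for value in rms]


# Synthetic example: 200 ms of silence, 200 ms of activity, 200 ms of silence,
# sampled at an assumed 1000 Hz.
fs = 1000
signal = [0.0] * 200 + [1.0, -1.0] * 100 + [0.0] * 200
rms = windowed_rms(signal, fs)
slope = abs_derivative(rms)
active = detect_activity(rms, threshold=0.5)
```

The `active` list plays the role of the gray step function in the figure, at frame rather than sample resolution. A detector closer to the one pictured would presumably also use the derivative trace, for example to locate onsets and offsets, but the caption does not say how the two traces are combined.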
