Automatic Speech Recognition from Neural Signals: A Focused Review

Christian Herff¹, Tanja Schultz¹

PMID: 27729844
PMCID: PMC5037201
DOI: 10.3389/fnins.2016.00429

Review

Christian Herff et al. Front Neurosci. 2016.

Front Neurosci. 2016 Sep 27;10:429.

doi: 10.3389/fnins.2016.00429. eCollection 2016.

Affiliation

¹ Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany.

Abstract

Speech interfaces have become widely accepted and are now integrated into many real-life applications and devices; they have become part of daily life. However, speech interfaces presume the ability to produce intelligible speech, which may be impossible or undesirable in loud environments, when speaking would disturb bystanders, or when a person is unable to produce speech (e.g., patients suffering from locked-in syndrome). For these reasons, it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition (ASR) technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy (fNIRS) and functional Magnetic Resonance Imaging (fMRI), are less suited for ASR from neural signals because of their low temporal resolution, but are very useful for investigating the neural mechanisms underlying speech processes. In contrast, electrophysiological activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography, ECoG). As a first example of ASR techniques applied to neural signals, we discuss the Brain-to-text system.

Keywords: ASR; BCI; ECoG; EEG; automatic speech recognition; brain-computer interface; fNIRS; speech.

Figures

**Figure 1**
**ECoG and audio data are recorded at the same time**. Speech decoding software is then used to determine timing of vowels and consonants in acoustic data. ECoG models are then trained for each phone individually by calculating the mean and covariance of all segments associated with that particular phone.
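The per-phone training step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the ECoG data have already been cut into feature frames and labeled with phones from the acoustic alignment. The function name `train_phone_models`, the frame format, and the toy values are all hypothetical.

```python
from collections import defaultdict


def train_phone_models(frames):
    """Estimate a mean vector and covariance matrix for each phone label.

    frames: iterable of (phone_label, feature_vector) pairs. Each feature
    vector is a list of floats (assumed here: one value per electrode).
    """
    by_phone = defaultdict(list)
    for label, vec in frames:
        by_phone[label].append(vec)

    models = {}
    for label, vecs in by_phone.items():
        n, dim = len(vecs), len(vecs[0])
        if n < 2:
            raise ValueError(f"need at least two frames for phone {label!r}")
        mean = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # Unbiased sample covariance over all frames assigned to this phone.
        cov = [
            [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vecs) / (n - 1)
             for j in range(dim)]
            for i in range(dim)
        ]
        models[label] = (mean, cov)
    return models


# Toy data (made up): two features per frame, two phone labels.
frames = [
    ("AA", [1.0, 2.0]), ("AA", [3.0, 4.0]),
    ("S", [0.0, 1.0]), ("S", [2.0, 1.0]), ("S", [4.0, 1.0]),
]
models = train_phone_models(frames)
```

Each entry of `models` holds the Gaussian parameters for one phone. With realistic data, the number of frames per phone would need to be large relative to the feature dimension for the covariance estimate to be usable.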

**Figure 2**
**Decoding process in the Brain-to-text system**. Broadband gamma power is extracted for a phrase of ECoG data. The most likely word sequence is then decoded by combining the knowledge of ECoG phone models, dictionary and language model.
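To illustrate how phone models, a dictionary and a language model can be combined into one score, the sketch below ranks a small list of candidate word sequences. It is a simplified stand-in under several assumptions, not the Brain-to-text decoder itself: the Gaussians are diagonal rather than full-covariance, the candidate phrases are enumerated by hand instead of searched, the language model is a toy bigram table, and all names, pronunciations and numbers are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def align_score(frames, phones, models):
    """Log-likelihood of the best monotonic frame-to-phone alignment.

    Dynamic programming over frames: at each frame the path either stays in
    the current phone or advances to the next one, so every phone covers at
    least one frame. Returns -inf if there are fewer frames than phones.
    """
    prev = [NEG_INF] * len(phones)
    prev[0] = log_gauss(frames[0], *models[phones[0]])
    for x in frames[1:]:
        cur = [NEG_INF] * len(phones)
        for p, ph in enumerate(phones):
            best = max(prev[p], prev[p - 1] if p > 0 else NEG_INF)
            if best > NEG_INF:
                cur[p] = best + log_gauss(x, *models[ph])
        prev = cur
    return prev[-1]


def decode(frames, candidates, dictionary, models, bigram, lm_weight=1.0):
    """Return the candidate word sequence with the highest combined score."""
    best_words, best_score = None, NEG_INF
    for words in candidates:
        # Dictionary: expand each word into its phone sequence.
        phones = [ph for w in words for ph in dictionary[w]]
        acoustic = align_score(frames, phones, models)
        # Bigram language model; unseen pairs get a small floor (assumption).
        lm = sum(math.log(bigram.get((a, b), 1e-6))
                 for a, b in zip(["<s>"] + list(words), words))
        score = acoustic + lm_weight * lm
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words, best_score


# Toy setup (made up): one feature per frame, three phones, two words.
models = {"AA": ([2.0], [1.0]), "S": ([0.0], [1.0]), "T": ([5.0], [1.0])}
dictionary = {"us": ["AA", "S"], "at": ["AA", "T"]}
bigram = {("<s>", "us"): 0.5, ("<s>", "at"): 0.5}
frames = [[2.1], [1.9], [0.2], [-0.1]]

best_words, best_score = decode(frames, [["us"], ["at"]], dictionary, models, bigram)
```

The `lm_weight` parameter balances the neural evidence against the language model prior. A full decoder would search over all word sequences permitted by the dictionary rather than score a fixed candidate list.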
