PLoS One. 2009 Dec 9;4(12):e8218. doi: 10.1371/journal.pone.0008218.

A wireless brain-machine interface for real-time speech synthesis

Frank H Guenther et al.

Abstract

Background: Brain-machine interfaces (BMIs) involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech.

Methodology/principal findings: Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms) auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25-percentage-point improvement in average hit rate (from 45% to 70%) and a 46% decrease in average endpoint error from the first to the last block of a three-vowel task.
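For readers unfamiliar with this style of decoder, the sketch below illustrates the general idea of Kalman filter decoding of formant frequencies from binned firing rates. It is a generic textbook formulation in Python with placeholder matrices, not the authors' implementation; the unit count, noise covariances, and observation model are all illustrative assumptions.

# A minimal sketch (not the authors' implementation) of Kalman filter decoding:
# binned firing rates z are mapped to a state x = [F1, F2, dF1, dF2] that drives
# a formant synthesizer. All matrices below are illustrative placeholders.
import numpy as np

n_units, n_state = 40, 4                    # hypothetical unit count; 4-D formant state
A = np.eye(n_state)                         # state transition matrix
A[0, 2] = A[1, 3] = 1.0                     # formant position integrates velocity each bin
W = np.eye(n_state) * 1e2                   # process noise covariance (assumed)
rng = np.random.default_rng(0)
H = rng.normal(size=(n_units, n_state))     # observation model: rates ~ H @ state (assumed)
Q = np.eye(n_units)                         # observation noise covariance (assumed)

x = np.array([700.0, 1200.0, 0.0, 0.0])     # initial (F1, F2) guess in Hz, zero velocity
P = np.eye(n_state) * 1e4                   # initial state uncertainty

def decode_step(z, x, P):
    """One Kalman predict/update step for a single bin of firing rates z."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + W
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + Q)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(n_state) - K @ H) @ P_pred
    return x_new, P_new

# Stream of firing-rate bins -> stream of (F1, F2) commands for the synthesizer.
for _ in range(10):
    z = H @ x + rng.normal(size=n_units)    # simulated observation for the sketch
    x, P = decode_step(z, x, P)
    f1, f2 = x[0], x[1]                     # values that would be sent to the synthesizer

In the study, each decoded (F1, F2) pair drove the synthesizer, which returned audio to the participant within roughly 50 ms.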

Conclusions/significance: Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas.


Conflict of interest statement

Competing Interests: Authors P.R.K. and D.S.A. declare competing financial interests due to ownership interest in Neural Signals Inc. This does not alter the authors' adherence to PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Schematic of the brain-machine interface for real-time synthetic speech production.
Black circles and curved arrows represent neurons and axonal projections, respectively, in the neural circuitry for speech motor output. The volunteer's stroke-induced lesion in the efferent motor pathways (red X) disconnects motor plans represented in the cerebral cortex from the speech motoneurons, thus disabling speech output while sparing somatic, auditory, and visual sensation as well as speech motor planning centers in cerebral cortex. Signals collected from an electrode implanted in the subject's speech motor cortex are amplified and sent wirelessly across the scalp as FM radio signals. The signals are then routed to an electrophysiology recording system for further amplification, analog-to-digital conversion, and spike sorting. The sorted spikes are sent to a Neural Decoder which translates them into commands for a Speech Synthesizer. Audio signals from the synthesizer are fed back to the subject in real time. [Abbreviation: PrCG = precentral gyrus.]
Figure 2. Formant frequencies in speech.
(A) Spectrogram of the utterance “good doggy” with the trajectories of the first three formant frequencies (F1, F2, F3) clearly visible as bright bands of high energy. (B) Approximate locations of the monophthongal vowels of English plotted on the plane formed by the first two formant frequencies.
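To make the formant description concrete, the following sketch synthesizes a steady vowel from chosen F1/F2 values using a generic Klatt-style cascade of two second-order resonators. This is an illustration only, not the synthesizer used in the study; the bandwidths, pitch, and sample rate are assumptions.

# A minimal sketch of what F1 and F2 mean acoustically: a Klatt-style cascade of
# two second-order resonators applied to an impulse-train source.
import numpy as np
from scipy.signal import lfilter

def resonator(x, f, bw, fs):
    """Second-order resonator at center frequency f (Hz) with bandwidth bw (Hz)."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    b = [1 - 2 * r * np.cos(theta) + r ** 2]     # gain term (unity gain at DC)
    a = [1, -2 * r * np.cos(theta), r ** 2]
    return lfilter(b, a, x)

def synth_vowel(f1, f2, f0=100.0, dur=0.5, fs=16000):
    """Steady vowel: impulse train at pitch f0 filtered by resonators at F1 and F2."""
    src = np.zeros(int(dur * fs))
    src[::int(fs / f0)] = 1.0                    # simple glottal pulse train
    out = resonator(resonator(src, f1, 80.0, fs), f2, 120.0, fs)
    return out / np.max(np.abs(out))

# /i/ ("ee") has a low F1 and a high F2; /a/ ("ah") has a high F1 and a lower F2.
ee = synth_vowel(f1=300, f2=2300)
ah = synth_vowel(f1=750, f2=1100)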
Figure 3. Formant tuning of individual units and of the neural ensemble.
Individual units: Black arrows represent the formant tuning of individual units in polar coordinates, with angle representing the preferred direction of movement and arrow length representing the tuning strength (see Methods for details). The strength of tuning to each possible direction of movement in formant space is computed as the average correlation (across sessions) between each unit's firing rate and the target formant position along this direction. The preferred direction of movement is then computed as the direction with maximal tuning strength among all possible directions. Neural ensemble: The black curve represents the formant tuning of the neural ensemble in polar coordinates, with angles representing each possible direction of movement and distance from origin representing the tuning strength. Green lines represent the directions of movement in formant space of the three target sounds used for training, and the two small circles along the neural ensemble tuning curve represent the average strength of the correlations between firing rates and F1 (r = 0.49, p < .001) and F2 (r = 0.57, p < .001), respectively.
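The directional tuning computation described in this caption can be summarized in a short sketch: for each candidate direction in (F1, F2) space, correlate a unit's firing rate with the target position projected onto that direction, then take the direction of maximal correlation as the preferred direction. The code below is a simplified, simulated illustration of that procedure, not the authors' analysis code.

# Simplified, simulated illustration of the directional tuning computation described
# above: correlate a unit's firing rate with the target's position along each
# candidate direction in (F1, F2) space.
import numpy as np

def formant_tuning(rates, targets, n_dirs=360):
    """rates: (n_trials,) firing rates; targets: (n_trials, 2) target (F1, F2) positions."""
    angles = np.linspace(0, 2 * np.pi, n_dirs, endpoint=False)
    strengths = np.empty(n_dirs)
    for i, th in enumerate(angles):
        proj = targets @ np.array([np.cos(th), np.sin(th)])   # position along this direction
        strengths[i] = np.corrcoef(rates, proj)[0, 1]
    best = int(np.argmax(strengths))
    return angles[best], strengths[best], strengths           # preferred direction, peak, full curve

# Simulated unit whose firing rate follows F2 (the second formant coordinate):
rng = np.random.default_rng(1)
targets = rng.uniform([300.0, 900.0], [900.0, 2500.0], size=(200, 2))
rates = 0.002 * targets[:, 1] + rng.normal(scale=0.5, size=200)
pref_dir, peak, curve = formant_tuning(rates, targets)        # pref_dir near pi/2 (pure F2 change)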
Figure 4. Distribution of unit correlation coefficients (indicative of formant tuning strengths in the unit's preferred direction) averaged across sessions.
The first peak represents units with “poor” tuning, the second peak represents units with “average” tuning, and the right tail of the distribution represents units with “good” tuning.
Figure 5. Sample tuning curves for representative units with good (top), average (middle), and poor (bottom) tuning.
Tuning curves (black) and 95% confidence intervals (gray) are computed as described in Methods: Formant tuning analyses. Tuning strength is indicated by the correlation between unit firing rates and formant frequency. The three units shown are primarily tuned to changes in F2, as indicated by the relatively low tuning values in the horizontal (F1) directions.
Figure 6. Results of offline ridge regression reconstruction of intended formant frequencies while the participant attempted to speak in synchrony with a speech stimulus.
Reconstructed values of the first (bottom) and second (middle) formant frequencies (in Mel units) are shown (black lines) along with the formant frequencies present in the stimulus being mimicked (gray lines).
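As a rough illustration of this offline analysis, the sketch below fits a closed-form ridge regression from firing rates to formant frequencies and uses it to reconstruct formant trajectories. The data are simulated and the regularization constant is an arbitrary assumption; the authors' actual feature construction (e.g., any lagged firing-rate windows) is not reproduced here.

# Rough, simulated illustration of offline ridge regression from firing rates to
# formant frequencies (not the authors' code).
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X'X + lam*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(2)
n_bins, n_units = 1000, 40
rates = rng.poisson(5.0, size=(n_bins, n_units)).astype(float)        # simulated binned rates
true_W = rng.normal(size=(n_units, 2))
formants = rates @ true_W + rng.normal(scale=2.0, size=(n_bins, 2))   # simulated (F1, F2) targets

W = ridge_fit(rates, formants, lam=10.0)
reconstructed = rates @ W      # trajectories to compare against the stimulus formants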
Figure 7. Results for vowel production task with real-time synthesizer.
(A–C) Performance measures as a function of block number within a session, averaged across all sessions: (A) hit rate, (B) movement time, and (C) endpoint error. (D) Average endpoint error as a function of session number. (E–F) Average formant trajectories for utterances in (E) the last block of all sessions, and (F) successful trials in all blocks and sessions.
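A minimal sketch of how the trial-level measures in panels (A) and (C) might be computed is shown below, assuming a trial counts as a hit when the final (F1, F2) endpoint falls within a tolerance radius of the vowel target; the tolerance value is an illustrative assumption, not the study's criterion.

# Sketch of trial-level performance measures under assumed definitions: a "hit"
# occurs when the final (F1, F2) endpoint lies within a tolerance radius of the
# vowel target, and endpoint error is the Euclidean distance from endpoint to target.
import numpy as np

def score_trials(endpoints, targets, tolerance=150.0):
    """endpoints, targets: (n_trials, 2) arrays of (F1, F2); tolerance in the same units."""
    err = np.linalg.norm(endpoints - targets, axis=1)   # per-trial endpoint error
    hit_rate = float(np.mean(err <= tolerance))
    return hit_rate, float(err.mean())

# Example with one simulated trial: an endpoint about 102 units from the target counts as a hit.
hit_rate, mean_err = score_trials(np.array([[320.0, 2200.0]]),
                                  np.array([[300.0, 2300.0]]))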
Figure 8. Electrode location in the participant's cerebral cortex.
(A) Left panels: Axial (top) and sagittal (bottom) slices showing brain activity along the precentral gyrus during a word generation fMRI task prior to implantation. Red lines denote the precentral sulcus; yellow lines denote the central sulcus. Right panels: Corresponding images from a post-implant CT scan showing location of electrode. (B) 3D CT image showing electrode wire entering dura mater. Subcutaneous electronics are visible above the electrode wire, on top of the skull.
