Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity

Miguel Angrick et al. Commun Biol. 2021 Sep 23;4(1):1055. doi: 10.1038/s42003-021-02578-0.

Abstract

Speech neuroprosthetics aim to provide a natural communication channel to individuals who are unable to speak due to physical or neurological impairments. Real-time synthesis of acoustic speech directly from measured neural activity could enable natural conversations and notably improve quality of life, particularly for individuals who have severely limited means of communication. Recent advances in decoding approaches have led to high-quality reconstructions of acoustic speech from invasively measured neural activity. However, most prior research utilizes data collected during open-loop experiments of articulated speech, which might not directly translate to imagined speech processes. Here, we present an approach that synthesizes audible speech in real-time for both imagined and whispered speech conditions. Using a participant implanted with stereotactic depth electrodes, we were able to reliably generate audible speech in real-time. The decoding models rely predominantly on frontal activity, suggesting that speech processes have similar representations when vocalized, whispered, or imagined. While reconstructed audio is not yet intelligible, our real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis based on imagined speech.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Overview of experimental design.
The experiment begins with an open-loop run in which the participant reads a series of 100 words aloud while the speech and brain activity are synchronously recorded. In the two subsequent closed-loop runs, the participant performs the same task while whispering and imagining speech, respectively. For the closed-loop runs, real-time audible feedback of the neurally-decoded and synthesized speech is provided via our system.
Fig. 2
Fig. 2. Schematic overview of our proposed real-time synthesis approach.
a Invasive brain signals are acquired through implanted sEEG electrodes. b Multichannel signals are processed to extract the high-gamma power. c Linear decoding models are used to estimate a spectral representation (d), which is synthesized into an audible speech waveform using the Griffin–Lim algorithm and presented to the patient as real-time auditory feedback.
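The core of the pipeline in (b, c) — high-gamma power extraction followed by a linear decoder onto spectral bins — can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the band edges (70–170 Hz), filter order, 100 ms framing, and function names are assumptions, and the data below are synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_power(seeg, fs, band=(70.0, 170.0)):
    """Band-pass each sEEG channel to the high-gamma range and take
    the analytic-signal power (squared Hilbert envelope)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, seeg, axis=-1)
    return np.abs(hilbert(filtered, axis=-1)) ** 2

def fit_linear_decoder(features, spectrogram):
    """Least-squares map from neural features (frames x channels)
    to spectral bins (frames x bins), with a bias term."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, spectrogram, rcond=None)
    return W

def decode(features, W):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ W

# toy run on synthetic data: 8 channels, 5 s at 1 kHz, 100 ms frames
rng = np.random.default_rng(0)
fs = 1000
seeg = rng.standard_normal((8, 5 * fs))
power = high_gamma_power(seeg, fs)                    # (8, 5000)
frames = power.reshape(8, -1, 100).mean(axis=-1).T    # (50 frames, 8 channels)
true_W = rng.standard_normal((9, 23))                 # 23 spectral bins
spec = decode(frames, true_W)                         # synthetic target
W = fit_linear_decoder(frames, spec)
assert np.allclose(decode(frames, W), spec, atol=1e-6)
```

In the real system the decoded spectral frames would then be passed to a Griffin–Lim vocoder (e.g. an iterative phase-reconstruction loop) to produce the audible waveform.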
Fig. 3
Fig. 3. Decoding performance of the proposed method on the open-loop data from the audible speech experiment.
The spectrogram was reconstructed using ten-fold cross-validation. a Visual comparison of original and reconstructed spectrograms. b Correlation coefficients across all spectral bins for our approach (blue, n1 = 10) compared to a randomized baseline (red, n2 = 100) generated by breaking the temporal alignment between the brain signals and speech recordings. Shaded areas represent standard deviation.
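The randomized baseline in (b) — breaking the temporal alignment between brain signals and speech — can be imitated by circularly shifting the neural features relative to the audio before decoding, which preserves the feature statistics while destroying the alignment. A hedged sketch; the function names, shift range, and toy data are assumptions, not the paper's code:

```python
import numpy as np

def binwise_correlation(orig, recon):
    """Pearson correlation per spectral bin between original and
    reconstructed spectrograms (frames x bins)."""
    o = orig - orig.mean(axis=0)
    r = recon - recon.mean(axis=0)
    return (o * r).sum(axis=0) / (np.linalg.norm(o, axis=0) * np.linalg.norm(r, axis=0))

def shuffled_baseline(features, spectrogram, decode_fn, n_perm=100, seed=0):
    """Chance-level distribution: circularly shift the neural features
    relative to the audio, decode, and score each permutation."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    scores = []
    for _ in range(n_perm):
        shift = int(rng.integers(n // 4, 3 * n // 4))  # avoid trivially small shifts
        recon = decode_fn(np.roll(features, shift, axis=0))
        scores.append(binwise_correlation(spectrogram, recon))
    return np.array(scores)                            # (n_perm, bins)

# toy check: a perfect linear decoder scores ~1 on aligned data,
# while misaligned data yields near-zero correlations
rng = np.random.default_rng(1)
features = rng.standard_normal((500, 8))
M = rng.standard_normal((8, 23))
spectrogram = features @ M
decode_fn = lambda f: f @ M
aligned = binwise_correlation(spectrogram, decode_fn(features))
chance = shuffled_baseline(features, spectrogram, decode_fn, n_perm=50)
```

Plotting the per-bin distribution of `aligned` against `chance` reproduces the kind of comparison shown in panel b.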
Fig. 4
Fig. 4. Decoding results of the proposed method in the closed-loop experimental runs.
a Selected examples of synthesized audio waveforms produced during whispered speech trials. b Selected examples of synthesized audio waveforms produced during imagined speech trials. In both runs, speech was reliably produced when the participant was prompted to whisper or to imagine speaking, respectively. c Pearson correlation coefficients between time-warped reference speech trials and closed-loop whispered trials (n1 = 73) and closed-loop imagined speech trials (n1 = 75), respectively. Chance level (n2 = 1000) is based on randomly selected data from non-speech tasks performed by the participant. Statistical significance, indicated by asterisks (***P < 0.001; **P < 0.01), was computed using Mann–Whitney U tests. Black horizontal lines correspond to median DTW correlation scores. Boxes define the boundaries between the first and third quartiles. Error bars represent the range of data within 1.5 times the interquartile range, and points beyond the error bars show outliers. d The proportion of decoded and synthesized speech during whispered and imagined trials, respectively, versus non-speech intertrial intervals.
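The DTW-based correlation scores and the Mann–Whitney U comparison against chance in (c) can be illustrated as follows. This is a minimal sketch assuming 1-D energy envelopes and a plain O(nm) dynamic-time-warping alignment; the paper's exact features, warping constraints, and chance construction are not reproduced.

```python
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr

def dtw_path(a, b):
    """Dynamic-time-warping alignment between two 1-D sequences
    (e.g. audio energy envelopes), absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack from the end to recover the warping path
    path, (i, j) = [(n - 1, m - 1)], (n, m)
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda ij: D[ij])
        path.append((i - 1, j - 1))
    return path[::-1]

def dtw_correlation(ref, syn):
    """Pearson correlation between reference and synthesized envelopes
    after warping them onto a common alignment."""
    path = dtw_path(ref, syn)
    r, _ = pearsonr([ref[i] for i, j in path], [syn[j] for i, j in path])
    return r

# toy check: a time-stretched copy of a signal correlates highly after DTW
ref = np.sin(2 * np.pi * 3 * np.linspace(0, 1, 80))
syn = np.sin(2 * np.pi * 3 * np.linspace(0, 1, 120))  # same shape, different rate
score = dtw_correlation(ref, syn)

# chance-level comparison in the spirit of panel c: Mann-Whitney U between
# trial scores and scores from unrelated (here: noise) segments
rng = np.random.default_rng(2)
trial_scores = [dtw_correlation(ref, syn + 0.05 * rng.standard_normal(120)) for _ in range(10)]
chance_scores = [dtw_correlation(ref, rng.standard_normal(120)) for _ in range(10)]
stat, p = mannwhitneyu(trial_scores, chance_scores, alternative="greater")
```

Note that DTW can inflate correlations even for unrelated signals, which is why a matched chance distribution (rather than zero) is the appropriate baseline.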
Fig. 5
Fig. 5. Anatomical and temporal contributions.
a Spatiotemporal decoder activations averaged across 9 classes and 40 frequency bins. The colored triangles at the top of the panel correspond to the colored electrode shafts in (b–d). The left edge of each triangle represents the deepest contact of the respective electrode shaft, the right edge represents the most superficial contact of the same shaft, and the intermediate contacts are ordered longitudinally in between. The activations (i.e., transformed average model weights) of the decoding models at each electrode and temporal lag are indicated by the vertical bars below the corresponding colored triangle; darker red indicates higher absolute activations. The activations indicate that inferior frontal and middle frontal cortices are predominantly employed in decoding. b–d Different views of the electrode locations for the participant: b left lateral, c frontal, d superior.
Fig. 6
Fig. 6. Schematic overview of the proposed closed-loop decoder.
Each node corresponds to one self-contained task, and the nodes are connected in an acyclic network. Rectangular nodes perform the actual computations in the generation of the acoustic waveform, while circular nodes are output nodes that write incoming data to disk for offline evaluation. Double-lined nodes indicate that their computations (and those of all subsequent nodes) are performed asynchronously in separate processes. The extraction of neural activity is composed of multiple nodes.
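The node network described above can be sketched as a set of worker tasks connected by queues, with a sentinel value propagating shutdown through the acyclic graph. This is a hypothetical illustration, not the paper's software: threads stand in for the separate processes of the real system, and `Node` and the toy arithmetic stages are invented for the example.

```python
import threading
import queue

class Node(threading.Thread):
    """One self-contained processing task: reads items from its input
    queue, applies `fn`, and pushes results to each output queue.
    A None item is the shutdown signal and is forwarded downstream."""
    def __init__(self, fn, inbox, outboxes):
        super().__init__(daemon=True)
        self.fn, self.inbox, self.outboxes = fn, inbox, outboxes

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:
                for q in self.outboxes:
                    q.put(None)
                return
            out = self.fn(item)
            for q in self.outboxes:
                q.put(out)

# wire a tiny acyclic pipeline: input -> "decode" -> "synthesize" -> output
q_in, q_dec, q_out = queue.Queue(), queue.Queue(), queue.Queue()
decoder = Node(lambda x: x * 2, q_in, [q_dec])   # stand-in computation
synth = Node(lambda x: x + 1, q_dec, [q_out])    # stand-in computation
decoder.start()
synth.start()
for sample in range(5):
    q_in.put(sample)
q_in.put(None)        # sentinel: drains the whole pipeline
synth.join()
results = []
while not q_out.empty():
    item = q_out.get()
    if item is not None:
        results.append(item)
# results == [1, 3, 5, 7, 9]
```

A real closed-loop system would additionally attach "writer" nodes (the circular nodes of the figure) to intermediate queues so each stage's output is persisted for offline evaluation.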

