PLoS Comput Biol. 2019 Sep 16;15(9):e1007091. doi: 10.1371/journal.pcbi.1007091. eCollection 2019 Sep.

Deep learning as a tool for neural data analysis: Speech classification and cross-frequency coupling in human sensorimotor cortex



Jesse A Livezey et al. PLoS Comput Biol. 2019.

Abstract

A fundamental challenge in neuroscience is to understand what structure in the world is represented in spatially distributed patterns of neural activity from multiple single-trial measurements. This is often accomplished by learning a simple, linear transformation between neural features and features of the sensory stimuli or motor task. While successful in some early sensory processing areas, linear mappings are unlikely to be ideal tools for elucidating nonlinear, hierarchical representations of higher-order brain areas during complex tasks, such as the production of speech by humans. Here, we apply deep networks to predict produced speech syllables from a dataset of high gamma cortical surface electric potentials recorded from human sensorimotor cortex. We find that deep networks had higher decoding prediction accuracy compared to baseline models. Having established that deep networks extract more task-relevant information from neural datasets relative to linear models (i.e., higher predictive accuracy), we next sought to demonstrate their utility as a data analysis tool for neuroscience. We first show that the deep networks' confusions revealed hierarchical latent structure in the neural data, which recapitulated the underlying articulatory nature of speech motor control. We next broadened the frequency features beyond high-gamma and identified a novel high-gamma-to-beta coupling during speech production. Finally, we used deep networks to compare task-relevant information in different neural frequency bands, and found that the high-gamma band contains the vast majority of information relevant for the speech prediction task, with little-to-no additional contribution from lower-frequency amplitudes. Together, these results demonstrate the utility of deep networks as a data analysis tool for basic and applied neuroscience.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Human ECoG recordings from ventral sensorimotor cortex (vSMC) during speech production.
A Electrodes overlaid on vSMC. Electrodes are colored red-to-black with increasing distance from the Sylvian Fissure. B-D Task and data summary for three different consonant-vowel (CV) utterances. B Vocal tract configuration and point of constriction (orange dot) during the consonant for the production of /ba/ (lips), /da/ (coronal tongue), and /ga/ (dorsal tongue). C The audio spectrogram aligned to the consonant-to-vowel acoustic transition (dashed line). D Mean across trials of the Hγ amplitude from a subset of electrodes in vSMC aligned to CV transition. Traces are colored red-to-black with increasing distance from the Sylvian Fissure as in A. The syllables /ba/, /da/, and /ga/ are generated by overlapping yet distinct spatio-temporal patterns of activity across vSMC. E Logistic regression accuracy for consonants and vowels plotted against time aligned to the CV transition averaged across subjects and folds. Black and grey traces are average (± s.e.m., n = 40) accuracies for consonants (18–19 classes) and vowels (3 classes) respectively.
Fig 2. Data processing and deep network training pipeline for ECoG data.
A Cortical surface electrical potentials plotted against time for a subset of the vSMC electrodes, segmented to the CV production window. Electrodes have an arbitrary vertical offset for visualization. B Voltage for one electrode. C The z-scored analytic amplitude is shown as a function of time for a subset of the 40 frequency ranges used in the Hilbert transform. D The 40 ranges used in the Hilbert transform are grouped and averaged according to whether their center frequency falls within each traditional neuroscience band. E For a particular analysis, a subset of the bands is chosen as features; this process is repeated for each trial (sub-pane) and electrode (trace within each sub-pane) in vSMC. Each data sample consists of one trial's Hγ activity for all electrodes in vSMC. F Data were partitioned 10 times into training, validation, and testing subsets (80%, 10%, and 10%, respectively) with independent testing subsets. We trained models that varied in a large hyper-parameter space, including network architecture and optimization parameters, symbolized by the 3 networks on the left with differing numbers of units and layers. The optimal model (right) is chosen based on the validation accuracy, and results are reported on the test set.
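Steps B–E of the pipeline (band-pass, Hilbert analytic amplitude, z-scoring, sub-band averaging) can be sketched with SciPy. The sampling rate, filter order, toy signal, and the particular 20 Hz sub-bands below are illustrative assumptions; only the general recipe (analytic amplitude per narrow band, z-score, average into a named band such as Hγ) follows the caption.

```python
# Sketch of the amplitude-feature extraction: band-pass a raw trace,
# take the Hilbert analytic amplitude, z-score it, and average several
# narrow sub-bands into one named band (here, a Hγ-like 70-150 Hz band).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 400.0  # sampling rate in Hz (placeholder)
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)

# Toy "voltage": a broadband high-frequency burst centered at t = 1 s.
burst = np.exp(-((t - 1.0) ** 2) / 0.05)
voltage = sum(np.sin(2 * np.pi * f * t) for f in (80, 100, 120, 140))
voltage = voltage * burst + 0.1 * rng.normal(size=t.size)

def band_amplitude(x, low, high):
    """z-scored Hilbert analytic amplitude in one frequency range."""
    b, a = butter(3, [low / (fs / 2), high / (fs / 2)], btype="band")
    amp = np.abs(hilbert(filtfilt(b, a, x)))
    return (amp - amp.mean()) / amp.std()

# Average narrow sub-bands into the band's amplitude (caption panel D).
sub_bands = [(70, 90), (90, 110), (110, 130), (130, 150)]
hg = np.mean([band_amplitude(voltage, lo, hi) for lo, hi in sub_bands],
             axis=0)
```

The resulting `hg` trace peaks around the burst at t = 1 s and sits near its baseline elsewhere, mirroring the single-electrode amplitude traces in panel C.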
Fig 3. Classification accuracy of logistic regression versus deep networks for different classification tasks.
For A-E, accuracies (± s.e.m., n = 10) are normalized to chance (chance = 1, dashed blue line) independently for each subject and task. Points on the left are multinomial logistic regression accuracies and are connected to the corresponding deep network accuracies on the right for each subject. Subject accuracies have been left-right jittered to prevent visual overlap and are color-coded by subject (legend in E). A-D Classification accuracy when CV predictions are restricted to consonant constriction location (A), consonant constriction degree (B), vowel (C), or consonant (D) classification tasks. E Classification of entire consonant-vowel syllables from Hγ amplitude features. *p < 0.05, WSRT, Bonferroni corrected with n = 4. n.s., not significant. Significance was tested between deep network and logistic regression accuracies.
Fig 4. Deep network predictions reveal a latent articulatory hierarchy from single-trial ECoG recordings.
A The dendrogram from a hierarchical clustering of deep network predictions on the test set averaged across all subjects. The threshold for the colored clusters (dashed gray) is determined from inspection of the number of clusters as a function of distance cutoff shown in B. Cluster centroids are labeled with articulatory features shared by leaf CVs. DT: dorsal tongue, CT: coronal tongue, BL: bilabial, LD: labiodental, S: sibilant, A: alveolar. B Number of clusters (vertical axis) as a function of the minimum cutoff distance between cluster centroids (horizontal axis). C Average predicted probability per CV for Subject 1. CVs are ordered from clustering analysis in A. D Accuracy of individual CVs. E Correlation between pairwise distances in deep network similarity space from C compared to distances in an articulatory/phonetic feature space for Major Articulator, Consonant Constriction Location, Consonant Constriction Degree, and Vowel, aggregated across all subjects. Center bar is the median and boundaries are 50% confidence intervals. Colored circles indicate subject medians. **p < 1 × 10−10, WSRT, *p < 1 × 10−4 t-test, both Bonferroni corrected with n = 4.
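The clustering in panel A can be sketched as follows: treat each CV's row of averaged predicted probabilities as that CV's similarity vector, then hierarchically cluster the rows. The toy probability matrix below, with two built-in "articulator" groups, is a stand-in for the real test-set predictions; the distance metric and linkage method are illustrative choices, not necessarily the paper's.

```python
# Sketch of hierarchical clustering of network prediction probabilities.
# Rows of `probs` play the role of per-CV averaged predicted probabilities.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_cv = 8
# Two groups of CVs that the toy "network" tends to confuse internally,
# standing in for shared-articulator confusions.
block = np.kron(np.eye(2), np.ones((4, 4)))
probs = 0.1 * rng.random((n_cv, n_cv)) + 0.5 * block
probs /= probs.sum(axis=1, keepdims=True)  # rows are probability vectors

Z = linkage(pdist(probs), method="average")   # dendrogram (panel A)
clusters = fcluster(Z, t=2, criterion="maxclust")
```

With this block structure, cutting the dendrogram at two clusters recovers the two built-in groups, the same logic by which the paper's dendrogram exposes articulatory structure in the confusions.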
Fig 5. Hγ and β bands show diverse correlation structures across electrodes and CVs.
A-B Average amplitude as a function of frequency and time for an electrode with large activity during /ga/ production and for an electrode with no activity during /ga/ production. C and D Normalized (-1 to 1) Hγ (red) and β (black) activity from A and B, respectively. Non-trivial temporal relationships can be seen in C that are not apparent in D. E The average correlation (± s.e.m.) between the Hγ amplitude and the single frequency amplitude is plotted as a function of frequency for each subject. Thickened region of the horizontal axis indicates the β frequency range. F Histogram of the Hγ-β correlation coefficients for all CVs and electrodes for Subject 1. G Histogram of the z-scored Hγ power near the CV acoustic transition (time = 0) for all CVs and electrodes for Subject 1.
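The per-electrode, per-CV correlations behind panels E–F reduce to a Pearson correlation between two amplitude envelopes. The sketch below builds synthetic "active" and "inactive" electrode traces (the burst shape, noise levels, and coupling strength are invented for illustration) and computes the Hγ-β correlation coefficient for each.

```python
# Sketch of the Hγ-β amplitude correlation: Pearson correlation between
# two band-amplitude time series, per electrode/CV. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_times = 200
t = np.linspace(-0.5, 0.5, n_times)
burst = np.exp(-(t ** 2) / 0.02)  # activity around the CV transition

# "Active" electrode: β envelope co-varies with the Hγ envelope.
hg_active = burst + 0.1 * rng.normal(size=n_times)
beta_active = 0.8 * burst + 0.1 * rng.normal(size=n_times)
# "Inactive" electrode: both envelopes are independent noise.
hg_inactive = 0.1 * rng.normal(size=n_times)
beta_inactive = 0.1 * rng.normal(size=n_times)

def corr(x, y):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(x, y)[0, 1])

r_active = corr(hg_active, beta_active)
r_inactive = corr(hg_inactive, beta_inactive)
```

Collecting such coefficients over all electrodes and CVs yields the histogram in panel F; the active/inactive contrast previews the split made explicit in Fig 6.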
Fig 6. Hγ and β bands show positive correlations at active electrodes which are not found in inactive electrodes for subjects with high classification accuracy.
A The trial-averaged Hγ-β correlation coefficient across electrodes and CVs is plotted against the average Hγ power near the CV acoustic transition for Subjects 1 and 4. Solid lines indicate the linear regression fit to the data with positive z-scored amplitude. The vertical dashed gray line indicates the division in average Hγ power between ‘active’ and ‘inactive’ electrodes for Subjects 1 and 4. Data are summarized in nine bins (± s.e.m.) per subject. B Same as A, but for Subjects 2 and 3, which have a much lower classification accuracy. C For the two subjects in A, the average (± s.e.m.) correlation is plotted between the Hγ amplitude and the single frequency amplitude as a function of frequency separately for active (white center line) and inactive (solid color) electrodes. Thickened region of the horizontal axis indicates the β frequency range. D Same as C for subjects in B.
Fig 7. Lower frequency bands do not contribute significant additional information to the CV classification task beyond Hγ.
A The average accuracy (± s.e.m., n = 10) normalized to chance (chance = 1, dashed blue line) is shown for each frequency band and subject. Subjects are left-right jittered to avoid visual overlap. The solid blue line is the mean across subjects for a single band. B Average change in accuracy (± s.e.m., n = 10) from Hγ accuracy, normalized to chance, when each band's features are concatenated with the Hγ features. The solid blue line is the mean across subjects for a single band. The Hγ accuracy cross-validation standard deviation (n = 10) normalized to chance is plotted above and below zero in the right-most column for each subject for comparison. C Average accuracy (± s.e.m., n = 10) normalized to chance (dashed blue line, chance = 1) plotted against the correlation coefficient between Hγ and the lower frequency band for active electrodes for each band and subject. D Change in accuracy from Hγ accuracy normalized to chance plotted against the correlation coefficient between Hγ and the lower frequency band for active electrodes for each band and subject. The blue dashed line indicates no change in accuracy. **p < 0.001, *p < 0.01, WSRT, n.s., not significant. All Bonferroni corrected with n = 5.
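The comparison in panels A–B amounts to: classify from one band alone, then from that band's features concatenated with Hγ, reporting chance-normalized accuracy. The sketch below uses synthetic data and a logistic regression stand-in for the deep networks; here β is constructed to be pure noise, an assumption that mirrors the paper's finding that lower bands add little.

```python
# Sketch of the band-comparison analysis: chance-normalized accuracy for
# one band alone and for that band concatenated with Hγ. Synthetic data;
# logistic regression stands in for the deep networks.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_feat, n_classes = 150, 10, 3
labels = rng.integers(0, n_classes, n_trials)

# Hγ features carry the task signal; the β features are (here) noise.
hg = rng.normal(size=(n_trials, n_feat)) + labels[:, None]
beta = rng.normal(size=(n_trials, n_feat))

def norm_acc(X):
    """Cross-validated accuracy divided by chance (1/n_classes)."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5).mean() * n_classes

acc_beta = norm_acc(beta)                                 # panel A, β alone
acc_hg = norm_acc(hg)                                     # panel A, Hγ alone
acc_both = norm_acc(np.concatenate([hg, beta], axis=1))   # panel B input
```

With this construction, `acc_beta` sits near 1 (chance), `acc_hg` well above it, and `acc_both - acc_hg` is small, the same qualitative pattern the figure reports for real lower-frequency bands.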
