Brain Behav. 2017 Apr 26;7(6):e00665. doi: 10.1002/brb3.665. eCollection 2017 Jun.

Vowel decoding from single-trial speech-evoked electrophysiological responses: A feature-based machine learning approach


Han G Yi et al. Brain Behav. 2017.

Abstract

Introduction: Scalp-recorded electrophysiological responses to complex, periodic auditory signals reflect phase-locked activity from neural ensembles within the auditory system. These responses, referred to as frequency-following responses (FFRs), have been widely used to index typical and atypical representation of speech signals in the auditory system. One major limitation of the FFR is its low signal-to-noise ratio at the level of single trials; for this reason, analysis typically relies on averaging across thousands of trials. The ability to examine the quality of single-trial FFRs would allow investigation of trial-by-trial dynamics of the FFR, which the averaging approach precludes.

Methods: In a novel, data-driven approach, we used machine learning principles to decode information related to the speech signal from single-trial FFRs. FFRs were collected from participants while they listened to two vowels produced by two speakers. The scalp-recorded responses were projected onto a low-dimensional spectral feature space independently derived from the same two vowels produced by 40 other speakers, whose recordings were not presented to the participants. A supervised machine learning classifier was trained to discriminate vowel tokens on a subset of FFRs from each participant and tested on the remaining subset.
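As a rough illustration of how such a feature space could be derived, the sketch below fits a 12-component PCA to a placeholder corpus of vowel spectra. The array shapes, the random placeholder data, and the use of scikit-learn are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder corpus: spectra of [ae] and [u] from 40 speakers
# (80 tokens x 1001 spectral bins, 0-4 kHz in 4-Hz steps; shapes assumed).
rng = np.random.default_rng(0)
corpus_spectra = rng.standard_normal((80, 1001))

# Derive the low-dimensional spectral feature space from the corpus alone,
# independently of any EEG data.
pca = PCA(n_components=12)
pca.fit(corpus_spectra)
pc_matrix = pca.components_.T  # (1001, 12): one column per principal component
```

Because the components are fit only to the acoustic corpus, projecting FFRs onto them keeps the feature space independent of the neural data being decoded.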

Results: We demonstrate reliable decoding of speech signals at the level of single trials by decomposing the raw FFR into independently derived, information-bearing spectral features of the speech signal.

Conclusions: Taken together, the ability to extract interpretable features at the level of single trials in a data-driven manner offers uncharted possibilities in the noninvasive assessment of human auditory function.

Keywords: EEG; frequency‐following responses; speech decoding; vowels.


Figures

Figure 1
(a) Spectra for [æ] and [u] vowels produced by two male native speakers of English. The x‐axis codes frequency ranging from 0 to 4 kHz, in 4‐Hz steps. The y‐axis codes relative amplitude at each spectral bin, scaled by the standard deviation of each of the four sound files. (b) Spectra of the frequency‐following responses collected from 25 participants, averaged across 1,000 trials. The x‐ and y‐axes are identical to those used in (a). (c) Overlaying the two sets of spectra reveals spectral similarity between the stimuli and the responses within each speech token.
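Spectra with 4‐Hz bins imply roughly a 250‐ms analysis window (frequency resolution = sampling rate / number of samples). A minimal sketch of such a spectrum computation follows; the sampling rate and waveform are placeholder assumptions, and the SD scaling mirrors the caption.

```python
import numpy as np

fs = 16000                               # assumed sampling rate (not given here)
t = np.arange(int(0.25 * fs)) / fs       # a 250-ms window yields 4-Hz bins
waveform = np.sin(2 * np.pi * 110 * t)   # placeholder for a vowel or FFR waveform

freqs = np.fft.rfftfreq(waveform.size, d=1 / fs)
amplitude = np.abs(np.fft.rfft(waveform))

band = freqs <= 4000                     # keep 0-4 kHz, as in the figure
spectrum = amplitude[band] / amplitude[band].std()  # scale by SD, per the caption
```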
Figure 2
Spectral projection of the single‐trial frequency‐following responses (FFRs) onto the spectral feature space. Figures are derived from a representative participant. The raw FFR spectra (left) were multiplied by a matrix of 12 vectors (center) corresponding to the top principal components independently derived from spectra of [æ] and [u] vowels produced by 40 male native speakers. This procedure projected the raw FFRs onto the 12‐dimensional spectral feature space (right).
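The projection itself reduces to a single matrix multiplication. The sketch below uses hypothetical array shapes and random stand-in data; in practice the PC matrix would be the corpus-derived components from the Methods step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_bins = 4000, 1001
ffr_spectra = rng.standard_normal((n_trials, n_bins))  # single-trial FFR spectra
pc_matrix = rng.standard_normal((n_bins, 12))          # stand-in for the corpus PCs

# One matrix multiplication maps every raw spectrum onto the
# 12-dimensional spectral feature space.
features = ffr_spectra @ pc_matrix                     # (n_trials, 12)
```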
Figure 3
(a) Training‐test scheme for vowel (N = 2) decoding. For each participant, a classifier was trained to identify the [æ] and [u] labels from each trial, based on the 12 spectral features. The trained classifier was then tested on an independent subset. The resulting prediction vector contained the probability of each vowel. In this example from a representative participant, the classifier outputs reasonably accurate responses for [æ]1 and [u]2, but not for [æ]2 and [u]1. (b) Training‐test scheme for stimulus (N = 4) decoding. For each participant, a classifier was trained to identify the [æ]1, [æ]2, [u]1, and [u]2 labels from each trial, based on the 12 spectral features. The trained classifier was then tested on an independent subset. The resulting prediction vector contained the probability of each of the four stimuli. In this example from a representative participant, the classifier outputs reasonably accurate responses for [æ]1 and [æ]2, but not for [u]1 and [u]2. (c) Based on the aforementioned probability vectors, a receiver operating characteristic (ROC) curve was generated. The area under the curve (AUC) served as the metric of decoding performance. Note that for stimulus decoding, the ROC curve was constructed separately for each stimulus under a one‐versus‐all scheme.
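The abstract does not name the classifier, so the sketch below stands in a random forest (consistent with the decision trees mentioned in Figure 5) to illustrate the train/test split and the one‐versus‐all ROC AUC computation; all data and counts are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.standard_normal((4000, 12))  # 12 spectral features per trial
y = rng.integers(0, 4, size=4000)    # stimulus labels: [ae]1, [ae]2, [u]1, [u]2

# Train on one subset, test on an independent subset.
X_train, y_train = X[:3800], y[:3800]
X_test, y_test = X[3800:], y[3800:]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)    # per-trial probability of each stimulus

# One-versus-all ROC AUC per stimulus, then averaged (chance = 0.50).
aucs = [roc_auc_score(y_test == k, proba[:, k]) for k in range(4)]
print(np.mean(aucs))
```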
Figure 4
(a) Area under the curve (AUC) measures are displayed for vowel (mean = 0.67; SD = 0.15; median = 0.67) and stimulus (mean = 0.73; SD = 0.09; median = 0.71) decoding. In this box plot, the dark centerlines correspond to the median, and the top and bottom edges of the boxes correspond to the 75th and 25th percentiles across the 25 participants, respectively. Note that the stimulus decoding AUC is averaged across the individual one‐versus‐all AUCs calculated for each of the four stimuli, so the chance level corresponds to 0.50 rather than 0.25. (b) Vowel and stimulus decoding AUC across different training set sizes. The x‐axis corresponds to the number of trials per stimulus (from 50 to 950, in steps of 50) included in the training set. Note that the test set always consisted of the 50 trials per stimulus that immediately followed the training set.
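A minimal sketch of the growing‐training‐set analysis in panel (b), assuming trials are stored in presentation order with 1,000 trials per stimulus; the random forest is again a stand‐in classifier and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 1000, 12))  # stimulus x trial (presentation order) x feature

auc_by_size = {}
for n_train in range(50, 1000, 50):     # 50 to 950 trials per stimulus
    X_train = X[:, :n_train].reshape(-1, 12)
    y_train = np.repeat(np.arange(4), n_train)
    # Test set: the 50 trials per stimulus immediately following the training set.
    X_test = X[:, n_train:n_train + 50].reshape(-1, 12)
    y_test = np.repeat(np.arange(4), 50)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
    proba = clf.predict_proba(X_test)
    auc_by_size[n_train] = np.mean(
        [roc_auc_score(y_test == k, proba[:, k]) for k in range(4)]
    )
```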
Figure 5
(a) Importance of spectral features during vowel and stimulus decoding (950‐trial training set). The x‐axis corresponds to the 12 principal components (PCs) used as input features for the classifier. The y‐axis corresponds to the percentage of times each feature was used by a given decision tree. (b) The top four PCs in the frequency domain. In PC1, which was disproportionately used by the classifiers, three extrema are readily identifiable (arrows). (c) Log‐transformed spectra of the original stimuli (left; black lines) and the grand‐average frequency‐following response (right; red lines). Three formant frequencies are identifiable (arrows), corresponding to the three extrema of PC1 marked with arrows in (b).
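One way to approximate the "percentage of times a feature was used" measure is to count split nodes across the trees of a fitted ensemble. The sketch below does this for a stand‐in random forest on synthetic data; it is an assumed reading of the caption, not the authors' exact computation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.standard_normal((2000, 12))
y = rng.integers(0, 4, size=2000)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Count how often each of the 12 PCs appears at a split node, across all trees.
usage = np.zeros(12)
total_splits = 0
for tree in forest.estimators_:
    split_features = tree.tree_.feature
    split_features = split_features[split_features >= 0]  # leaves are coded as -2
    usage += np.bincount(split_features, minlength=12)
    total_splits += split_features.size

usage_pct = 100 * usage / total_splits  # percent of splits using each PC
```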
