Cereb Cortex. 2023 Jan 5;33(3):691-708.
doi: 10.1093/cercor/bhac094.

Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech

Andrew W Corcoran et al. Cereb Cortex. 2023.

Abstract

Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.

Keywords: EEG; pop-out; predictive processing; speech comprehension; stimulus reconstruction.

Figures

Fig. 1
Experimental design and behavioral results. (A) Cochlear representations (see Methods for details) of 3.5 s of clear speech (left), SWS (middle) and NVS (right). (B) In each trial, participants listened to 2 repetitions of the same noisy speech. The 2 presentations of the stimuli were interleaved with either (i) the corresponding written sentence (correct prior; condition P+), (ii) a different sentence (incorrect prior; condition P–), or (iii) hash symbols (no prior; condition P0). Following each presentation of the stimulus, participants were asked to indicate the subjective clarity of the stimulus they heard. EEG was recorded throughout the task. (C) Clarity ratings for the SWS (left, circles) and NVS (right, diamonds) stimuli. Participants were asked to rate the stimuli after the 1st (unfilled circles and diamonds) and 2nd (filled circles and diamonds) presentations. Clarity ratings are averaged for each stimulus type and prior condition (P+: green; P–: orange; P0: purple). Individual data-points are shown with small circles (SWS) and diamonds (NVS). The 2 average ratings of each participant and each category are connected with a continuous line if it increases from the 1st to 2nd presentation and a dashed line if it decreases. Large circles and diamonds show the average across the sample (n = 19 participants) and error bars show the standard error of the mean (SEM) across participants. Stars indicate the significance levels of posthoc contrasts across condition levels (marginalized over stimulus type; ***: P < 0.001, **: P < 0.01, *: P < 0.05).
Fig. 2
Correct priors improve stimulus reconstruction. (A) The envelope of noisy speech was reconstructed from EEG recordings (n = 19 participants, see Methods) and a stimulus reconstruction score was computed for the first 3.5 s (first iteration of the sentence) of each stimulus presentation (1st: unfilled markers; 2nd: filled markers) and for the SWS (left, circles) and NVS (right, diamonds) stimuli separately. Reconstruction scores are averaged for each stimulus type and prior condition (P+: green; P–: orange; P0: purple). Individual data-points are shown with small circles (SWS) and diamonds (NVS). The 2 average ratings of each participant and each category are connected with a continuous line if it increases from the 1st to 2nd presentation and a dashed line if it decreases. Large circles and diamonds show the average across the sample (n = 19 participants) and error bars show the SEM across participants. Stars indicate the significance levels of posthoc contrasts across condition levels (***: P < 0.001, **: P < 0.01, *: P < 0.05). (B) Correlation between clarity ratings and reconstruction scores on the 2nd presentation for SWS (left, circles) and NVS (right, diamonds). Individual data-points are shown with small circles (SWS) and diamonds (NVS). Large circles and diamonds show the average across the sample (n = 19 participants) and error bars show the SEM across participants. The Pearson’s correlation coefficient computed across conditions for the SWS and NVS is shown on each graph along with the associated P-value.
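As a rough illustration of the stimulus-reconstruction idea in the Fig. 2 caption, the sketch below fits a linear backward model that maps time-lagged EEG onto the stimulus envelope and scores the result with a Pearson correlation on held-out data. The ridge regression, the lag range, the regularization value, the synthetic data, and all function names are assumptions made for illustration; the caption refers to the Methods for the actual procedure, which is not reproduced here.

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-shifted copies of the EEG (samples x channels) column-wise.

    For each non-negative lag, row t holds the EEG recorded at time t + lag,
    so later neural activity can be used to decode the earlier stimulus.
    """
    n_samples, n_channels = eeg.shape
    lags = list(lags)
    X = np.zeros((n_samples, n_channels * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.zeros_like(eeg)
        shifted[:n_samples - lag] = eeg[lag:]  # zero-pad the tail
        X[:, i * n_channels:(i + 1) * n_channels] = shifted
    return X

def fit_decoder(eeg, envelope, lags, ridge=1.0):
    """Ridge-regression decoder weights (hypothetical regularization value)."""
    X = lag_matrix(eeg, lags)
    XtX = X.T @ X
    return np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), X.T @ envelope)

def reconstruction_score(eeg, envelope, weights, lags):
    """Pearson correlation between reconstructed and actual envelope."""
    predicted = lag_matrix(eeg, lags) @ weights
    return np.corrcoef(predicted, envelope)[0, 1]

# Synthetic demonstration: four "channels" carry a delayed, noisy envelope.
rng = np.random.default_rng(0)
n_samples, delay = 2000, 5
envelope = rng.standard_normal(n_samples)
gains = np.array([1.0, 0.8, -0.6, 0.5])  # made-up channel gains
eeg = np.roll(envelope, delay)[:, None] * gains
eeg = eeg + 0.5 * rng.standard_normal((n_samples, 4))

lags = range(10)   # assumed lag window, in samples
half = n_samples // 2
weights = fit_decoder(eeg[:half], envelope[:half], lags)
score = reconstruction_score(eeg[half:], envelope[half:], weights, lags)
```

Here the decoder is trained on the first half of the synthetic recording and scored on the second half, so `score` reflects generalization rather than fit to the training data.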
Fig. 3
Time–frequency analysis and mixed-effects modeling. (A) Time-frequency representation depicting grand-average power over the course of the 2nd sentence presentation following the provision of correct (P+), incorrect (P–), or no (P0) written sentence information (averaged across stimulus types). Each presentation comprised 3 iterations of the same noisy stimulus (~3.5 s each). Spectral power estimates from each frequency bin were baseline-corrected using the mean power estimate from the corresponding frequency bin averaged over all time bins spanning the 1st presentation period. (B) Topographic distribution of electrodes’ involvement in the clusters identified via cluster-based permutation analysis of the first sentence iteration. Scale indicates probability of electrode inclusion (i.e. the proportion of times an electrode was included within the cluster) within the 10–15 Hz range used to define the alpha-band. These plots indicate that significant clusters were predominantly composed of electrodes over posterior scalp regions for P+ vs. P0 contrasts, and more broadly distributed for the SWS P– vs. P0 contrast. (C) Visualization of linear mixed-effects model predictions for delta (1–3 Hz), theta (4–9 Hz), and alpha (10–15 Hz) power during the first sentence iteration for each prior condition (P+: green; P–: orange; P0: purple; marginalized over stimulus type). Individual data-points are shown with small circles. Large circles show the estimated marginal means for the prior condition across the sample (n = 19 participants); error bars show the SEM across participants. Stars indicate the significance levels of posthoc contrasts across condition levels (***: P < 0.001, **: P < 0.01, *: P < 0.05). Note, estimates have been mean-centered for the purposes of visualization.
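The baseline correction described in the Fig. 3 caption (each frequency bin referenced to its own mean power over the 1st presentation period) can be sketched as follows. The caption does not say whether the correction was subtractive or divisive, so the subtraction used here is an assumption, as are the function name, the array layout, and the toy numbers.

```python
import numpy as np

def baseline_correct(power, baseline_power):
    """Subtract each frequency bin's mean baseline power.

    power:          (n_freqs, n_times) array for the period of interest
    baseline_power: (n_freqs, n_baseline_times) array for the baseline period
    Subtraction is an assumed choice; a ratio or decibel change would
    replace the last line.
    """
    baseline_mean = baseline_power.mean(axis=1, keepdims=True)
    return power - baseline_mean

# Toy example with two frequency bins and two time bins per period.
baseline_power = np.array([[1.0, 3.0], [10.0, 30.0]])  # bin means: 2 and 20
power = np.array([[2.0, 4.0], [20.0, 40.0]])
corrected = baseline_correct(power, baseline_power)
```

Because the mean is taken per row with `keepdims=True`, each frequency bin is corrected only by its own baseline, which matches the "corresponding frequency bin" wording of the caption.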
Fig. 4
Expectations modulate alpha power and pupil size. Temporal dynamics of induced alpha power (A) and pupil size (B) over the course of the 2nd presentation period for each stimulus type (top: SWS; bottom: NVS) and prior condition (P+: green; P–: orange; P0: purple). Alpha power is averaged over parieto-occipital electrodes (see black dots on the inset) and expressed as log10 units. Pupil size is averaged across the two eyes and expressed in arbitrary units. Error shades show the SEM across participants (n = 19 participants for alpha power and n = 17 for pupil, see Methods). Horizontal bars show the clusters of times showing significant differences (cluster-permutation, P < 0.05, see Methods) between the P+ and P0 conditions (green), and P– and P0 conditions (orange).

