Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions

Lei Wang¹ ², Ed X Wu², Fei Chen¹

Affiliations

¹ Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China.
² Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong.

PMID: 33132874
PMCID: PMC7576187
DOI: 10.3389/fnhum.2020.557534

Front Hum Neurosci. 2020 Oct 7:14:557534. doi: 10.3389/fnhum.2020.557534. eCollection 2020.

Abstract

The attended speech stream can be detected robustly, even in adverse auditory scenarios with auditory attentional modulation, and can be decoded using electroencephalographic (EEG) data. Speech segmentation based on the relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions. High-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects listened to the attended speech stream in mixed speech narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses tracking the temporal envelopes of intact speech and of high-RMS-level speech segments alone. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope with high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures under a wide range of SNRs.

Keywords: EEG; auditory attention decoding; signal-to-noise ratio; speech RMS-level segments; temporal response function (TRF).
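
The relative-RMS segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the frame length or the level boundary, so the 16 ms frame and the 0 dB threshold (frames at or above the overall RMS level of the utterance) are assumptions chosen for illustration, and the function names `high_rms_mask` and `keep_high_rms` are hypothetical rather than taken from the study.

```python
import numpy as np

def high_rms_mask(signal, fs, frame_ms=16.0, threshold_db=0.0):
    """Return a per-sample boolean mask marking high-RMS-level frames.

    frame_ms and threshold_db are illustrative assumptions, not values
    reported in the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    mask = np.zeros(len(signal), dtype=bool)
    frame_len = max(1, int(round(fs * frame_ms / 1000.0)))
    n_frames = len(signal) // frame_len
    overall_rms = np.sqrt(np.mean(signal ** 2)) if len(signal) else 0.0
    if overall_rms == 0.0:
        return mask  # silent or empty input: nothing counts as high level
    eps = np.finfo(float).tiny  # avoids log10(0) for silent frames
    for i in range(n_frames):
        start = i * frame_len
        stop = start + frame_len
        frame_rms = np.sqrt(np.mean(signal[start:stop] ** 2))
        # Frame level in dB relative to the overall RMS of the utterance.
        rel_db = 20.0 * np.log10((frame_rms + eps) / overall_rms)
        mask[start:stop] = rel_db >= threshold_db
    # Samples after the last full frame stay False.
    return mask

def keep_high_rms(signal, fs, **kwargs):
    """Zero out everything except the high-RMS-level frames."""
    signal = np.asarray(signal, dtype=float)
    return np.where(high_rms_mask(signal, fs, **kwargs), signal, 0.0)
```

Applying `keep_high_rms` to a sentence waveform would give a signal loosely comparable to panel (C) of Figure 1, where only the high-RMS-level segments remain. The temporal envelope would then be extracted from that signal in a separate step not shown here.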

Figures

**FIGURE 1**
**(A)** Example segments of relative root-mean-square (RMS) energy representations. The dashed line shows the boundaries of the high-RMS-level region. **(B)** The waveform of the original sentence. **(C)** The sentence containing only high-RMS-level segments.

**FIGURE 2**
**(A)** Topological distributions of TRF responses at P1_TRF, N1_TRF, and P2_TRF components with intact temporal envelopes (left) and high-RMS-level segments (right). The electrodes marked as black dots are used to conduct further analyses. **(B)** Grand-averaged estimated temporal response function (TRF) responses with intact temporal amplitude envelopes (left) and high-RMS-level-only envelopes (right) in five SNR conditions.

**FIGURE 3**
**(A)** Statistical results (mean ± standard deviation) for TRF values with intact and high-RMS-level-only envelopes at three typical deflections across subjects. **(B)** Statistical results (mean ± standard deviation) for TRF latencies with intact and high-RMS-level-only envelopes at three typical deflections across subjects in five SNR conditions. ***P < 0.001, **P < 0.01 (prediction difference). n.s., no significant difference (analysis of variance). SNR, signal-to-noise ratio.

**FIGURE 4**
**(A)** Speech envelope prediction correlations under various signal-to-noise ratio (SNR) conditions with the band-pass filter at 2–8, 8–15, and 15–30 Hz. Error bars display mean ± standard deviation. **(B)** Speech envelope prediction correlations under various signal-to-noise ratio (SNR) conditions for attended and ignored streams at the 2–8 Hz frequency band. *P < 0.05, ***P < 0.001 (prediction differences). n.s., no significant difference (analysis of variance).

**FIGURE 5**
**(A)** Average auditory attention-decoding accuracy across subjects with the intact temporal envelopes and high-RMS-level segments for each signal-to-noise ratio (SNR) with the duration of decoding window at 60, 30, 10, and 2 s. Error bars display mean ± standard deviation. ***P < 0.001. n.s., no significant difference (analysis of variance). **(B)** Scatter plots of the correlation of attended vs. ignored streams across all trials and subjects for each signal-to-noise ratio (SNR) with 60 s decoding window length. Points above the blue dashed lines indicate the correct identification of attended speech.
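
The comparison behind the accuracy and scatter plots in Figure 5 can be sketched as a per-window decision rule. This is a simplified illustration, not the study's pipeline: it assumes an EEG-derived envelope estimate is already available (the hypothetical `reconstructed` input, for example the output of a fitted TRF-based model), that windows do not overlap, and that a window counts as correct when the correlation with the attended envelope exceeds the correlation with the ignored one.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of two equal-length 1-D arrays (0.0 if undefined)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    return float(np.sum(xc * yc) / denom) if denom > 0 else 0.0

def decode_windows(reconstructed, env_attended, env_ignored, fs, window_s):
    """Return one boolean per non-overlapping window: True if decoded correctly.

    `reconstructed` is an assumed EEG-derived envelope estimate; all three
    inputs share the sampling rate fs (Hz).
    """
    reconstructed = np.asarray(reconstructed, dtype=float)
    env_attended = np.asarray(env_attended, dtype=float)
    env_ignored = np.asarray(env_ignored, dtype=float)
    win = int(round(fs * window_s))
    n = min(len(reconstructed), len(env_attended), len(env_ignored))
    decisions = []
    for start in range(0, n - win + 1, win):
        seg = slice(start, start + win)
        r_att = pearson_r(reconstructed[seg], env_attended[seg])
        r_ign = pearson_r(reconstructed[seg], env_ignored[seg])
        decisions.append(r_att > r_ign)
    return np.array(decisions, dtype=bool)

def decoding_accuracy(decisions):
    """Fraction of windows decoded correctly (NaN if there are no windows)."""
    return float(np.mean(decisions)) if len(decisions) else float("nan")
```

Calling `decode_windows` with `window_s` set to 60, 30, 10, or 2 would mirror the decoding window durations listed in panel (A). Shorter windows give each correlation fewer samples, so the per-window decision rests on a noisier estimate.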

**FIGURE 6**
Average percentages of correct responses to questions related to the content of attended speech at each signal-to-noise ratio. Error bars display mean ± standard deviation. **P < 0.01, ***P < 0.001 (prediction differences).

Cited by

Auditory Attention Detection via Cross-Modal Attention.
Cai S, Li P, Su E, Xie L. Front Neurosci. 2021 Jul 21;15:652058. doi: 10.3389/fnins.2021.652058. eCollection 2021. PMID: 34366770.
A Speech-Level-Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes.
Wang L, Wang Y, Liu Z, Wu EX, Chen F. Front Neurosci. 2022 Feb 10;15:760611. doi: 10.3389/fnins.2021.760611. eCollection 2021. PMID: 35221885.
A Brain-Computer Interface for Improving Auditory Attention in Multi-Talker Environments.
Haro S, Beauchene C, Quatieri TF, Smalt CJ. bioRxiv [Preprint]. 2025 Mar 13:2025.03.13.641661. doi: 10.1101/2025.03.13.641661. PMID: 40161643.
