Interference of Mid-level Speech and Noise Statistics Underlies Human Speech Recognition Sensitivity in Natural Environmental Noise
- PMID: 40628526
- PMCID: PMC12330335
- DOI: 10.1523/JNEUROSCI.1751-24.2025
Abstract
Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill where the task difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving and coding natural sounds, it is less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. Here we demonstrate that male and female human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectrotemporal statistics, we show that speech recognition accuracy at a fixed noise level varies extensively across natural backgrounds (0-100%). Furthermore, for each background the unique interference created by summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed generalized perceptual regression, a framework that links summary statistics from a neural model to word recognition accuracy. Whereas summary statistics from a peripheral cochlear model account for only 60% of perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for >90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they impact recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation involving interference of multiple summary statistics that impact recognition beneficially or detrimentally across natural background sounds.
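To make the idea of linking summary statistics to single-trial word recognition concrete, the following is a minimal sketch using ordinary logistic regression on synthetic data. It is not the authors' generalized perceptual regression; the abstract does not specify that framework's form. The three "statistics", the trial simulator, and all function names are hypothetical and chosen only for illustration. In this toy setup, a positive fitted weight marks a statistic that raises the probability of a correct response (unmasking), and a negative weight marks one that lowers it (masking).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_trials(true_weights, true_bias, n_trials, rng):
    """Generate synthetic trials (assumption: not real listener data).

    Each trial has one standardized value per hypothetical summary
    statistic and a binary outcome (1 = word recognized, 0 = not).
    """
    features, outcomes = [], []
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in true_weights]
        p = sigmoid(true_bias + sum(w * xi for w, xi in zip(true_weights, x)))
        features.append(x)
        outcomes.append(1 if rng.random() < p else 0)
    return features, outcomes


def fit_logistic(features, outcomes, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical ground truth: statistic 0 helps recognition, statistic 1
# hurts it, and statistic 2 has no influence.
rng = random.Random(0)
true_weights = [2.0, -2.0, 0.0]
features, outcomes = simulate_trials(true_weights, 0.5, 2000, rng)

weights, bias = fit_logistic(features, outcomes)
for name, w in zip(["stat_0", "stat_1", "stat_2"], weights):
    print(f"{name}: perceptual weight = {w:+.2f}")
```

The sign and relative size of each fitted weight play the role the abstract assigns to perceptual weights: they indicate which statistics are influential and in which direction, under the assumptions of this simplified model.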
Keywords: auditory midbrain; cocktail party problem; natural sounds; neural network; sound statistics; speech in noise; speech recognition.
Copyright © 2025 the authors.
Update of
- Interference of mid-level sound statistics underlie human speech recognition sensitivity in natural noise. bioRxiv [Preprint]. 2024 Oct 4:2024.02.13.579526. doi: 10.1101/2024.02.13.579526. Update in: J Neurosci. 2025 Aug 6;45(32):e1751242025. doi: 10.1523/JNEUROSCI.1751-24.2025. PMID: 38405870.