PLoS One. 2025 Jan 30;20(1):e0318600.

doi: 10.1371/journal.pone.0318600. eCollection 2025.

Hearing in categories and speech perception at the "cocktail party"

Gavin M Bidelman¹ ² ³, Fallon Bernard⁴, Kimberly Skubic⁴

Affiliations

¹ Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America.
² Program in Neuroscience, Indiana University, Bloomington, Indiana, United States of America.
³ Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America.
⁴ School of Communication Sciences & Disorders, University of Memphis, Memphis, Tennessee, United States of America.

PMID: 39883695
PMCID: PMC11781644
DOI: 10.1371/journal.pone.0318600

Gavin M Bidelman et al. PLoS One. 2025.

Abstract

We aimed to test whether hearing speech in phonetic categories (as opposed to a continuous/gradient fashion) affords benefits to "cocktail party" speech perception. We measured speech perception performance (recognition, localization, and source monitoring) in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, the latter promoting a release from masking. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show cocktail party speech recognition accuracy and speed decline with additional competing talkers and amidst forward compared to reverse maskers. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show the degree of release from masking experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling and not high-frequency audiometric thresholds; more discrete listeners make less effective use of time-reversal and show less release from masking than their gradient responding peers. Our results suggest a link between speech categorization skills and cocktail party processing, with a gradient (rather than discrete) listening strategy benefiting degraded speech perception. These findings suggest that less flexibility in binning sounds into categories may be one factor that contributes to figure-ground deficits.

Copyright: © 2025 Bidelman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Cocktail party task.**
**(a)** Participants were seated in the center of a 16-channel speaker array within an anechoic chamber. Speakers were positioned at ear level (~130 cm) during the task, with a radial distance of 160 cm to the center of the head and a speaker-to-speaker separation of ~20°. **(b)** Example stimulus presentation (2 and 4 masker talker conditions). Participants were asked to recall the color, number, and perceived location of target callsign sentences from the CRM corpus [68]. Target location was varied randomly from trial to trial and occurred simultaneously with between 0 and 4 concurrent talkers presented in either forward or time-reversed directions. **(c)** Example trial time course. After presentation of CRM sentences, listeners recalled the color-number combination of the target talker, its perceived location in the hemifield, and how many talkers they heard in the soundscape.

**Fig 2. Extended high frequency (EHF) hearing thresholds.**
Audiograms for left (LE) and right (RE) ears. Pure-tone average (PTA) thresholds in the normal and EHF (9–20 kHz; yellow highlight) frequency ranges were well within normal hearing limits. Error bars = ± 1 s.e.m.

**Fig 3. Cocktail party listening performance.**
(a) Speech recognition declines with increasing masker counts but is much poorer under informational/linguistic vs. purely energetic masking (cf. forward vs. reverse masker directions). Dotted line = chance performance. (b) Owing to their added linguistic interference, forward maskers yield slower recognition speeds than reverse maskers. (c) Listeners localized targets within 2 speakers (40–60° error), with better localization during purely energetic masking. (d) Source monitoring. Listeners saturate in source monitoring and report hearing only up to ~3 additional talkers despite up to 5 in the soundscape. Error bars = ± 1 s.e.m.; ***p<0.0001.

**Fig 4. Stimulus- and task-dependent changes in the strength of perceptual categorization.**
Speech categorization and RT speeds under (**a-b**) 2AFC and (**c-d**) VAS labeling tasks. Note the sharper, more discrete categorization for CVs compared to vowels in the 2AFC (but not VAS) condition. RTs show the typical slowing near the perceptually ambiguous midpoint of the vowel (but not CV) continuum for both tasks. VAS responses were 750 ms slower than 2AFC across the board. RTs are plotted normalized to the global mean to highlight token- and stimulus-related changes. Identification slopes reflect sqrt[abs(X − mean(X))] transformed values. Error bars = ± 1 s.e.m.; *p<0.05.
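
The slope transformation named in the Fig 4 caption, sqrt[abs(X − mean(X))], can be illustrated with a minimal sketch. The function name `sqrt_abs_dev` and the input values are hypothetical and are not taken from the study; the sketch only shows the arithmetic of the transform as the caption states it.

```python
from math import sqrt
from statistics import mean


def sqrt_abs_dev(values):
    """Return sqrt(|x - mean(values)|) for each x in values."""
    m = mean(values)
    return [sqrt(abs(x - m)) for x in values]


# Hypothetical identification slopes (illustration only, not study data).
example_slopes = [1.0, 2.0, 3.0, 6.0]
transformed = sqrt_abs_dev(example_slopes)
print(transformed)
```

Each transformed value is the square root of that slope's absolute distance from the sample mean, so a value at the mean maps to 0 and the sign of the deviation is discarded.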

**Fig 5. VAS ratings reveal stark individual differences in categorization and “continuous” vs. “categorical” listeners.**
Individual histograms show the distribution of each listener’s phonetic labeling for CV and vowel sounds. Discrete (categorical) listeners produce more binary categorization where responses lump near endpoint tokens (e.g., S2). In contrast, continuous (gradient) listeners tend to hear the continuum in a gradient fashion (e.g., S16). Inset values show Hartigan’s Dip statistic [99] score, quantifying the bimodality (and thus categoricity) of each distribution. Higher dip values = discrete categorization; low values = continuous categorization. (inset) Dip values are similar between CV and vowels, suggesting the dip statistic is a reliable measure of listener strategy that is independent of speech material. Error bars = ± 1 s.e.m.

**Fig 6. Gradient listeners are less susceptible to speech interference at the “cocktail party”.**
**(a)** Speech recognition performance in the cocktail party task for discrete and continuous listeners. Listener strategy was determined via Hartigan’s dip statistic [99] applied to VAS labeling (i.e., Fig 5) to identify individuals with bimodal (categorical) vs. unimodal (continuous) response distributions. Release from masking was measured as the difference in recognition performance between forward and reverse masker conditions at each masker count. **(b)** Discrete/categorical listeners show less masking release during cocktail party listening than their continuous listener peers. Error bars = ± 1 s.e.m.; shading = 95% CI; *p<0.05.
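
The release-from-masking measure described in the Fig 6 caption (the forward vs. reverse difference at each masker count) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `masking_release` is hypothetical, the accuracy values are invented, and the sign convention (reverse minus forward, so that a positive value means a benefit from time-reversed maskers) is an assumption, since the caption does not state the order of subtraction.

```python
def masking_release(forward, reverse):
    """Per-masker-count difference in recognition accuracy.

    forward, reverse: dicts mapping masker count -> percent correct.
    Assumed convention: reverse minus forward (positive = release).
    """
    shared_counts = sorted(set(forward) & set(reverse))
    return {n: reverse[n] - forward[n] for n in shared_counts}


# Invented accuracies for illustration only (not study data).
forward_acc = {1: 80.0, 2: 60.0, 3: 45.0, 4: 35.0}
reverse_acc = {1: 90.0, 2: 85.0, 3: 75.0, 4: 70.0}

release = masking_release(forward_acc, reverse_acc)
print(release)
```

Computing the difference separately at each masker count keeps the measure comparable across listeners whose overall accuracy differs.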

References

1. Goldstone RL, Hendrickson AT. Categorical perception. Wiley Interdiscip Rev Cogn Sci. 2010;1(1):69–78. Epub 2009/12/23. doi: 10.1002/wcs.26
2. Beale JM, Keil FC. Categorical effects in the perception of faces. Cognition. 1995;57(3):217–39. Epub 1995/12/01. doi: 10.1016/0010-0277(95)00669-x
3. Franklin A, Drivonikou GV, Clifford A, Kay P, Regier T, Davies IR. Lateralization of categorical perception of color changes with color term acquisition. Proc Natl Acad Sci U S A. 2008;105(47):18221–5. Epub 2008/11/19. doi: 10.1073/pnas.0809952105
4. Klein ME, Zatorre RJ. A role for the right superior temporal sulcus in categorical perception of musical chords. Neuropsychologia. 2011;49(5):878–87. Epub 2011/01/18. doi: 10.1016/j.neuropsychologia.2011.01.008
5. Siegel JA, Siegel W. Absolute identification of notes and intervals by musicians. Percept Psychophys. 1977;21(2):143–52.

Grants and funding

R01 DC016267/DC/NIDCD NIH HHS/United States
