PLoS One. 2025 Jan 30;20(1):e0318600.
doi: 10.1371/journal.pone.0318600. eCollection 2025.

Hearing in categories and speech perception at the "cocktail party"

Gavin M Bidelman et al. PLoS One. 2025.

Abstract

We aimed to test whether hearing speech in phonetic categories (as opposed to a continuous/gradient fashion) affords benefits to "cocktail party" speech perception. We measured speech perception performance (recognition, localization, and source monitoring) in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, the latter promoting a release from masking. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show cocktail party speech recognition accuracy and speed decline with additional competing talkers and amidst forward compared to reverse maskers. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show the degree of release from masking experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling and not high-frequency audiometric thresholds; more discrete listeners make less effective use of time-reversal and show less release from masking than their gradient responding peers. Our results suggest a link between speech categorization skills and cocktail party processing, with a gradient (rather than discrete) listening strategy benefiting degraded speech perception. These findings suggest that less flexibility in binning sounds into categories may be one factor that contributes to figure-ground deficits.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Cocktail party task.
(a) Participants were seated in the center of a 16-ch speaker array within an anechoic chamber. Speakers were positioned at ear level (~130 cm) during the task, with a radial distance of 160 cm to the center of the head and a speaker-to-speaker distance of ~20°. (b) Example stimulus presentation (2 and 4 masker talker conditions). Participants were asked to recall the color, number, and perceived location of target callsign sentences from the CRM corpus [68]. Target location was varied randomly from trial to trial and occurred simultaneously with between 0 and 4 concurrent talkers presented in either forward or time-reversed directions. (c) Example trial time course. After presentation of CRM sentences, listeners recalled the color-number combination of the target talker, its perceived location in the hemifield, and how many talkers they heard in the soundscape.
Fig 2. Extended high frequency (EHF) hearing thresholds.
Audiograms for left (LE) and right (RE) ears. Pure-tone average (PTA) thresholds in the normal and EHF (9–20 kHz; yellow highlight) frequency ranges were well within normal hearing limits. Error bars = ± 1 s.e.m.
Fig 3. Cocktail party listening performance.
(a) Speech recognition declines with increasing masker counts but is much poorer under informational/linguistic vs. purely energetic masking (cf. forward vs. reverse masker directions). Dotted line = chance performance. (b) Owing to their added linguistic interference, forward maskers yield slower recognition speeds than reverse maskers. (c) Listeners localized targets within 2 speakers (40-60° error), with better localization during purely energetic masking. (d) Source monitoring. Listeners saturate in source monitoring and only report hearing up to ~3 additional talkers despite up to 5 in the soundscape. Error bars = ± 1 s.e.m., ***p<0.0001.
Fig 4. Stimulus- and task-dependent changes in the strength of perceptual categorization.
Speech categorization and RT speeds under (a-b) 2AFC and (c-d) VAS labeling tasks. Note the sharper, more discrete categorization for CVs compared to vowels in the 2AFC (but not VAS) condition. RTs show the typical slowing near the perceptually ambiguous midpoint of the vowel (but not CV) continuum for both tasks. VAS responses were 750 ms slower than 2AFC across the board. RTs are plotted normalized to the global mean to highlight token- and stimulus-related changes. Identification slopes reflect sqrt[abs(X - mean(X))] transformed values. Error bars = ± 1 s.e.m., *p<0.05.
Fig 5. VAS ratings reveal stark individual differences in categorization and “continuous” vs. “categorical” listeners.
Individual histograms show the distribution of each listener’s phonetic labeling for CV and vowel sounds. Discrete (categorical) listeners produce more binary categorization where responses lump near endpoint tokens (e.g., S2). In contrast, continuous (gradient) listeners tend to hear the continuum in a gradient fashion (e.g., S16). Inset values show Hartigan’s Dip statistic [99] score, quantifying the bimodality—and thus categoricity—of each distribution. Higher dip values = discrete categorization; low values = continuous categorization. (inset) Dip values are similar between CV and vowels suggesting it is a reliable measure of listener strategy that is independent of speech material. errorbars = ± 1 s.e.m.
Fig 6. Gradient listeners are less susceptible to speech interference at the “cocktail party”.
(a) Speech recognition performance in the cocktail party task for discrete and continuous listeners. Listener strategy was determined via Hartigan’s dip statistic [99] applied to VAS labeling (i.e., Fig 5) to identify individuals with bimodal (categorical) vs. unimodal (continuous) response distributions. Release from masking was measured as the difference in recognition performance between forward and reverse masker conditions at each masker count. (b) Discrete/categorical listeners show less masking release in the cocktail party task than their continuous listener peers. Error bars = ± 1 s.e.m.; shading = 95% CI; *p<0.05.
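
The release-from-masking measure described for Fig 6 can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name, the sign convention (reverse minus forward, so that a benefit from time-reversed maskers is positive), and the accuracy values are all assumptions made for illustration only.

```python
def masking_release(acc_forward, acc_reverse):
    """Release from masking at each masker count.

    Assumed convention: reverse-masker accuracy minus forward-masker
    accuracy, so positive values mean time-reversal helped recognition.
    """
    if len(acc_forward) != len(acc_reverse):
        raise ValueError("need one accuracy value per masker count in each condition")
    return [rev - fwd for fwd, rev in zip(acc_forward, acc_reverse)]


# Hypothetical percent-correct scores for 1-4 maskers (not data from the paper).
forward = [80.0, 60.0, 45.0, 35.0]
reverse = [90.0, 80.0, 70.0, 60.0]

release = masking_release(forward, reverse)
print(release)
```

Under this convention, a listener who gains little from time-reversed maskers would show release values near zero at every masker count.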

References

    1. Goldstone RL, Hendrickson AT. Categorical perception. Wiley Interdiscip Rev Cogn Sci. 2010;1(1):69–78. Epub 2009/12/23. doi: 10.1002/wcs.26
    2. Beale JM, Keil FC. Categorical effects in the perception of faces. Cognition. 1995;57(3):217–39. Epub 1995/12/01. doi: 10.1016/0010-0277(95)00669-x
    3. Franklin A, Drivonikou GV, Clifford A, Kay P, Regier T, Davies IR. Lateralization of categorical perception of color changes with color term acquisition. Proc Natl Acad Sci U S A. 2008;105(47):18221–5. Epub 2008/11/19. doi: 10.1073/pnas.0809952105
    4. Klein ME, Zatorre RJ. A role for the right superior temporal sulcus in categorical perception of musical chords. Neuropsychologia. 2011;49(5):878–87. Epub 2011/01/18. doi: 10.1016/j.neuropsychologia.2011.01.008
    5. Siegel JA, Siegel W. Absolute identification of notes and intervals by musicians. Percept Psychophys. 1977;21(2):143–52.