J Acoust Soc Am. 2015 Jul;138(1):33-43.
doi: 10.1121/1.4922224.

Segregating two simultaneous sounds in elevation using temporal envelope: Human psychophysics and a physiological model

Jeffrey S Johnson et al. J Acoust Soc Am. 2015 Jul.

Abstract

The ability to segregate simultaneous sound sources based on their spatial locations is an important aspect of auditory scene analysis. While the role of sound azimuth in segregation is well studied, the contribution of sound elevation remains unknown. Although previous studies in humans suggest that elevation cues alone are not sufficient to segregate simultaneous broadband sources, the current study demonstrates they can suffice. Listeners segregating a temporally modulated noise target from a simultaneous unmodulated noise distracter differing in elevation fall into two statistically distinct groups: one that identifies target direction accurately across a wide range of modulation frequencies (MF) and one that cannot identify target direction accurately and, on average, reports the opposite direction of the target for low MF. A non-spiking model of inferior colliculus neurons that process single-source elevation cues suggests that the performance of both listener groups at the population level can be accounted for by the balance of excitatory and inhibitory inputs in the model. These results establish the potential for broadband elevation cues to contribute to the computations underlying sound source segregation and suggest a potential mechanism underlying this contribution.

Figures

FIG. 1.
(a) Accuracy on sound segregation task. Black open circles (line omitted) are the mean across all nine listeners in the AM-alone task. Error bars are standard deviation. Gray lines indicate accuracy on the AM + masker task for veridical responders. Black lines indicate accuracy on the AM + masker task for non-veridical responders. Dashed lines indicate individual performance, solid lines indicate mean performance, and filled circles indicate performance significantly different from chance (0.5, dashed black line) using a binomial test, p < 0.05, Bonferroni corrected for ten comparisons. Actual p-values (5–2000 Hz) for the veridical group are: 0, 0, 0, 0, 0, 0, 4.2 × 10⁻⁹, 8.9 × 10⁻³, 0.30, 0.94; actual p-values for the non-veridical group are: 2.9 × 10⁻⁴, 4.5 × 10⁻³, 1.7 × 10⁻⁶, 0.10, 0.10, 8.7 × 10⁻³, 0.14, 0.44, 0.48, 0.78. (b) Dendrogram of cluster analysis of accuracy functions from AM + masker task. Gray lines indicate veridical responders. Black lines indicate non-veridical responders.
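The significance criterion in panel (a) can be illustrated with a short sketch. It assumes a two-sided exact binomial test against chance (0.5) and a Bonferroni threshold of 0.05 / 10 = 0.005; the caption does not say whether the test was one- or two-sided, and the function names here are hypothetical. The p-values are the non-veridical group's values as listed in the caption.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is no more likely than
    the observed one (assumption: this is the two-sided convention used).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values below alpha divided by the number of comparisons."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# p-values for the non-veridical group, copied from the caption (5-2000 Hz).
non_veridical_p = [2.9e-4, 4.5e-3, 1.7e-6, 0.10, 0.10, 8.7e-3, 0.14, 0.44, 0.48, 0.78]
flags = bonferroni_significant(non_veridical_p)
print(flags)
```

Under these assumptions only the three lowest modulation frequencies pass the corrected threshold for the non-veridical group; 8.7 × 10⁻³ falls below the uncorrected 0.05 but not below 0.005.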
FIG. 2.
Model circuit diagram. Connections between the six cell types modeled are shown as arrows (excitatory) and circles (inhibitory). The horizontal black bar corresponds to auditory nerve fiber inputs. Inputs to WBE and WBI cells come across large frequency ranges illustrated with shaded areas. Inputs to type-IV, type-II, and NBI cells are narrower and are illustrated with arrows. Inputs to type-II and NBI cells are centered at a frequency that is 0.8 multiplied by the best frequency (BF) of the corresponding type-IV cell. The weight of the NBI-to-O inhibitory connection (“inhibitory gain”) was the only parameter systematically varied. Full connection details can be found in the methods in Sec. II.
FIG. 3.
Model neuron responses to notched-noise sweeps. The black dashed line indicates model DCN type-IV cell response as a function of notch center frequency. The solid black line indicates model IC type-O cell response with an inhibitory gain of 1. The solid gray line indicates model IC type-O cell response with an inhibitory gain of 5. The thin vertical dashed line indicates the BF of all three model cells (11 kHz).
FIG. 4.
Model predictions in sound segregation task. (a) Heat map showing predicted proportion correct (mean across all listeners) as a function of inhibitory gain and modulation frequency in model IC type-O cells for the AM-alone task. A scale bar is present to the left of (a). (b) Averaged mean-squared error between model predicted proportion correct and the mean performance of veridical (green) and non-veridical (magenta) responders as a function of inhibitory gain. (c) Same as (a), for the AM + masker task. The upper black box indicates the gain window that best fits the mean performance of veridical responders. The lower black box indicates the gain window that best fits the mean performance of non-veridical responders. (d) Same as (b) for the AM + masker task. Also included are the data for individual listeners (dashed lines). (e) Model predicted proportion correct compared to human performance. The green and magenta solid lines indicate the mean model predicted proportion correct from the upper and lower black boxes in (c), respectively. The green and magenta dotted lines indicate the mean predicted proportion correct from the pinna-only model, for listeners with veridical and non-veridical task performance, respectively. The green and magenta stripes indicate listener performance ± standard error of a proportion for listeners with veridical and non-veridical task performance, respectively, reproduced from Fig. 1(a). (f) Model predicted proportion correct for different model temporal windows. The green and magenta solid and dashed lines indicate the mean model predicted proportion correct as the solid lines in (e), but for data corresponding to different temporal windows [heat maps and gain windows not shown, but created as in (c)]. The line styles correspond to maximum representable AM frequencies as follows: thickest line, 1000 Hz; thick line, 667 Hz; thin line, 100 Hz; wide dashed line, 67 Hz; narrow dashed line, 33 Hz. The green and magenta stripes are as in (e). For all lines in (b), (d), (e), and (f), green corresponds to veridical responders and magenta corresponds to non-veridical responders.
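The fitting idea behind panels (b) and (d), comparing model-predicted proportion correct with listener performance at each inhibitory gain, can be sketched as follows. This is only an illustration: the function names and all numbers are hypothetical, and the sketch picks a single best gain rather than the gain window described in the caption, whose construction is not given here.

```python
def mean_squared_error(predicted, observed):
    """Mean of squared differences across modulation frequencies."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def best_gain(predictions_by_gain, observed):
    """Return the gain with the lowest MSE, plus the MSE for every gain."""
    errors = {
        gain: mean_squared_error(pred, observed)
        for gain, pred in predictions_by_gain.items()
    }
    return min(errors, key=errors.get), errors


# Hypothetical model output: proportion correct at three modulation
# frequencies for two inhibitory gains (values invented for illustration).
predictions_by_gain = {
    1.0: [0.90, 0.80, 0.50],
    5.0: [0.20, 0.40, 0.50],
}
# Hypothetical mean listener performance at the same three frequencies.
observed = [0.85, 0.75, 0.50]

gain, errors = best_gain(predictions_by_gain, observed)
print(gain, errors)
```

Running the same comparison separately against each listener group's mean performance would yield one error curve per group, which is the kind of curve panels (b) and (d) plot.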
FIG. 5.
Model evaluation of drift hypothesis, individual listeners. Comparison of model performance under the drift hypothesis to optimal model performance. MSEs are calculated as in Fig. 4(c), but with individual data rather than averaged data. The open circles indicate veridical responders. The star symbols indicate non-veridical responders. The dashed line is a unity line.
