Comparative Study
J Neurosci. 2010 Jan 6;30(1):194-204.
doi: 10.1523/JNEUROSCI.2982-09.2010.

Pinna cues determine orienting response modes to synchronous sounds in elevation

Peter Bremen et al. J Neurosci.

Abstract

To program a goal-directed orienting response toward a sound source embedded in an acoustic scene, the audiomotor system should detect and select the target against a background. Here, we focus on whether the system can segregate synchronous sounds in the midsagittal plane (elevation), a task requiring the auditory system to dissociate the pinna-induced spectral localization cues. Human listeners made rapid head-orienting responses toward either a single sound source (broadband buzzer or Gaussian noise) or toward two simultaneously presented sounds (buzzer and noise) at a wide variety of locations in the midsagittal plane. In the latter case, listeners had to orient to the buzzer (target) and ignore the noise (nontarget). In the single-sound condition, localization was accurate. However, in the double-sound condition, response endpoints depended on relative sound level and spatial disparity. The loudest sound dominated the responses, regardless of whether it was the target or the nontarget. When the sounds had about equal intensities and their spatial disparity was sufficiently small, endpoint distributions were well described by weighted averaging. However, when spatial disparities exceeded approximately 45 degrees, response endpoint distributions became bimodal. Similar response behavior has been reported for visuomotor experiments, for which averaging and bimodal endpoint distributions are thought to arise from neural interactions within retinotopically organized visuomotor maps. We show, however, that the auditory-evoked responses can be well explained by the idiosyncratic acoustics of the pinnae. Hence basic principles of target representation and selection for audition and vision appear to differ profoundly.

Figures

Figure 1.
Two competing accounts for perceiving a phantom sound source at a weighted-average location in the midsagittal plane. Top, In the peripheral-interaction model, the double-sound spectra of BZZ (B) and GWN (G) yield an amplitude spectrum corresponding to the weighted-averaged location (symbolized by mixed circle). The interactions of the two sound sources take place at the level of the pinna, and their identities are lost in the CNS. Bottom, In the neural-interaction scheme, the periphery preserves spectral shape of either source (blue and red circle at the pinna stage). A weighted-averaged percept emerges through neural interactions in an audiospatial representation (mixed circle at the central auditory system stage). Both models predict the same behavior, yet they are mutually exclusive.
Figure 2.
Standard localization behavior of listener P.B. to single-speaker sounds at five levels (BZZ: 32, 37, 42, 47, 52 dBA; GWN: 35, 40, 45, 50, 55 dBA). A, B, Stimulus–response plots for BZZ (A) and GWN (B). The different color shades indicate different levels. Gains and correlation coefficients are close to 1, and biases close to 0°, indicating good localization performance. The dashed lines indicate linear regression lines. C, Gains for all single-speaker sounds. Subscript numbers indicate the level of BZZ (1) and GWN (11) and the summed sounds (for details, see Materials and Methods). The lines in different shades of gray are from different listeners. The thick black line with gray circles is the average over all four listeners. Error bars denote 1 SD. D, Correlation coefficients obtained for single-sound stimulus–response plots. Both gains and correlations are close to 1, indicating high localization accuracy and precision, respectively.
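The gain, bias, and correlation coefficient named in this caption are the usual summary statistics of a linear stimulus–response fit. A minimal sketch, assuming an ordinary least-squares fit of response elevation on target elevation; the function name and the sample values are hypothetical and not taken from the paper:

```python
def fit_stimulus_response(targets, responses):
    """Least-squares fit of response = gain * target + bias (all in degrees).

    Returns (gain, bias, r), where r is Pearson's correlation coefficient.
    """
    n = len(targets)
    if n < 2 or n != len(responses):
        raise ValueError("need two or more paired target/response values")
    mean_t = sum(targets) / n
    mean_r = sum(responses) / n
    sxx = sum((t - mean_t) ** 2 for t in targets)
    syy = sum((r - mean_r) ** 2 for r in responses)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(targets, responses))
    if sxx == 0 or syy == 0:
        raise ValueError("targets and responses must both vary")
    gain = sxy / sxx                   # slope: 1.0 means no over- or undershoot
    bias = mean_r - gain * mean_t      # offset: 0 deg means no systematic shift
    r = sxy / (sxx * syy) ** 0.5       # spread around the fitted line
    return gain, bias, r


# Made-up example: a listener who undershoots slightly with a small upward bias.
targets = [-40.0, -20.0, 0.0, 20.0, 40.0]
responses = [0.9 * t + 2.0 for t in targets]
gain, bias, r = fit_stimulus_response(targets, responses)
print(gain, bias, r)
```

In this reading, a gain and correlation near 1 with a bias near 0° correspond to what the caption calls good localization performance.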
Figure 3.
Localization behavior of listener D.B. in double-speaker trials. Each column shows stimulus–response data for one level difference between BZZ and GWN. Top row, Responses plotted against BZZ location. Bottom row, Responses plotted against GWN location.
Figure 4.
Localization behavior of listener D.B. in double-speaker trials as a function of a weighted-average target prediction (Eq. 3) of BZZ and GWN location for five level differences.
Figure 5.
A, Partial correlation coefficients for BZZ location on localization response obtained with regression analysis of Equation 3. The weight of the GWN is 1 − wb. The thin lines indicate data from individual listeners; the thick lines with markers show pooled data. The influence of a target depends in a sigmoid manner on the level difference. Note that the point of equal contribution (weight = 0.5) is at ΔL = 0 dBA. B, Correlation coefficients obtained from the linear regression shown in Figure 2. The color convention is as in A. The coefficients for the weighted-average prediction are depicted in gray colors. For BZZ and GWN stimuli, the correlation decreases with decreasing level of the corresponding single target. The correlation coefficient of the weighted-average prediction is equal to the single target values at extreme level differences. At ΔL of −3 and +2 dBA, however, the weighted-average prediction correlates better with the responses than either BZZ or GWN. C, Partial correlation coefficients of Equation 4 for all four listeners (M.W., P.B., D.B., R.M.). Most of the data can be explained by the weighted-averaged prediction, with coefficients >0.8 for all four listeners.
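Equation 3 is not reproduced on this page, but the caption states that the GWN weight is 1 − wb, which suggests a prediction of the form ε_AVG = wb·ε_BZZ + (1 − wb)·ε_GWN. The following is a minimal sketch under that assumption. It estimates wb with a simple least-squares fit, which is a stand-in for, not a reproduction of, the partial-correlation analysis described in the caption; all names and numbers are hypothetical.

```python
def weighted_average(eps_bzz, eps_gwn, w_b):
    """Predicted response elevation (deg) as a weighted mean of both sources."""
    return w_b * eps_bzz + (1.0 - w_b) * eps_gwn


def fit_weight(eps_bzz, eps_gwn, responses):
    """Least-squares estimate of w_b from paired double-sound trials.

    Rearranging R = w_b*B + (1 - w_b)*G gives (R - G) = w_b * (B - G),
    i.e. a regression through the origin on the source separation.
    """
    num = sum((r - g) * (b - g) for b, g, r in zip(eps_bzz, eps_gwn, responses))
    den = sum((b - g) ** 2 for b, g in zip(eps_bzz, eps_gwn))
    if den == 0:
        raise ValueError("need a trial with spatially separated sources")
    return num / den


# Made-up trials in which the buzzer dominates (true w_b = 0.7).
bzz = [30.0, -15.0, 45.0]
gwn = [0.0, 15.0, -30.0]
resp = [weighted_average(b, g, 0.7) for b, g in zip(bzz, gwn)]
print(fit_weight(bzz, gwn, resp))
```

A weight of 0.5 corresponds to the point of equal contribution that the caption places at ΔL = 0 dBA.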
Figure 6.
Linear regression gains (Eq. 2) for BZZ (blue), GWN (red), and weighted-averaged prediction (gray) as a function of Δε pooled across all four listeners. Data at Δε = 105° are averaged over Δε = (90, 105, and 120°). Error bars denote 1 SD.
Figure 7.
Normalized saccade-endpoint distributions for the five ΔL values (rows), pooled over all listeners. Data are separated in Δε ≤ 45° (left column) and Δε > 45° (right column). The black dotted lines indicate normalized target locations, with 1.0 denoting the BZZ and −1.0 denoting GWN. The red line indicates a simulated weighted-averaged target prediction (μ = εAVG, σ = 12°, N = 104). The blue line shows a simulated bimodal prediction (μ1 = wBZZ·εBZZ, μ2 = wGWN·εGWN, σ = 12°, N = 104).
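The two simulated predictions in this caption can be illustrated by drawing endpoints from one Gaussian versus a two-component Gaussian mixture. This is a rough sketch only: σ = 12° comes from the caption, while the sample count, the 50/50 mixing proportion, the seed, and the helper names are assumptions made for illustration. The component means are left as parameters so that no particular parameterization is implied.

```python
import random


def simulate_unimodal(mu, sigma=12.0, n=10_000, seed=0):
    """Endpoints (deg) scattered around a single predicted location."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def simulate_bimodal(mu1, mu2, sigma=12.0, n=10_000, p_first=0.5, seed=0):
    """Endpoints drawn from a mixture of two Gaussians.

    p_first is the (assumed) probability that a trial lands near mu1.
    """
    rng = random.Random(seed)
    return [rng.gauss(mu1 if rng.random() < p_first else mu2, sigma)
            for _ in range(n)]


def fraction_near(samples, center, half_width):
    """Share of endpoints within half_width degrees of center."""
    hits = sum(1 for s in samples if abs(s - center) <= half_width)
    return hits / len(samples)


# Made-up case: two sources 60 deg apart, averaged location at 0 deg.
uni = simulate_unimodal(0.0)
bi = simulate_bimodal(30.0, -30.0)
print(fraction_near(uni, 0.0, 10.0), fraction_near(bi, 0.0, 10.0))
```

With widely separated means, the mixture leaves few endpoints near the averaged location, which is the qualitative signature that separates the bimodal prediction from the weighted-average one.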
Figure 8.
Normalized head-saccade endpoints as a function of saccade latency for the five ΔL values (rows), pooled over all listeners. Data are separated into conditions in which Δε ≤ 45° (left column) and Δε > 45° (right column). BZZ and GWN locations are indicated by the dotted black lines at 1 and −1, respectively. The thick gray line indicates the running average.
Figure 9.
SI of single-sound DTFs and Schroeder-DTF templates (colored patches) and normalized single-sound head-movement responses (white dots) for BZZ (listener D.B.; left) and GWN (listener R.M.; center) as a function of target location. Responses and areas of highest similarity (warm colors) between template and target DTF coincide with response location. Right, Normalized histograms of the SIs obtained at the response locations pooled across all 11 single sounds for all four listeners (different shades of gray). The thick line with markers indicates average across listeners.
Figure 10.
SI of double-sound DTFs and Schroeder-DTF templates (colored patches) as a function of GWN location and template location for all five ΔL values and four different listeners (top row). The listener's responses are indicated as white dots. In each plot, the buzzer location was held constant at the location indicated by the solid black line. The dashed black line indicates the GWN location (unity), and the dotted black line indicates a prediction based on the weighted average of buzzer and GWN location. The warm colors indicate small differences between template and simulated double-DTF, and cold colors indicate large differences. Bottom row, Normalized histograms of SI obtained at response locations for all ΔL values pooled across Δε conditions for all listeners (different shades of gray). The thick line with markers denotes the average across listeners.
