Front Hum Neurosci. 2013 Dec 16;7:865.
doi: 10.3389/fnhum.2013.00865. eCollection 2013.

Using auditory classification images for the identification of fine acoustic cues used in speech perception


Léo Varnet et al. Front Hum Neurosci.

Abstract

An essential step in understanding the processes underlying the general mechanism of perceptual categorization is to identify which portions of a physical stimulus modulate the behavior of our perceptual system. More specifically, in the context of speech comprehension, it remains a major open challenge to understand which information is used to categorize a speech stimulus as one phoneme or another: the auditory primitives relevant for the categorical perception of speech are still unknown. Here we propose to adapt to auditory experiments a method relying on a Generalized Linear Model (GLM) with smoothness priors, already used in the visual domain for the estimation of so-called classification images. This statistical model offers a rigorous framework for dealing with non-Gaussian noise, as is often the case in the auditory modality, and limits the amount of noise in the estimated template by enforcing smoother solutions. By applying this technique to a specific two-alternative forced-choice experiment between the stimuli "aba" and "ada" in noise with an adaptive SNR, we confirm that the second formantic transition is key for classifying phonemes into /b/ or /d/ in noise, and that its estimation by the auditory system is a relative measurement across spectral bands and in relation to the perceived height of the second formant in the preceding syllable. Through this example, we show how the GLM with smoothness priors approach can be applied to the identification of fine functional acoustic cues in speech perception. Finally, we discuss some assumptions of the model in the specific case of speech perception.

Keywords: GLM; acoustic cues; classification images; phoneme recognition; phonetics; speech perception.
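The estimation approach described in the abstract can be illustrated with a small numerical sketch: a Bernoulli GLM (logistic regression) fit to trial-by-trial noise spectrograms, with a quadratic smoothness penalty on the weight map. This is a simplified illustration under our own assumptions, not the authors' implementation: the function name, the array layout, the single penalty weight `lam` (the paper cross-validates two hyperparameters, λ1 and λ2, one per spectrogram axis), and the synthetic data are all ours.

```python
import numpy as np

def fit_classification_image(S, y, lam=0.1, lr=0.5, n_iter=800):
    """Sketch of a classification-image estimate via penalized logistic regression.

    S   : (n_trials, n_freq, n_time) noise spectrograms (hypothetical layout)
    y   : (n_trials,) binary responses (e.g., 0 = "aba", 1 = "ada")
    lam : weight of the quadratic smoothness penalty (single lambda here;
          the paper uses separate lambdas for the frequency and time axes)
    """
    n, F, T = S.shape
    X = S.reshape(n, F * T)
    beta = np.zeros(F * T)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # Bernoulli GLM, logit link
        grad = X.T @ (p - y) / n                # logistic log-loss gradient
        B = beta.reshape(F, T)
        # Gradient of the smoothness penalty: a discrete Laplacian that pulls
        # each weight toward its neighbors (wrap-around boundaries for
        # brevity; a real implementation would handle the edges explicitly).
        smooth = 4 * B - (np.roll(B, 1, 0) + np.roll(B, -1, 0)
                          + np.roll(B, 1, 1) + np.roll(B, -1, 1))
        beta -= lr * (grad + lam * smooth.ravel())
    return beta.reshape(F, T)

# Synthetic check: recover a smooth bump-shaped "cue" from noisy 2AFC-style data.
rng = np.random.default_rng(0)
F, T, n = 8, 8, 3000
f, t = np.meshgrid(np.arange(F), np.arange(T), indexing="ij")
template = np.exp(-((f - 4) ** 2 + (t - 4) ** 2) / 4.0)
S = rng.standard_normal((n, F, T))
p_true = 1.0 / (1.0 + np.exp(-(S.reshape(n, -1) @ template.ravel())))
y = (rng.random(n) < p_true).astype(float)
beta_hat = fit_classification_image(S, y)
```

The penalty gradient is a discrete Laplacian, so each weight is pulled toward the mean of its neighbors; this is what "enforcing smoother solutions" amounts to in practice.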


Figures

Figure 1
Spectrograms of target-signals t0 (/aba/) and t1 (/ada/) used for the vectorized spectrograms T0 and T1, on a logarithmic scale (dB). Blue boxes indicate the second formantic transition (F2).
Figure 2
(A) Evolution of SNR across trials (mean SNR by blocks of 1000 trials) for each participant, and overall mean SNR (red dotted line). (B) Psychometric function of each participant: detectability index d′ (defined as d′ = Φ⁻¹(PH) − Φ⁻¹(PFA)) as a function of signal contrast (values calculated on fewer than 20 observations are not included).
Figure 3
Prediction accuracy of the model (in terms of 10-fold cross-validation rate) as a function of regularization parameters λ1 (x-axis) and λ2 (y-axis) in logarithmic scale, for one participant (MH). The surrounding panels show classification images obtained with different pairs of regularization parameters (λ1, λ2) (n = 10000 trials for each estimate).
Figure 4
(A) Classification Image β̂ for each participant, estimated with optimal smoothness hyperparameters λ1 and λ2 (n = 10000 trials for each estimate). Weights are divided by their maximum absolute values. Boxes correspond to the position of the second formantic transition (F2) in the original stimuli spectrograms. (B) Same as (A), except that non-significant weights are shown in gray scale (p < 0.005, permutation test).
Figure 5
(A) Difference template w used by the template matcher (the difference between the spectrograms of the two targets). (B) Model parameters estimated for the template matcher, with optimal hyperparameters λ1 and λ2 (n = 10000 trials). Weights are divided by their maximum absolute values.
Figure 6
Correlation between coefficients of the Classification Images estimated on n trials and the “overall” Classification Image, for participant MH. Examples of Classification Images are shown at 3000, 6000, and 10,000 trials.
Figure 7
Classification Images β̂0 and β̂1, estimated on the trials where t0 (/aba/) or t1 (/ada/), respectively, was presented (n = 5000 trials for each estimate). Hyperparameter values are the same as for the “overall” Classification Images (Figure 4). Weights are divided by their maximum absolute values.
Figure 8
Classification Images β̂ for the lowest-SNR (minimum to median SNR) and highest-SNR (median to maximum SNR) conditions, estimated using the GLM approach with smoothness priors (n = 5000 trials for each estimate). Hyperparameter values are the same as for the “overall” Classification Images (Figure 4). Weights are divided by their maximum absolute values.
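The detectability index defined in the Figure 2 caption, d′ = Φ⁻¹(PH) − Φ⁻¹(PFA), can be evaluated directly with the standard-normal quantile function. A minimal sketch using Python's standard library (the helper name `dprime` is ours):

```python
from statistics import NormalDist

def dprime(p_hit, p_fa):
    """Detectability index d' = Phi^-1(P_H) - Phi^-1(P_FA),
    where Phi^-1 is the standard-normal quantile (inverse CDF)."""
    z = NormalDist().inv_cdf
    return z(p_hit) - z(p_fa)

# Hits one standard deviation above the criterion, false alarms one below:
dprime(0.84, 0.16)  # roughly 1.99
```

In practice, hit and false-alarm rates of exactly 0 or 1 must be adjusted (e.g., clipped) before applying the quantile function, since Φ⁻¹ is unbounded at those endpoints.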
