J Neurosci. 2004 Jan 14;24(2):531-41.
doi: 10.1523/JNEUROSCI.4234-03.2004.

Discrimination of voiced stop consonants based on auditory nerve discharges

Sharba Bandyopadhyay et al. J Neurosci.

Abstract

Previous studies of the neural representation of speech assumed some form of neural code, usually discharge rate or phase locking, for the representation. In the present study, responses to five synthesized CVC_CV (e.g., /dad_da/) utterances have been examined using information-theoretic distance measures [or Kullback-Leibler (KL) distance] that are independent of a priori assumptions about the neural code. The consonants in the stimuli fall along a continuum from /b/ to /d/ and include both formant-frequency (F1, F2, and F3) transitions and onset (release) bursts. Differences in responses to pairs of stimuli, based on single-fiber auditory nerve responses at 70 and 50 dB sound pressure level, have been quantified, based on KL and KL-like distances, to show how each portion of the response contributes to information coding and the fidelity of the encoding. Distances were large at best frequencies at which the formants differ, but were largest for fibers encoding the high-frequency release bursts. Distances computed at differing time resolutions show significant information in the temporal pattern of spiking, beyond that encoded by rate, at time resolutions from 1 to 40 msec. Single-fiber just noticeable differences (JNDs) for F2 and F3 were computed from the data. These results show that F2 is coded with greater fidelity than F3, even among fibers tuned to F3, and that JNDs are larger in the syllable-final consonant than in the releases.
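
As a rough illustration of the kind of measure the abstract describes, the sketch below computes a KL distance between the spike-count distributions of two responses in each time bin and sums it across bins. It is a minimal sketch under stated assumptions, not the authors' method: bins are treated as independent, the function names (count_distribution, kl_distance, cumulative_kl) are hypothetical, and the small probability floor eps is an illustrative stand-in that is unrelated to the bootstrap debiasing mentioned in the figure captions.

```python
import math
from collections import Counter


def count_distribution(counts, max_count, eps=1e-3):
    """Estimate P(k spikes in a bin) for k = 0..max_count from repeated trials.

    eps is an assumed floor that keeps every probability nonzero so the
    logarithm in the KL distance is always defined.
    """
    tally = Counter(counts)
    n = len(counts)
    norm = n + eps * (max_count + 1)
    return [(tally.get(k, 0) + eps) / norm for k in range(max_count + 1)]


def kl_distance(p, q):
    """Kullback-Leibler distance D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cumulative_kl(trials_a, trials_b):
    """Running sum of per-bin KL distances between two sets of responses.

    trials_x[r][t] is the spike count on repetition r in time bin t.
    Bins are assumed independent (no dependence on earlier bins).
    """
    n_bins = len(trials_a[0])
    max_count = max(max(max(r) for r in trials_a),
                    max(max(r) for r in trials_b))
    total, running = 0.0, []
    for t in range(n_bins):
        p = count_distribution([r[t] for r in trials_a], max_count)
        q = count_distribution([r[t] for r in trials_b], max_count)
        total += kl_distance(p, q)
        running.append(total)
    return running
```

Because each per-bin KL distance is non-negative, the running sum can only stay flat or rise, so flat stretches mark times at which the two responses are statistically indistinguishable under this measure.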

Figures

Figure 1.
Spectrograms of the five stimuli S1-S5. S1 and S5 are typical of the utterances /bab_ba/ and /dad_da/ in English; S2-S4 have the same vowel, but consonant properties are interpolated between /b/ and /d/. The spectrogram of S1 is shown for the whole stimulus (/bab_ba/), but only the first syllables of S2-S5 are shown (e.g., /dad/ for S5). The first three formant trajectories are highlighted with white lines. Note the burst of high-frequency energy at the release of the /d/ in S5 (arrow), which becomes weaker approaching S1. Spectrograms were computed from the first differences of the stimuli using a 25.6 msec Hamming window.
Figure 8.
Response distance declines at coarser temporal resolutions. Integrated distances during the release formant transition are plotted against the analysis time scale for different populations of fibers. The calculation of KL distance between spike count distributions was done as described in Materials and Methods, except that the binwidth (Δ) was set at 2, 4, 8, or 16 msec. The KL distance was integrated across time (0.03-0.07 sec) and across BF. Fibers were grouped in nonoverlapping 0.3 octave bins, distances were summed and normalized to the average in each bin, and then the bins were averaged, so the result plotted is average distance per fiber cumulated over 0.03-0.07 sec. A-C show cumulative distances for three BF ranges, as described in the figure titles. The sound level is indicated by the symbols, as defined in the legend.
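
The change of analysis time scale described in this caption can be pictured as summing fine-resolution spike counts into wider bins before computing distances. The helper below is a hypothetical sketch of that step only; the name rebin, the choice to drop a trailing partial bin, and the assumption that the input is a list of 1 msec counts are illustrative and not taken from the paper.

```python
def rebin(counts, factor):
    """Sum consecutive groups of `factor` fine bins into coarser bins.

    A trailing group with fewer than `factor` bins is dropped (an assumed
    convention for this sketch).
    """
    n_coarse = len(counts) // factor
    return [sum(counts[i * factor:(i + 1) * factor]) for i in range(n_coarse)]


# Example: 1 msec counts regrouped into 2 msec and 4 msec bins.
fine = [1, 0, 0, 1, 1, 1, 0, 0]
coarse_2ms = rebin(fine, 2)
coarse_4ms = rebin(fine, 4)
```

Coarser bins preserve the total spike count but discard the timing of spikes within each bin, which is why a distance computed on them can only reflect slower features of the response.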
Figure 2.
Cumulative raw and debiased RA distances between responses to S1 and S5 plotted versus time. The data were obtained from the model of Zhang et al. (2001) (Bruce et al., 2003). Only responses to the first syllable (/bab/ vs /dad/) are shown. Distance is plotted cumulatively on the y-axis (i.e., the distance at time t is the sum of the distances computed in all the bins from 0 to t). A, Data from 500 repetitions of the stimuli presented at 70 dB, for a model fiber with BF = 1.2 kHz and spontaneous rate = 100/sec. The analysis conditions are identified in the legend. Raw, Raw RA distance with no bias correction; BS, bootstrap correction. Results are shown for D = 0 and 4. B, Same plot for the first 100 responses from the same simulated data, for D = 4 only. Note that the ordinate is split at the dashed line.
Figure 3.
Responses to S1 and S5 and cumulative RA distance versus time for an AN fiber (BF = 1.7 kHz; spontaneous rate, 67.3/sec). A, B, PST histograms for the fiber in response to the two stimuli, computed with the same binwidth as the difference analysis (1 msec). Vertical lines demark the silences, consonants, and vowels for the full CVC_CV stimuli. C, Cumulative RA distance versus time for D = 0. The curves are identified in the legend. The dashed curves show the debiased difference ±1 SE. D, The same plots for D = 4. Now bootstrap debiasing leaves noticeable slopes (s1, s2, and s3) during the vowels and inter-syllable silence. The dotted line shows the result of additional bias subtraction sufficient to make the slopes zero; this result is similar to the debiased result in C.
Figure 4.
Three-dimensional plots of the differences between stimuli S1 and S5 and the RA differences between responses to those stimuli. A, Absolute value of the decibel difference between the spectrograms of S1 and S5; spectrograms were computed from the first differences of the stimuli using a 25.6 msec Hamming window. Because of the first-differencing, the plotted amplitudes are approximately what they would be if stimulus power were grouped into logarithmic bins, as was done with the spike data. B, Debiased incremental RA distance between responses to S1 and S5 with D = 0. The average distance in each bin (e.g., Eq. 3 converted to RA distance but not cumulated across time) is plotted against time along the x-axis and fiber BF along the y-axis. Fibers are collected according to BF into overlapping 0.25 octave bins spaced every 0.0625 octaves. A fiber with BF b contributes to a bin with center frequency f with weight [1 - |log2(b/f)|/0.125]. The average KL distances were computed in this way and then converted to RA distances. Data are normalized by dividing by the sum of the weights in each bin, so the result is given as incremental RA distance per fiber. The number of fibers in each bin is shown in Figure 5C.
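
The BF binning described in this caption can be sketched as a triangular weighting on a log-frequency axis. The code below is a hedged illustration, not the authors' implementation: it assumes the weight uses the absolute octave distance between BF and bin center, that weights are clipped to zero outside the 0.25 octave bin, and the function names (bin_weight, weighted_bin_average) are hypothetical.

```python
import math


def bin_weight(bf_khz, center_khz, half_width_oct=0.125):
    """Triangular weight 1 - |log2(b/f)| / half_width, clipped at zero.

    The absolute value and the clipping are assumptions of this sketch.
    """
    octave_distance = abs(math.log2(bf_khz / center_khz))
    return max(0.0, 1.0 - octave_distance / half_width_oct)


def weighted_bin_average(fibers, center_khz):
    """Weighted average distance per fiber for one BF bin.

    fibers is a list of (bf_khz, distance) pairs. Returns None when no
    fiber falls inside the bin (a bin with no data).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for bf_khz, distance in fibers:
        w = bin_weight(bf_khz, center_khz)
        weighted_sum += w * distance
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None
```

Dividing by the sum of the weights, rather than by the number of fibers, is what makes the result a per-fiber quantity that can be compared across bins containing different numbers of fibers.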
Figure 5.
Three-dimensional plots of debiased RA distances for various situations. A-D, Histograms of the number of fibers in each bin of the population plots in E-H. The BF axes at left are shared by the histograms and the three-dimensional color plots. E, Distances between responses to S1 and S5 at 70 dB (D = 0) for HSR fibers (spontaneous rate >18/sec). The plot is constructed in the same way as Figure 4, except distance is plotted on a color scale, shown at the bottom right. The white lines show the first three formants of the stimuli. The stimuli differ where the lines diverge, during the consonants, and response differences are seen mainly at those times. The formants are also different during the silences (before 0.05 sec and between 0.2 and 0.3 sec), but these differences are not conveyed in the stimuli and response differences are not seen in the silences. F, Distances between responses to S1 and S5 for LMSR fibers. The results are similar to those for HSR fibers. Because there are fewer LMSR fibers, there are several bins with no data (gray bars). G, Same plot for all the fibers; these are all the fibers in E and F and are the same data as in Figure 4. H, Debiased RA distances between responses to S1 and S2 at 70 dB. Distances are smaller than for S1 and S5.
Figure 9.
Debiased KL distances between responses of model fibers to the 25-stimulus set, with F2 and F3 varying independently. Distances are computed between the outer 24 stimuli and the center stimulus, for which F2 = 1.3 and F3 = 2.4 kHz. Distances were cumulated over 0.05-0.07 sec, and the x- and y-axes plot F2 and F3 formant frequency difference from the reference stimulus, at the center of the time window. A, Distances for 20 model fibers with BFs near F2 (0.8-1.8 kHz; spaced 0.06 octaves). Note that KL distance increases much more with changes in F2 than with F3. B, Distances for 20 model fibers with BFs near F3 (1.9-2.9 kHz; spaced 0.03 octaves). Here, KL distances increased with both F2 and F3. C, Distances for 20 model fibers with BFs between the upper half of the F3 region and the low-frequency end of the burst (2.4-3.5 kHz; spaced 0.03 octaves). Again, distances increased for both F2 and F3. Saturation is observed at the edges in B and C.
Figure 7.
Comparison of rate and response KL distances. A, Comparison of debiased KL distances between rates and the full KL distance measure, for the responses of fibers to S1 and S5, cumulated over the time interval 0.03-0.07 sec. The abscissa shows KL distances between the histograms of the spike counts over this time interval; the ordinate shows cumulated KL distance from the same time interval, using the method of Equation 3. Response distances were computed with D = 0 and debiased with bootstrapping. KL distance for rate was also debiased with bootstrapping. LMSR and HSR fibers are shown separately, as are responses at 50 and 70 dB. The line shows where distances are equal. B, PST histograms over 0.025-0.1 sec for one fiber in response to S1 and S5 at 70 dB (thin and thick solid lines) along with the cumulative KL distance between responses (dotted line) over the same interval. The fiber is marked by the arrow in A and has BF = 4.35 kHz, SR = 46.4/sec. The dashed lines show the average rates over 0.03-0.07 sec, which are essentially the same for the two stimuli.
Figure 6.
Rate differences and d′. A, Differences in spike occurrence probability, expressed as instantaneous rate differences in 1 msec bins, for responses to S1 minus S5 at 70 dB. Fibers are binned along the y-axis by BF as in previous figures. The differences are normalized to show change per fiber. The black lines show the trajectories of the first three formants. Differences are largest where the formants differ. B, d′ discriminability, computed (Eq. 6) from the difference in average rates over latencies of 0.03-0.07 sec (i.e., during the formant transition of the initial consonant). Each point is one fiber; the symbol identifies the spontaneous rate group of the fiber. The solid line shows a log-triangular smooth of the data points, as shown in Figures 4 and 5. The dashed horizontal lines show d′ = 0 and ±1.
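
Equation 6 of the paper is not reproduced on this page, so the sketch below uses one common textbook form of d′ as an assumption: the difference in mean spike count divided by the root of the average of the two sample variances. The function name d_prime and the choice to raise an error when both responses have zero variance are hypothetical details of this illustration.

```python
import math
from statistics import mean, variance


def d_prime(counts_a, counts_b):
    """d' between two sets of per-trial spike counts.

    Assumed definition: (mean_a - mean_b) / sqrt((var_a + var_b) / 2),
    using sample variances. This may differ from the paper's Eq. 6.
    """
    pooled_sd = math.sqrt((variance(counts_a) + variance(counts_b)) / 2.0)
    if pooled_sd == 0:
        raise ValueError("d' is undefined when both responses have zero variance")
    return (mean(counts_a) - mean(counts_b)) / pooled_sd


# Example: per-trial counts for two stimuli in one analysis window.
example = d_prime([2, 4, 6], [1, 3, 5])
```

Under this definition a magnitude of 1 means the mean counts differ by one pooled standard deviation, which matches the role of the ±1 reference lines described in the caption.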

References

    1. Blumstein SE, Stevens KN (1979) Acoustic invariance in speech production: evidence from measurements of the spectral characteristics of stop consonants. J Acoust Soc Am 66: 1001-1017.
    2. Bruce IC, Sachs MB, Young ED (2003) An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses. J Acoust Soc Am 113: 369-388.
    3. Carney LH, Geisler CD (1986) A temporal analysis of auditory-nerve fiber responses to spoken stop consonant-vowel syllables. J Acoust Soc Am 79: 1896-1914.
    4. Conley RA, Keilson SE (1995) Rate representation and discriminability of second formant frequencies for /e/-like steady-state vowels in cat auditory nerve. J Acoust Soc Am 98: 3223-3234.
    5. Cooper FS, Delattre PC, Liberman AM, Borst JM, Gerstman LJ (1952) Some experiments on the perception of synthetic speech sounds. J Acoust Soc Am 24: 597-606.
