Cereb Cortex. 2013 Mar;23(3):670-83. doi: 10.1093/cercor/bhs045. Epub 2012 Mar 16.

Different timescales for the neural coding of consonant and vowel sounds

Claudia A Perez et al. Cereb Cortex. 2013 Mar.

Abstract

Psychophysical, clinical, and imaging evidence suggests that consonant and vowel sounds have distinct neural representations. This study tests the hypothesis that consonant and vowel sounds are represented on different timescales within the same population of neurons by comparing behavioral discrimination with neural discrimination based on activity recorded in rat inferior colliculus and primary auditory cortex. Performance on 9 vowel discrimination tasks was highly correlated with neural discrimination based on spike count and was not correlated when spike timing was preserved. In contrast, performance on 11 consonant discrimination tasks was highly correlated with neural discrimination when spike timing was preserved and not when spike timing was eliminated. These results suggest that in the early stages of auditory processing, spike count encodes vowel sounds and spike timing encodes consonant sounds. These distinct coding strategies likely contribute to the robust nature of speech sound representations and may help explain some aspects of developmental and acquired speech processing disorders.


Figures

Figure 1.
Spectrograms of the 5 vowel sounds with 2 initial consonants. The initial recordings were shifted one octave higher to match the rat hearing range using the STRAIGHT vocoder (Kawahara 1997).
Figure 2.
Vowel sound discrimination by rats. Performance was evaluated using a go/no-go discrimination task requiring rats to lever press to the word “dad” or “sad” while ignoring words with other vowel sounds. (a) During the first stage of training, half of the rats learned to discriminate between words beginning with “d” (i.e., “dad” vs. “dead,” “dud,” “deed,” or “dood”), and the other half learned to discriminate vowel sounds of words beginning with “s.” (b) During the second stage of training, the stimuli presented to the 2 groups of rats were switched. Performance was significantly above chance on the first day of testing (P < 0.05). (c) During the third stage of training, both groups were required to lever press to either “dad” or “sad” while ignoring the 8 words with other vowel sounds. Open circles indicate performance on a modified set of stimuli in which the “s” sound was replaced with a noise burst to prevent discrimination based on coarticulation cues in the initial consonant sounds. Discrimination of these sounds was also significantly above chance.
Figure 3.
Lever press rates for each of the 5 vowel sounds tested. (a) Peak frequencies of the first and second formants of each sound tested are plotted in Hertz. Note that the original speech sounds were shifted one octave higher to match the rat hearing range using the STRAIGHT vocoder (Kawahara 1997). (b) The Euclidean distance between vowels in F1–F2 space (in octaves) is well correlated with behavioral discrimination. (c and d) Rats pressed the lever in response to the target vowel /æ/ (as in “sad” and “dad”) significantly more often than for any of the other vowel sounds. The solid line indicates how frequently rats pressed the lever during silent catch trials. The dashed lines and the error bars represent the standard error of the mean. Behavioral responses are from all 10 days of training.
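The formant-space distance used in panel (b) can be sketched as follows: each axis is the log2 ratio of the formant peaks, so the Euclidean distance comes out in octaves. The formant values below are illustrative, not the measurements from the study.

```python
import math

def octave_distance(f1_a, f2_a, f1_b, f2_b):
    """Euclidean distance between two vowels in F1-F2 space,
    with each axis measured in octaves (log2 of the frequency ratio)."""
    d_f1 = math.log2(f1_a / f1_b)
    d_f2 = math.log2(f2_a / f2_b)
    return math.hypot(d_f1, d_f2)

# Hypothetical formant peaks in Hz (after the one-octave upward shift);
# these numbers are made up for the example.
print(round(octave_distance(1200, 3400, 600, 4800), 3))  # → 1.117
```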
Figure 4.
Response of the entire population of IC neurons to each of the vowel sounds tested. Neurograms are constructed from the average PSTH of 187 IC recording sites ordered by the characteristic frequency of each recording site. The average PSTH for all the sites recorded is shown above each neurogram. The peak firing rate for the population PSTH was 550 Hz. To illustrate how the spatial activity patterns differed across each vowel, subplots to the right of each neurogram show the difference between the response to each sound and the mean response to all 5 sounds. A white line is provided at zero to make it clearer which responses are above or below zero. Compared with the average response, the words “seed” and “deed” evoked more activity among high frequency sites (>9 kHz) and less activity among low frequency sites (<6 kHz), which was expected from their power spectra (Fig. 7).
Figure 5.
Response of the entire population of A1 neurons recorded to each of the vowel sounds tested. The conventions are the same as Figure 4, except that the height of the PSTH scale bar represents a firing rate of 250 Hz.
Figure 6.
The distinctness of the spatial patterns evoked in IC (a) and A1 (b) by each vowel pair was correlated with the distance between the vowels in the feature space formed by the first and second formant peaks. The distinctness of the spatial patterns evoked in IC (c) and A1 (d) was also correlated with behavioral discrimination of the vowel pairs. In this figure, neural distinctness was quantified as the absolute value of the difference in the average firing rate recorded in response to each pair of vowel sounds. A 300 ms long analysis window beginning at vowel onset was used to quantify the vowel response.
Figure 7.
Power spectra of the 5 vowel sounds with 2 initial consonants. The initial recordings were shifted one octave higher using the STRAIGHT vocoder to better match the rat hearing range (Kawahara 1997).
Figure 8.
Neural responses to speech sounds and classifier-based discrimination of each sound from “dad.” (a and c) Dot rasters show the timing of action potentials evoked in response to speech sounds differing in the vowel or initial consonant sound. Responses from 20 individual trials are shown. The acoustic waveform is shown in gray. The performance of a PSTH-based neural classifier is shown as percent correct discrimination from “dad.” The classifier compares the pattern of spike timing evoked on each trial with the average PSTHs generated by 2 sounds (i.e., “dead” and “dad”) and labels each trial based on which of the Euclidean distances between the single-trial pattern and the average PSTHs is smaller. (b and d) The number of spikes evoked by 20 presentations of each word is plotted. The classifier compares the spike count evoked on each trial with the average spike counts generated by the 2 sounds (i.e., “dead” and “dad”) and labels each trial based on which average spike count is closer to the single-trial count.
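As a sketch, the two classifier variants described in this legend reduce to the same nearest-neighbor rule applied to different response features: a binned spike-timing pattern versus a single spike count. The function and variable names below are illustrative, not taken from the study's analysis code.

```python
import math

def classify_timing(trial_psth, mean_psth_a, mean_psth_b):
    """Spike-timing classifier: label the trial with whichever average
    PSTH lies at the smaller Euclidean distance from the single-trial
    binned response."""
    dist_a = math.dist(trial_psth, mean_psth_a)
    dist_b = math.dist(trial_psth, mean_psth_b)
    return "A" if dist_a <= dist_b else "B"

def classify_count(trial_count, mean_count_a, mean_count_b):
    """Spike-count classifier: the same rule, but the response is
    reduced to a single number of spikes per trial."""
    diff_a = abs(trial_count - mean_count_a)
    diff_b = abs(trial_count - mean_count_b)
    return "A" if diff_a <= diff_b else "B"

# Toy 1-ms-binned responses; templates "A" and "B" stand in for the
# average responses to, e.g., "dad" and "dead".
print(classify_timing([2, 0, 1], [2, 1, 1], [0, 3, 0]))  # → A
print(classify_count(12, 11, 5))                         # → A
```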
Figure 9.
Neural discrimination was well correlated with behavioral discrimination of vowels when the average firing rate over a long analysis window was used, but was not significantly correlated with behavior when spike timing was considered. Neural discrimination was based on single-trial multiunit activity recorded from individual sites in IC (a and b) and A1 (c and d). The nearest-neighbor classifier assigns each sweep to the stimulus that evokes the most similar response on average. In (a and c), the classifier used the average firing rate over a 300 ms analysis window beginning at vowel onset. In (b and d), the classifier compared the temporal pattern recorded over the same window, binned with 1 ms precision. These results suggest that vowel sounds are represented by the average spatial activity pattern and that spike timing information is not used.
Figure 10.
Response of the entire population of IC neurons recorded to each of the consonant sounds tested. Neurograms are constructed from the average PSTH of 187 IC recording sites ordered by the preferred frequency of each recording site. The average PSTH for all the sites recorded is shown above each neurogram. The height of the scale bar to the left of each average PSTH represents a firing rate of 600 Hz. Only the initial onset response to each consonant sound is shown to allow the relative differences in spike timing to be visible. For example, high frequency neurons respond earlier to “dad” compared with “bad.” The opposite timing occurs for low frequency neurons. These patterns are similar to A1 responses (Engineer et al. 2008).
Figure 11.
Neural discrimination was well correlated with behavioral discrimination of consonants when spike timing information was used but was not as well correlated with behavior when the average firing rate over a long analysis window was used. Neural discrimination was based on single trial multiunit activity recorded from individual sites in IC and A1. The nearest-neighbor classifier assigns each sweep to the stimulus that evokes the most similar response on average. In (a), the classifier compared the temporal pattern recorded within 40 ms of consonant onset binned with 1 ms precision. In (b), the classifier used the average firing rate over this same period (i.e., spike timing information was eliminated). These results suggest that spike timing plays an important role in the representation of consonant sounds.
Figure 12.
Consonant discrimination was well correlated with neural discrimination when spike timing precise to 1–20 ms was provided. Vowel discrimination was well correlated with neural discrimination when spike timing information was eliminated and bin size was increased to 100–300 ms. The default analysis window for consonant sounds was 40 ms long beginning at sound onset and was increased as needed when the bin size was greater than 40 ms. The default analysis window for vowel sounds was 300 ms long beginning at vowel onset and was only extended when the 400 ms bin was used. Vowel discrimination was not correlated with neural discrimination when a 40 ms analysis window was used beginning at vowel onset regardless of the bin size used (data not shown). Neural discrimination was based on single trial multiunit activity recorded from individual sites in IC. These results suggest that spike timing plays an important role in the representation of consonant sounds but not vowel sounds. Asterisks indicate statistically significant correlations (P < 0.05).
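The bin-size manipulation in this figure can be illustrated with a simple binning helper: the same spike train yields a fine-grained timing vector at 1 ms resolution and collapses to a plain spike count when the bin is as wide as the analysis window. The window and bin widths follow the legend; the spike times are made up for the example.

```python
def bin_spikes(spike_times_ms, window_ms, bin_ms):
    """Count spikes in consecutive bins of width bin_ms over an analysis
    window starting at sound onset (time 0). A 1 ms bin preserves fine
    timing; a bin as wide as the window eliminates timing information."""
    n_bins = max(1, int(round(window_ms / bin_ms)))
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0 <= t < window_ms:
            counts[min(int(t // bin_ms), n_bins - 1)] += 1
    return counts

spikes = [1.4, 2.8, 3.1, 22.0, 39.5]     # illustrative spike times (ms)
print(len(bin_spikes(spikes, 40, 1)))    # → 40 (fine-timing vector)
print(bin_spikes(spikes, 40, 40))        # → [5] (timing eliminated)
```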
