J Acoust Soc Am. 2002 Jul;112(1):247-58.
doi: 10.1121/1.1487843.

Features of stimulation affecting tonal-speech perception: implications for cochlear prostheses


Li Xu et al. J Acoust Soc Am. 2002 Jul.

Abstract

Tone languages differ from English in that the pitch pattern of a single-syllable word conveys lexical meaning. In the present study, the dependence of tonal-speech perception on features of the stimulation was examined using an acoustic simulation of a CIS-type speech-processing strategy for cochlear prostheses. Contributions of spectral features of the speech signals were assessed by varying the number of filter bands, while contributions of temporal envelope features were assessed by varying the low-pass cutoff frequency used for extracting the amplitude envelopes. Ten normal-hearing native Mandarin Chinese speakers were tested. When the low-pass cutoff frequency was fixed at 512 Hz, consonant, vowel, and sentence recognition improved as a function of the number of channels and reached a plateau at 4 to 6 channels. Subjective judgments of sound quality continued to improve as the number of channels increased to 12, the highest number tested. Tone recognition, i.e., recognition of the four Mandarin tone patterns, depended on both the number of channels and the low-pass cutoff frequency. The trade-off between the temporal and spectral cues for tone recognition indicates that temporal cues can compensate for diminished spectral cues and vice versa. An additional tone recognition experiment using syllables of equal duration showed a marked decrease in performance, indicating that duration cues contribute to tone recognition. A third experiment showed that recognition of processed FM patterns that mimic Mandarin tone patterns was poor when temporal envelope and duration cues were removed.
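The envelope-extraction step of the acoustic simulation described above can be sketched in code. Below is a minimal, illustrative single-channel sketch in Python: half-wave rectification followed by a one-pole low-pass smoother whose cutoff plays the role of the study's low-pass cutoff frequency. The sample rate, cutoff, and amplitude-modulated test tone are assumptions for illustration, not the study's actual parameters, and a real CIS simulation would first split the signal through a band-pass filter bank.

```python
import math

def envelope(signal, fs, cutoff_hz):
    """Extract a smoothed amplitude envelope from one channel.

    Half-wave rectify, then smooth with a one-pole low-pass filter.
    Lowering cutoff_hz removes fine temporal (e.g., periodicity) cues,
    mirroring the low-pass cutoff manipulation in the simulation.
    """
    # One-pole low-pass coefficient for the given cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        rectified = max(x, 0.0)        # half-wave rectification
        y += alpha * (rectified - y)   # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        env.append(y)
    return env

fs = 16000
# Illustrative input: a 200-Hz tone, amplitude-modulated at 4 Hz, 0.5 s long.
sig = [math.sin(2 * math.pi * 200 * n / fs) *
       (0.5 + 0.5 * math.sin(2 * math.pi * 4 * n / fs))
       for n in range(fs // 2)]
env = envelope(sig, fs, cutoff_hz=50)
```

In a full simulation, the envelope of each band would then modulate a noise or sine carrier, and the modulated carriers would be summed and presented to listeners.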


Figures

FIG. 1.
Acoustic features of the speech and artificial signals. (A) Raw waveforms (top row) and spectrograms (bottom row) of the Chinese syllable /xu/ (pronounced “shoo”) spoken by a female. Panels from left to right show tone patterns 1 through 4. The lexical meanings associated with tones 1 through 4 are “void,” “slowly,” “permit,” and “sequence,” respectively. The darkness of the spectrograms represents the energy associated with time and frequency. The fundamental frequency and the harmonics of the voiced part (/u/) show flat, rising, falling/rising, and falling patterns for tones 1 through 4, respectively. The temporal envelopes of the waveforms also differ from one tone pattern to another. The durations of the syllables are about 0.6 s, with the voiced part averaging about 0.4 s. (B) Spectrograms of the higher-pitched set of the frequency-modulated (FM) sweeps synthesized to mimic the four tone patterns of Mandarin Chinese. The fundamental frequencies are listed in Table I. The durations are constant at 0.5 s.
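FM sweeps like those in panel (B) can be synthesized by phase accumulation along a frequency trajectory. The sketch below is illustrative only: the contour endpoints and sample rate are assumptions, not the actual fundamental frequencies used in the study (those are listed in Table I); only the 0.5-s duration and the four contour shapes (flat, rising, falling/rising, falling) follow the caption.

```python
import math

def fm_sweep(f_start, f_end, duration, fs):
    """Synthesize a linear FM sweep by accumulating instantaneous phase."""
    n = int(duration * fs)
    out, phase = [], 0.0
    for i in range(n):
        f = f_start + (f_end - f_start) * i / n   # linear frequency trajectory
        phase += 2.0 * math.pi * f / fs            # accumulate phase
        out.append(math.sin(phase))
    return out

fs = 16000
# Illustrative contours for the four Mandarin tone patterns, 0.5 s each.
tone1 = fm_sweep(200, 200, 0.5, fs)                                   # flat
tone2 = fm_sweep(150, 250, 0.5, fs)                                   # rising
tone3 = fm_sweep(180, 120, 0.25, fs) + fm_sweep(120, 200, 0.25, fs)   # falling/rising
tone4 = fm_sweep(250, 120, 0.5, fs)                                   # falling
```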
FIG. 2.
Tone, consonant, vowel, and sentence recognition as a function of number of channels. The upper panels plot the distribution of the percent-correct scores across all subjects in a boxplot format in which the three horizontal lines represent the 25th, 50th, and 75th percentiles and the ends of the vertical bars represent the minimum and maximum. Panels from left to right are for tone, consonant, vowel, and sentence recognition, respectively. The dashed line at 25% for tone, consonant, and vowel recognition indicates chance performance. For sentences, chance performance was 0%. The number of subjects tested is indicated in the lower right corner of each panel. The lower panels show the statistical significance of pairwise comparisons of the mean percent correct across numbers of channels, as revealed by the Tukey test. The light- and dark-gray squares represent significance levels of p<0.05 and p<0.01, respectively. The empty squares represent comparisons that were not statistically significant.
FIG. 3.
Subjective judgments of the sound quality as a function of number of channels. The subjective judgments of each subject were normalized to his or her highest judgment score across all tests. The boxplot shows the distribution of the mean normalized quality judgments of all nine subjects. In the boxplot, the three horizontal lines represent the 25th, 50th, and 75th percentiles, and the ends of the vertical line show the minimum and the maximum of the distribution.
FIG. 4.
Relationship between the subjective judgments of sound quality and the tone, consonant, vowel, and sentence recognition scores. Each dot represents the percent-correct score from one speech test (ordinate) plotted against the normalized quality judgment (abscissa). The correlation coefficients (r) are shown in the lower right corners of all panels.
FIG. 5.
Distribution of tone recognition scores as a function of the low-pass cutoff frequency. In the boxplot, the three horizontal lines represent the 25th, 50th, and 75th percentiles and the ends of the vertical line show the minimum and the maximum of the distribution across all nine subjects. The group means are connected by the solid lines. The dashed line represents the chance performance at 25%. The upper and lower traces represent data obtained with 12 channels and 1 channel, respectively, as indicated by the labels.
FIG. 6.
Representation of the number-of-channels-versus-LPFs matrix of tone recognition scores of three individual subjects. Each panel shows the mean percent correct for tones from one subject. (A) Subject 3. (B) Subject 4. (C) Subject 5. For each panel, the abscissa and the ordinate represent the number of channels and the LPFs, respectively. The percent correct for tones, which ranged from 25% to 100%, is represented by the diameter of the filled circles as indicated by the scale bar at the top.
FIG. 7.
Representation of the pooled results for the number-of-channels-versus-LPFs matrix of tone recognition scores. The data are plotted in a contour format in which the percent correct is represented by the gray scale as indicated by the scale bar at the top. The abscissa and ordinate are both on logarithmic scales. (A) Data represent the average across all nine subjects who participated in the tone recognition tests using speech materials in which the syllable duration was not equalized. (B) Data represent the average across all four subjects who participated in the tone recognition tests using speech materials that had equal syllable duration. In both (A) and (B), a trade-off between the number of channels and the LPFs is evident from the gradient of the tone recognition scores along the main diagonal line.
FIG. 8.
Syllable durations of tones 1 through 4. Each symbol represents the duration of one syllable spoken either by a male voice (open squares) or by a female voice (filled circles). The rightmost column, labeled “equal,” plots the durations of the syllables that were selected for equal durations for tones 1 through 4.
FIG. 9.
Mean recognition scores of the four-pattern FM sweeps across all four subjects. The data are plotted in the same format as the panels in Fig. 7, except that the contours are plotted in coarser steps. Panels (A), (B), and (C) show the percent correct for the lower-, higher-, and both lower- and higher-pitched FM sweeps, respectively. The fundamental frequencies of the FM sweeps are listed in Table I.
