J Acoust Soc Am. 2002 Jul;112(1):247-58.
doi: 10.1121/1.1487843.

Features of stimulation affecting tonal-speech perception: implications for cochlear prostheses


Li Xu et al. J Acoust Soc Am. 2002 Jul.

Abstract

Tone languages differ from English in that the pitch pattern of a single-syllable word conveys lexical meaning. In the present study, the dependence of tonal-speech perception on features of the stimulation was examined using an acoustic simulation of a CIS-type speech-processing strategy for cochlear prostheses. Contributions of spectral features of the speech signals were assessed by varying the number of filter bands, while contributions of temporal envelope features were assessed by varying the low-pass cutoff frequency used for extracting the amplitude envelopes. Ten normal-hearing native Mandarin Chinese speakers were tested. When the low-pass cutoff frequency was fixed at 512 Hz, consonant, vowel, and sentence recognition improved as a function of the number of channels and reached a plateau at 4 to 6 channels. Subjective judgments of sound quality continued to improve as the number of channels increased to 12, the highest number tested. Tone recognition, i.e., recognition of the four Mandarin tone patterns, depended on both the number of channels and the low-pass cutoff frequency. The trade-off between the temporal and spectral cues for tone recognition indicates that temporal cues can compensate for diminished spectral cues and vice versa. An additional tone recognition experiment using syllables of equal duration showed a marked decrease in performance, indicating that duration cues contribute to tone recognition. A third experiment showed that recognition of processed FM patterns that mimic Mandarin tone patterns was poor when temporal envelope and duration cues were removed.
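The envelope-extraction step of the acoustic simulation described above can be sketched in code. Below is a minimal, illustrative single-channel sketch in Python: half-wave rectification followed by a one-pole low-pass smoother whose cutoff plays the role of the study's low-pass cutoff frequency. The sample rate, cutoff, and amplitude-modulated test tone are assumptions for illustration, not the study's actual parameters, and a real CIS simulation would first split the signal through a band-pass filter bank.

```python
import math

def envelope(signal, fs, cutoff_hz):
    """Extract a smoothed amplitude envelope from one channel.

    Half-wave rectify, then smooth with a one-pole low-pass filter.
    Lowering cutoff_hz removes fine temporal (e.g., periodicity) cues,
    mirroring the low-pass cutoff manipulation in the simulation.
    """
    # One-pole low-pass coefficient for the given cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        rectified = max(x, 0.0)        # half-wave rectification
        y += alpha * (rectified - y)   # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        env.append(y)
    return env

fs = 16000
# Illustrative input: a 200-Hz tone, amplitude-modulated at 4 Hz, 0.5 s long.
sig = [math.sin(2 * math.pi * 200 * n / fs) *
       (0.5 + 0.5 * math.sin(2 * math.pi * 4 * n / fs))
       for n in range(fs // 2)]
env = envelope(sig, fs, cutoff_hz=50)
```

In a full simulation, the envelope of each band would then modulate a noise or sine carrier, and the modulated carriers would be summed and presented to listeners.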


Figures

FIG. 1.
Acoustic features of the speech and artificial signals. (A) Raw waveforms (top row) and spectrograms (bottom row) of the Chinese syllable /xu/ (pronounced “shoo”) spoken by a female. Panels from left to right show tone patterns 1 through 4. The lexical meanings associated with tones 1 through 4 are “void,” “slowly,” “permit,” and “sequence,” respectively. The darkness of the spectrograms represents the energy associated with time and frequency. The fundamental frequency and the harmonics of the voiced part (/u/) show flat, rising, falling/rising, and falling patterns for tones 1 through 4, respectively. The temporal envelopes of the waveforms also differ from one tone pattern to another. The durations of the syllables are about 0.6 s, with the voiced part averaging about 0.4 s. (B) Spectrograms of the higher-pitched set of the frequency-modulated (FM) sweeps synthesized to mimic the four tone patterns of Mandarin Chinese. The fundamental frequencies are listed in Table I. The durations are constant at 0.5 s.
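FM sweeps like those in panel (B) can be synthesized by phase accumulation along a frequency trajectory. The sketch below is illustrative only: the contour endpoints and sample rate are assumptions, not the actual fundamental frequencies used in the study (those are listed in Table I); only the 0.5-s duration and the four contour shapes (flat, rising, falling/rising, falling) follow the caption.

```python
import math

def fm_sweep(f_start, f_end, duration, fs):
    """Synthesize a linear FM sweep by accumulating instantaneous phase."""
    n = int(duration * fs)
    out, phase = [], 0.0
    for i in range(n):
        f = f_start + (f_end - f_start) * i / n   # linear frequency trajectory
        phase += 2.0 * math.pi * f / fs            # accumulate phase
        out.append(math.sin(phase))
    return out

fs = 16000
# Illustrative contours for the four Mandarin tone patterns, 0.5 s each.
tone1 = fm_sweep(200, 200, 0.5, fs)                                   # flat
tone2 = fm_sweep(150, 250, 0.5, fs)                                   # rising
tone3 = fm_sweep(180, 120, 0.25, fs) + fm_sweep(120, 200, 0.25, fs)   # falling/rising
tone4 = fm_sweep(250, 120, 0.5, fs)                                   # falling
```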
FIG. 2.
Tone, consonant, vowel, and sentence recognition as a function of number of channels. The upper panels plot the distribution of the percent-correct scores across all subjects in a boxplot format in which the three horizontal lines represent the 25th, 50th, and 75th percentiles and the ends of the vertical bars represent the minimum and maximum. Panels from left to right are for tone, consonant, vowel, and sentence recognition, respectively. The dashed line at 25% for tone, consonant, and vowel recognition indicates chance performance. For sentences, chance performance was 0%. The number of subjects tested is indicated in the lower right corner of each panel. The lower panels show the statistical significance of pairwise comparisons of the mean percent correct across numbers of channels, as revealed by the Tukey test. The light- and dark-gray squares represent significance levels of p<0.05 and p<0.01, respectively. The empty squares represent comparisons that were not statistically significant.
FIG. 3.
Subjective judgments of the sound quality as a function of number of channels. The subjective judgments of each subject were normalized to his or her highest judgment score across all tests. The boxplot shows the distribution of the mean normalized quality judgments of all nine subjects. In the boxplot, the three horizontal lines represent the 25th, 50th, and 75th percentiles, and the ends of the vertical line show the minimum and the maximum of the distribution.
FIG. 4.
Relationship between the subjective judgments of sound quality and the tone, consonant, vowel, and sentence recognition scores. Each dot represents the percent-correct score from one speech test (ordinate) plotted against the normalized quality judgment (abscissa). The correlation coefficients (r) are shown in the lower right corners of all panels.
FIG. 5.
Distribution of tone recognition scores as a function of the low-pass cutoff frequency. In the boxplot, the three horizontal lines represent the 25th, 50th, and 75th percentiles and the ends of the vertical line show the minimum and the maximum of the distribution across all nine subjects. The group means are connected by the solid lines. The dashed line represents the chance performance at 25%. The upper and lower traces represent data obtained with 12 channels and 1 channel, respectively, as indicated by the labels.
FIG. 6.
Representation of the number-of-channels-versus-LPFs matrix of tone recognition scores of three individual subjects. Each panel shows the mean percent correct for tones from one subject. (A) Subject 3. (B) Subject 4. (C) Subject 5. For each panel, the abscissa and the ordinate represent the number of channels and the LPFs, respectively. The percent correct for tones, which ranged from 25% to 100%, is represented by the diameter of the filled circles as indicated by the scale bar at the top.
FIG. 7.
Representation of the pooled results for the number-of-channels-versus-LPFs matrix of tone recognition scores. The data are plotted in a contour format in which the percent correct is represented by the gray scale as indicated by the scale bar at the top. The abscissa and ordinate are both on logarithmic scales. (A) Data represent the average across all nine subjects who participated in the tone recognition tests using speech materials in which the syllable duration was not equalized. (B) Data represent the average across all four subjects who participated in the tone recognition tests using speech materials that had equal syllable duration. In both (A) and (B), a trade-off between the number of channels and the LPFs is evident from the gradient of the tone recognition scores along the main diagonal line.
FIG. 8.
Syllable durations of tones 1 through 4. Each symbol represents the duration of one syllable spoken either by a male voice (open squares) or by a female voice (filled circles). The rightmost column, labeled “equal,” plots the durations of the syllables that were selected for equal durations for tones 1 through 4.
FIG. 9.
Mean recognition scores of the four-pattern FM sweeps across all four subjects. The data are plotted in the same format as the panels in Fig. 7, except that the contours are plotted in coarser steps. Panels (A), (B), and (C) show the percent correct for the lower-, higher-, and both lower- and higher-pitched FM sweeps, respectively. The fundamental frequencies of the FM sweeps are listed in Table I.
