Effect of spectral normalization on different talker speech recognition by cochlear implant users

Chuping Liu¹, John Galvin 3rd, Qian-Jie Fu, Shrikanth S Narayanan

PMID: 18529199
PMCID: PMC2676177
DOI: 10.1121/1.2897047

Chuping Liu et al. J Acoust Soc Am. 2008 May;123(5):2836-47. doi: 10.1121/1.2897047.

Affiliation

¹ Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA. chupingl@usc.edu

Abstract

In cochlear implants (CIs), different talkers often produce different levels of speech understanding because of the spectrally distorted speech patterns provided by the implant device. A spectral normalization approach was used to transform the spectral characteristics of one talker to those of another talker. In Experiment 1, speech recognition with two talkers was measured in CI users, with and without spectral normalization. Results showed that the spectral normalization algorithm had a small but significant effect on performance. In Experiment 2, the effects of spectral normalization were measured in CI users and normal-hearing (NH) subjects; a pitch-stretching technique was used to simulate six talkers with different fundamental frequencies and vocal tract configurations. NH baseline performance was nearly perfect with these pitch-shift transformations. For CI subjects, while there was considerable intersubject variability in performance with the different pitch-shift transformations, spectral normalization significantly improved the intelligibility of these simulated talkers. The results from Experiments 1 and 2 demonstrate that spectral normalization toward more-intelligible talkers significantly improved CI users' speech understanding with less-intelligible talkers. The results suggest that spectral normalization using optimal reference patterns for individual CI patients may compensate for some of the acoustic variability across talkers.
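The abstract does not give the details of the spectral normalization algorithm; the Figure 1 caption only identifies it as GMM-based. As a rough illustration of how a Gaussian mixture model (GMM) can map one talker's spectral features toward another's, the sketch below uses the standard GMM regression form on a single scalar feature. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary keys, the 1-D simplification, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a source-talker feature x toward the target talker.

    Each component is a dict with hypothetical keys:
      weight : mixture weight
      mu_x   : source-talker mean
      mu_y   : target-talker mean
      var_x  : source-talker variance
      cov_yx : cross-covariance between target and source
    """
    # Posterior probability of each mixture component given x.
    # (Assumes x is close enough to some component that the total is nonzero.)
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]

    # Posterior-weighted sum of per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Made-up parameters for illustration only (not taken from the paper).
components = [
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 0.25, "cov_yx": 0.2},
    {"weight": 0.5, "mu_x": 4.0, "mu_y": 3.0, "var_x": 0.25, "cov_yx": 0.1},
]
converted = convert(1.0, components)
```

In this toy setup, a source value at the first component's mean is mapped almost exactly to that component's target mean, because the second component's posterior is negligible there. A real system would operate on multidimensional spectral feature vectors and would estimate the mixture parameters from paired source and target speech.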

Figures

**Figure 1**
Implementation framework of the GMM-based spectral normalization algorithm.

**Figure 2**
Normalized talker distortion as a function of number of channels. Solid line: Without spectral normalization. Dashed line: With spectral normalization. Note that the talker distortion between talkers F1 and M1 (unprocessed speech) was used as the reference.

**Figure 3**
Individual and mean sentence recognition performance for talkers M1 and F1. For subjects S1–S3, performance with F1 was better than that with M1; for subjects S4–S9, performance was better with M1 than with F1. The error bars show 1 s.d., and the asterisks show significantly different performance between the two talkers (p<0.05).

**Figure 4**
Waveforms for the sentence “Glue the sheet to the dark blue background.” Top panel: Pitch-shift transformation T0.6 (upward pitch shift). Middle panel: Reference talker T1.0 (unprocessed speech from talker F1). Bottom panel: Pitch-shift transformation T1.6 (downward pitch shift).

**Figure 5**
Spectral envelopes for different processing conditions in Experiment 2. Top panel: Spectral envelopes for reference talker T1.0 and pitch-shift transformations T0.6 and T1.6. Bottom panel: Spectral envelopes for T1.0 and spectral transformations T0.6-to-T1.0 and T1.6-to-T1.0.

**Figure 6**
NH subjects’ overall speech quality ratings for the pitch-shift transformations, with (open symbols) and without (closed symbols) spectral normalization. The error bars show 1 s.d., and the asterisks indicate significantly different ratings with spectral normalization (p<0.05). Note that source talker T1.0 (unprocessed speech from talker F1) was used to anchor the subjective quality ratings.

**Figure 7**
Sentence recognition performance for NH and CI subjects, with (open symbols) and without (closed symbols) spectral transformation, as a function of pitch-shift transformations. The error bars show 1 s.d., and the asterisks indicate significantly different performance after spectral transformation (p<0.05).
