J Acoust Soc Am. 2008 May;123(5):2836-47.
doi: 10.1121/1.2897047.

Effect of spectral normalization on different talker speech recognition by cochlear implant users

Chuping Liu et al. J Acoust Soc Am. 2008 May.

Abstract

In cochlear implants (CIs), different talkers often produce different levels of speech understanding because of the spectrally distorted speech patterns provided by the implant device. A spectral normalization approach was used to transform the spectral characteristics of one talker to those of another talker. In Experiment 1, speech recognition with two talkers was measured in CI users, with and without spectral normalization. Results showed that the spectral normalization algorithm had a small but significant effect on performance. In Experiment 2, the effects of spectral normalization were measured in CI users and normal-hearing (NH) subjects; a pitch-stretching technique was used to simulate six talkers with different fundamental frequencies and vocal tract configurations. NH baseline performance was nearly perfect with these pitch-shift transformations. For CI subjects, while there was considerable intersubject variability in performance with the different pitch-shift transformations, spectral normalization significantly improved the intelligibility of these simulated talkers. The results from Experiments 1 and 2 demonstrate that spectral normalization toward more-intelligible talkers significantly improved CI users' speech understanding with less-intelligible talkers. The results suggest that spectral normalization using optimal reference patterns for individual CI patients may compensate for some of the acoustic variability across talkers.
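The idea of transforming one talker's spectral characteristics toward another's can be illustrated with a deliberately simplified sketch. This is not the GMM-based algorithm used in the study; it swaps in plain per-channel mean and variance matching of log spectral envelopes, which only shifts and rescales each channel so that its statistics match those of a reference talker. The function names (fit_channel_stats, normalize_to_target) and the frame layout (one list of per-channel log-envelope values per analysis frame) are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_channel_stats(frames):
    """Return a (mean, std) pair per channel.

    `frames` is a list of frames; each frame is a list holding one
    log-envelope value per channel (hypothetical layout).
    """
    channels = list(zip(*frames))  # transpose: frames -> channels
    return [(mean(ch), pstdev(ch)) for ch in channels]


def normalize_to_target(source_frames, target_frames, eps=1e-12):
    """Shift and rescale each source channel to match the target talker.

    Simplified stand-in for spectral normalization: per-channel
    mean/variance matching, NOT a GMM-based mapping.
    """
    src_stats = fit_channel_stats(source_frames)
    tgt_stats = fit_channel_stats(target_frames)
    normalized = []
    for frame in source_frames:
        new_frame = []
        for x, (m_src, s_src), (m_tgt, s_tgt) in zip(frame, src_stats, tgt_stats):
            # Leave the spread untouched if the source channel is constant.
            scale = s_tgt / s_src if s_src > eps else 1.0
            new_frame.append(m_tgt + scale * (x - m_src))
        normalized.append(new_frame)
    return normalized


if __name__ == "__main__":
    # Toy data: two frames, two channels per talker.
    source = [[1.0, 10.0], [3.0, 14.0]]
    target = [[0.0, 5.0], [4.0, 7.0]]
    print(normalize_to_target(source, target))
```

After normalization, each channel of the source talker has the same mean and standard deviation as the corresponding channel of the reference talker. A GMM-based mapping can additionally make the transformation depend on the spectral class of each frame, which this sketch does not attempt.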

Figures

Figure 1. Implementation framework of the GMM-based spectral normalization algorithm.

Figure 2. Normalized talker distortion as a function of number of channels. Solid line: Without spectral normalization. Dashed line: With spectral normalization. Note that the talker distortion between talkers F1 and M1 (unprocessed speech) was used as the reference.

Figure 3. Individual and mean sentence recognition performance for talkers M1 and F1. For subjects S1–S3, performance with F1 was better than that with M1; for subjects S4–S9, performance was better with M1 than with F1. The error bars show 1 s.d., and the asterisks show significantly different performance between the two talkers (p<0.05).

Figure 4. Wave forms for the sentence “Glue the sheet to the dark blue background.” Top panel: Pitch-shift transformation T0.6 (upward pitch shift). Middle panel: Reference talker T1.0 (unprocessed speech from talker F1). Bottom panel: Pitch-shift transformation T1.6 (downward pitch shift).

Figure 5. Spectral envelopes for different processing conditions in Experiment 2. Top panel: Spectral envelopes for reference talker T1.0 and pitch-shift transformations T0.6 and T1.6. Bottom panel: Spectral envelopes for T1.0 and spectral transformations T0.6-to-T1.0 and T1.6-to-T1.0.

Figure 6. NH subjects’ overall speech quality ratings for the pitch-shift transformations, with (open symbols) and without (closed symbols) spectral normalization. The error bars show 1 s.d., and the asterisks indicate significantly different ratings with spectral normalization (p<0.05). Note that source talker T1.0 (unprocessed speech from talker F1) was used to anchor the subjective quality ratings.

Figure 7. Sentence recognition performance for NH and CI subjects, with (open symbols) and without (closed symbols) spectral transformation, as a function of pitch-shift transformations. The error bars show 1 s.d., and the asterisks indicate significantly different performance after spectral transformation (p<0.05).
