Voice emotion recognition by Mandarin-speaking pediatric cochlear implant users in Taiwan

Yung-Song Lin et al. Laryngoscope Investig Otolaryngol. 2022 Jan 13;7(1):250-258. doi: 10.1002/lio2.732. eCollection 2022 Feb.

Abstract

Objectives: To explore the effects of obligatory lexical tone learning on speech emotion recognition, and the cross-cultural differences between the United States and Taiwan in speech emotion understanding among children with cochlear implants.

Methods: This cohort study enrolled 60 Mandarin-speaking, school-aged, cochlear-implanted children (cCI) who underwent cochlear implantation before 5 years of age and 53 normal-hearing children (cNH) in Taiwan. Emotion recognition and sensitivity to changes in fundamental frequency (F0) were examined in these school-aged cNH and cCI (6-17 years old) at a tertiary referral center.
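
For illustration only, a minimal Python sketch of how a single F0-change discrimination stimulus pair could be synthesized as harmonic complex tones separated by a given number of semitones; the tone parameters, duration, and procedure below are assumptions and are not taken from the study.

    # Sketch (assumed parameters): reference/target harmonic complex tones whose
    # F0s differ by delta_semitones, the kind of pair an F0-change sensitivity
    # task might present. Not the study's actual stimuli or procedure.
    import numpy as np

    def harmonic_complex(f0, dur=0.5, sr=44100, n_harmonics=10):
        """Equal-amplitude harmonic complex tone at fundamental f0 (Hz)."""
        t = np.arange(int(dur * sr)) / sr
        tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harmonics + 1))
        tone /= np.max(np.abs(tone))                           # normalize to +/-1
        ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.01)  # 10-ms on/off ramps
        return tone * ramp

    def f0_trial(base_f0=220.0, delta_semitones=1.0, sr=44100):
        """Return (reference, target) tones; target F0 is shifted by delta_semitones."""
        shifted_f0 = base_f0 * 2 ** (delta_semitones / 12.0)   # semitones -> Hz ratio
        return harmonic_complex(base_f0, sr=sr), harmonic_complex(shifted_f0, sr=sr)

    reference, target = f0_trial(base_f0=220.0, delta_semitones=0.5)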

Results: The mean emotion recognition score of the cNH group was significantly better than that of the cCI group. Female speakers' vocal emotions were recognized more easily than male speakers'. There was a significant effect of age at test on voice emotion recognition performance. The average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel narrowband vocoder speech. The average voice emotion recognition performance across speakers for the cCI could be predicted by their sensitivity to changes in F0.
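
Because the cNH listeners were also tested with eight-channel narrowband-vocoded speech, a hedged sketch of a generic eight-channel noise-band vocoder is given below; the band spacing, filter orders, and envelope cutoff are assumptions rather than the processing actually used in the study.

    # Sketch of a generic 8-channel noise-band vocoder (assumptions: log-spaced
    # bands 100-8000 Hz, 4th-order Butterworth filters, 160-Hz envelope cutoff,
    # input sample rate >= 16 kHz). Not the study's vocoder settings.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def noise_vocode(x, sr, n_channels=8, lo=100.0, hi=8000.0, env_cutoff=160.0):
        edges = np.geomspace(lo, hi, n_channels + 1)      # logarithmic band edges
        carrier = np.random.randn(len(x))                 # broadband noise carrier
        out = np.zeros(len(x))
        for k in range(n_channels):
            b, a = butter(4, [edges[k], edges[k + 1]], btype="band", fs=sr)
            band = filtfilt(b, a, x)                      # analysis band
            be, ae = butter(2, env_cutoff, btype="low", fs=sr)
            env = np.clip(filtfilt(be, ae, np.abs(band)), 0.0, None)  # smoothed envelope
            out += env * filtfilt(b, a, carrier)          # band-limited noise x envelope
        return out / np.max(np.abs(out))                  # normalize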

Conclusions: Better pitch discrimination ability is associated with better voice emotion recognition for Mandarin-speaking cCI. Besides F0 cues, cCI are likely to adapt their voice emotion recognition by relying more on secondary cues such as intensity and duration. Although cross-cultural differences exist in the acoustic features of voice emotion, Mandarin-speaking cCI and their English-speaking cCI peers both showed a positive effect of age at test on emotion recognition, suggesting a learning effect and brain plasticity. Therefore, further device/processor development to improve the presentation of pitch information, together with more rehabilitative effort, is needed to improve the transmission and perception of voice emotion in Mandarin.

Level of evidence: 3.

Keywords: cochlear implant; lexical tone; pitch discrimination; voice emotion.

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

FIGURE 1
Results of acoustic analyses of the male (red circles) and female (blue squares) speakers' utterances in five emotions (abscissa). Each of the top five panels corresponds to a different acoustic cue. Each point represents the mean across all 12 sentences for each speaker, and error bars represent standard deviations. The bottom left panel shows the acoustic discriminability of the Mandarin sentences, whereas the bottom right panel shows the acoustic discriminability of the English sentences used in our previous study by Monita et al. SPL, sound pressure level
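
As a rough illustration of how per-utterance acoustic cues of the kind plotted here (e.g., mean F0, F0 range, duration, level) could be estimated, a sketch using the librosa library follows; the authors' actual analysis software and settings are not given in this caption, so all parameters below are assumptions.

    # Sketch (assumed settings): estimate mean F0, F0 range, duration, and a
    # relative level for one utterance with librosa. Absolute SPL would require a
    # calibrated recording chain; this is not the authors' analysis pipeline.
    import numpy as np
    import librosa

    def utterance_cues(path):
        y, sr = librosa.load(path, sr=None)
        f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                          fmax=librosa.note_to_hz("C6"), sr=sr)
        f0 = f0[~np.isnan(f0)]                            # voiced frames only
        rms = librosa.feature.rms(y=y)[0]
        return {
            "mean_f0_hz": float(np.mean(f0)),
            "f0_range_semitones": float(12 * np.log2(np.max(f0) / np.min(f0))),
            "duration_s": len(y) / sr,
            "relative_level_db": float(20 * np.log10(np.mean(rms) + 1e-12)),
        }
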
FIGURE 2
Error patterns of voice emotion recognition for the cCI and cNH groups of listeners, for the male and female speakers' sentences, and under each condition of spectral resolution tested. The cells are color-coded to represent the strength of the numerical values, and the actual values are also indicated. cCI, cochlear-implanted children; cNH, normal-hearing children
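
For readers who wish to tabulate a similar error pattern from trial-level data, here is a minimal sketch of building and color-coding a confusion matrix; the emotion label set and the toy data are assumptions, not the study's categories or results.

    # Sketch: build and display an emotion confusion matrix from (target, response)
    # pairs. The label set and toy data below are assumed, for illustration only.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix

    emotions = ["angry", "happy", "neutral", "sad", "scared"]   # assumed labels
    rng = np.random.default_rng(0)
    targets = rng.choice(emotions, size=200)                    # toy stand-in data
    responses = np.where(rng.random(200) < 0.7, targets, rng.choice(emotions, size=200))

    cm = confusion_matrix(targets, responses, labels=emotions, normalize="true")
    fig, ax = plt.subplots()
    im = ax.imshow(cm, cmap="viridis", vmin=0, vmax=1)          # color-coded strengths
    for i in range(len(emotions)):
        for j in range(len(emotions)):
            ax.text(j, i, f"{cm[i, j]:.2f}", ha="center", va="center", color="w")
    ax.set_xticks(range(len(emotions)))
    ax.set_xticklabels(emotions)
    ax.set_yticks(range(len(emotions)))
    ax.set_yticklabels(emotions)
    ax.set_xlabel("Response")
    ax.set_ylabel("Target")
    fig.colorbar(im, ax=ax)
    plt.show()
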
FIGURE 3
Mean voice emotion recognition scores for cCI and cNH as a function of spectral degradation, speaker, and age. In the left panel, an LME analysis with RAU-transformed scores as the dependent variable; age, condition (spectral resolution), and speaker as fixed effects; and subject-based random intercepts showed significant effects of age, F(1, 51) = 21.42, p < .0001; condition, F(3, 51) = 2758.43, p < .0001; speaker, F(1, 51) = 6.26, p = .0156; and a significant interaction between speaker and condition, F(3, 51) = 8.49, p = .0001. In the central panel, the average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel NBV speech. In the right panel, voice emotion recognition score as a function of F0 threshold (semitones) shows that the average performance across speakers for the cCI could be predicted by their sensitivity to changes in F0 (thresholds extracted from the Weibull fits at a d′ of 0.77; R² = .3302; p = .0064). cCI, cochlear-implanted children; cNH, normal-hearing children; LME, linear mixed effects; NBV, narrowband vocoder; RAU, rationalized arcsine unit
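
The analysis described here combines a rationalized arcsine (RAU) transform of the recognition scores with a linear mixed-effects model; below is a hedged sketch of both steps using Studebaker's RAU formula and statsmodels. The data layout, column names, and condition labels are assumptions, and the toy data are for illustration only.

    # Sketch: RAU transform (Studebaker, 1985) followed by an LME with fixed effects
    # of age, condition, and speaker and subject-based random intercepts, analogous
    # in spirit to the caption's model. Column names, condition labels, and the toy
    # data below are assumptions, not the study's data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    def rau(x_correct, n_trials):
        """Rationalized arcsine units for x_correct correct responses out of n_trials."""
        theta = (np.arcsin(np.sqrt(x_correct / (n_trials + 1)))
                 + np.arcsin(np.sqrt((x_correct + 1) / (n_trials + 1))))
        return (146.0 / np.pi) * theta - 23.0

    # toy stand-in data: 20 subjects x 4 assumed spectral conditions x 2 speakers
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(20), 8),
        "age": np.repeat(rng.integers(6, 18, 20), 8),
        "condition": np.tile(["full", "16ch", "8ch", "4ch"], 40),
        "speaker": np.tile(np.repeat(["male", "female"], 4), 20),
        "score_correct": rng.integers(20, 60, 160),
        "n_trials": 60,
    })
    df["rau"] = rau(df["score_correct"], df["n_trials"])
    model = smf.mixedlm("rau ~ age + C(condition) * C(speaker)",
                        data=df, groups=df["subject"]).fit()
    print(model.summary())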
