J Assoc Res Otolaryngol. 2021 Jul;22(4):443-461.
doi: 10.1007/s10162-021-00790-7. Epub 2021 Apr 20.

An Alternative Explanation for Difficulties with Speech in Background Talkers: Abnormal Fusion of Vowels Across Fundamental Frequency and Ears

Lina A J Reiss et al. J Assoc Res Otolaryngol. 2021 Jul.

Erratum in

Abstract

Normal-hearing (NH) listeners use frequency cues, such as fundamental frequency (voice pitch), to segregate sounds into discrete auditory streams. However, many hearing-impaired (HI) individuals have abnormally broad binaural pitch fusion, which leads to fusion and averaging of the original monaural pitches into a single stream instead of segregation into two streams (Oh and Reiss, 2017); broad fusion may similarly lead to fusion and averaging of speech streams across ears. In this study, we used dichotic speech stimuli to examine the relationship between speech fusion and vowel identification. Dichotic vowel perception was measured in NH and HI listeners while the across-ear fundamental frequency difference (ΔF0) was varied. Synthetic vowels /i/, /u/, /a/, and /ae/ were generated with three fundamental frequencies (F0s) of 106.9, 151.2, and 201.8 Hz and presented dichotically through headphones. For HI listeners, stimuli were shaped according to NAL-NL2 prescriptive targets. Although the vowels presented to the two ears always differed, listeners were not informed that there were no single-vowel trials and could identify either one vowel or two different vowels on each trial. When there was no F0 difference between the ears, both NH and HI listeners were more likely to fuse the vowels and identify only one vowel. As ΔF0 increased, NH listeners gave a higher percentage of two-vowel responses, but HI listeners were more likely to continue fusing vowels even at large ΔF0. Binaural tone fusion range was significantly correlated with vowel fusion rates in both NH and HI listeners. Confusion patterns with dichotic vowels differed from those seen with concurrent monaural vowels, suggesting different mechanisms behind the errors. Together, the findings suggest that broad fusion leads to spectral blending across ears, even for different ΔF0, and may hinder stream segregation and understanding of speech in the presence of competing talkers.
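The abstract gives the three F0s in hertz, while the figures report ΔF0 in semitones. As a rough illustration, the sketch below converts F0 pairs into semitone differences, assuming equal-tempered semitones (12 × log2 of the frequency ratio) and assuming that every pairing of the three F0s, plus same-F0 presentation, was used across ears. The function names are hypothetical.

```python
import math
from itertools import combinations

# F0 values (Hz) of the synthetic vowels, as listed in the abstract
F0S_HZ = (106.9, 151.2, 201.8)


def semitone_difference(f_low: float, f_high: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high / f_low)


def delta_f0_conditions(f0s=F0S_HZ) -> list:
    """Sorted across-ear ΔF0 values (semitones, rounded to 0.1) for all F0 pairings.

    Assumption: same-F0 presentation (ΔF0 = 0) is included as a condition.
    """
    deltas = {0.0}  # both ears at the same F0
    for a, b in combinations(sorted(f0s), 2):
        deltas.add(round(semitone_difference(a, b), 1))
    return sorted(deltas)


if __name__ == "__main__":
    for d in delta_f0_conditions():
        print(f"ΔF0 = {d:.1f} semitones")
```

Under these assumptions the pairings fall at about 0, 5, 6, and 11 semitones; the largest value corresponds to the ΔF0 = 11 semitone condition shown in the confusion-matrix figures.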

Keywords: binaural fusion; concurrent vowel; dichotic; hearing loss.

Figures

Fig. 1
Hearing thresholds for the subjects in this study. The left and right panels show the left and right ears, respectively. Normal-hearing group average audiograms are shown as thick gray lines and circles, with the gray shaded area indicating the minimum and maximum thresholds. Hearing-impaired individual audiograms are shown as thin lines with symbols, and the group average audiograms are shown as black lines. (Color online)
Fig. 2
Touchscreen response options. Subjects were instructed to select either one or two vowels depending on how many they heard (highlighted in a different color when selected), then select “Done.” Subjects also had the option to repeat the vowel presentation if needed.
Fig. 3
Binaural fusion ranges of the subjects in the study. Fusion ranges are shown for reference tones of 2000 Hz for HI listeners (solid black circles) and NH listeners (solid gray circles), as well as 1000 Hz for HI listeners (open triangles). HI listeners show broader fusion ranges, as well as greater variation in fusion ranges, than NH listeners.
Fig. 4
Relationship of percent single-vowel responses and vowel recognition scores to the fundamental frequency difference (ΔF0) between dichotic vowels. a Percent single-vowel responses versus ΔF0 for HI listeners. b Percent single-vowel responses versus ΔF0 for NH listeners. c Percent correct recognition of both vowels versus ΔF0 for HI listeners. d Percent correct recognition of both vowels versus ΔF0 for NH listeners. e Percent correct recognition of at least one vowel versus ΔF0 for HI listeners. f Percent correct recognition of at least one vowel versus ΔF0 for NH listeners. Thin colored lines with markers show individual data, and thick black dashed lines show group average trends. Fusion decreases and performance increases with ΔF0 for both groups, with NH listeners showing more benefit from ΔF0. For the HI subject legend, see Fig. 1.
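The three measures plotted against ΔF0 (percent single-vowel responses, percent both vowels correct, percent at least one vowel correct) can be computed from trial records. The sketch below is illustrative only: the `Trial` record and `score` function are hypothetical, and counting a response toward "at least one correct" whenever it shares any vowel with the presented pair is an assumed scoring rule, not one stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One trial: the two vowels presented (one per ear) and the vowel(s) selected."""
    presented: frozenset  # e.g. frozenset({"a", "i"})
    response: frozenset   # one or two vowels chosen on the touchscreen


def score(trials: list) -> dict:
    """Percent single-vowel, both-correct, and at-least-one-correct responses,
    plus single-vowel accuracy conditioned on a single-vowel response."""
    n = len(trials)
    if n == 0:
        raise ValueError("at least one trial is required")
    singles = [t for t in trials if len(t.response) == 1]
    both = sum(t.response == t.presented for t in trials)
    # Assumed rule: any overlap with the presented pair counts as "at least one"
    at_least_one = sum(bool(t.response & t.presented) for t in trials)
    single_hits = sum(bool(t.response & t.presented) for t in singles)
    return {
        "pct_single": 100.0 * len(singles) / n,
        "pct_both_correct": 100.0 * both / n,
        "pct_at_least_one": 100.0 * at_least_one / n,
        # None when the listener never gave a single-vowel response
        "pct_single_correct_given_single": (
            100.0 * single_hits / len(singles) if singles else None
        ),
    }
```

To reproduce per-ΔF0 curves, one would call `score` separately on the trials belonging to each ΔF0 condition.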
Fig. 5
Relationship of vowel recognition score to vowel fusion, indicated by the percentage of single-vowel responses. The percentage of correct recognition of one vowel in the pair, given that a single vowel was selected, is plotted versus the percentage of single-vowel responses for HI listeners (black squares) and NH listeners (gray squares). The solid line indicates a linear fit to the pooled NH and HI data; the correlation is significant. Note that the y-axis starts at 50 %, which is the chance level when a single vowel is selected.
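With four response vowels and two different vowels presented per trial, chance levels can be enumerated directly. The sketch below assumes uniform random guessing: a single-vowel guess drawn from the four vowels, and a two-vowel guess drawn from the six possible pairs. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

VOWELS = ("i", "u", "a", "ae")


def chance_levels(vowels=VOWELS):
    """Return (P(single guess is in the presented pair), P(pair guess matches exactly)).

    Assumes every two-vowel pair is equally likely to be presented and that
    guesses are uniform over single vowels or over pairs, respectively.
    """
    pairs = [frozenset(p) for p in combinations(vowels, 2)]
    # Single-vowel guess: count (pair, guess) combinations where the guess is in the pair
    single_hits = sum(v in p for p in pairs for v in vowels)
    p_single = Fraction(single_hits, len(pairs) * len(vowels))
    # Two-vowel guess: count (pair, guess) combinations that match exactly
    pair_hits = sum(g == p for p in pairs for g in pairs)
    p_pair = Fraction(pair_hits, len(pairs) ** 2)
    return p_single, p_pair
```

Under this assumption, a single-vowel response lands in the presented pair half the time, while a random two-vowel response is fully correct only one time in six.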
Fig. 6
Relationship of vowel fusion to tone fusion range. The percentage of single-vowel responses is significantly correlated with the fusion range measured using tones for all listeners pooled together and for NH listeners alone (gray circles), but not for HI listeners alone (black circles). Linear fits and R² values are shown for NH, HI, and pooled (all) data as dotted, dashed, and dash-dot lines, respectively.
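The linear fits and R² values described here are ordinary least-squares quantities. The sketch below computes slope, intercept, and R² for one group; pooling amounts to concatenating the NH and HI arrays before fitting. It does not test significance, and the function name and example arrays are hypothetical placeholders, not study data.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y ≈ slope * x + intercept, with R² (squared Pearson r)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need equal-length sequences with at least 3 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical usage: tone fusion range (x) vs. percent single-vowel responses (y)
# nh = linear_fit_r2(nh_range, nh_pct_single)
# pooled = linear_fit_r2(nh_range + hi_range, nh_pct_single + hi_pct_single)
```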
Fig. 7
Relationship of percent single-vowel responses and vowel recognition scores to the fundamental frequency difference (ΔF0) between monaural concurrent vowels. Only data from the left ear are shown; results for the right ear were similar. a Percent single-vowel responses versus ΔF0 for HI listeners. b Percent single-vowel responses versus ΔF0 for NH listeners. c Percent correct recognition of both vowels versus ΔF0 for HI listeners. d Percent correct recognition of both vowels versus ΔF0 for NH listeners. e Percent correct recognition of at least one vowel versus ΔF0 for HI listeners. f Percent correct recognition of at least one vowel versus ΔF0 for NH listeners. Thin colored lines with markers show individual data, and thick black dashed lines show group average trends. Fusion decreases and performance increases with ΔF0 for both groups.
Fig. 8
Group average dichotic vowel confusion matrices. Left and right panels show NH and HI groups, respectively. Top and bottom panels show confusions for ΔF0 = 0 and ΔF0 = 11 semitones, respectively. Each column indicates a specific vowel pair presented, while rows indicate specific response combinations, including single-vowel responses. The number and shading darkness in each cell indicate how many times a response combination (specified by row) was selected in response to a vowel pair (specified by column). For ΔF0 = 0, single-vowel responses in the first 4 rows predominate and include single vowels not in the original vowel pair, such as /ae/ for /a/ + /i/. For ΔF0 = 11, more responses fall on the diagonal, indicating correct two-vowel responses, especially for NH listeners.
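The confusion-matrix layout described here (columns for the six presented vowel pairs, rows for the four single-vowel responses followed by the six two-vowel responses) can be tallied from trial records. In the sketch below, the specific row and column ordering is an assumption, chosen so that correct two-vowel responses fall on the offset diagonal (row 4 + j, column j); function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

VOWELS = ("i", "u", "a", "ae")
# Columns: the six possible presented pairs (ordering assumed)
PAIRS = [frozenset(p) for p in combinations(VOWELS, 2)]
# Rows: single-vowel responses first, then two-vowel responses in the same pair order
ROWS = [frozenset({v}) for v in VOWELS] + PAIRS


def confusion_matrix(trials):
    """Count responses per (row = response, column = presented pair).

    `trials` is an iterable of (presented_pair, response) frozenset tuples.
    """
    counts = Counter((response, presented) for presented, response in trials)
    return [[counts[(row, col)] for col in PAIRS] for row in ROWS]


def correct_pair_counts(matrix):
    """Correct two-vowel responses lie at row len(VOWELS) + j, column j."""
    return [matrix[len(VOWELS) + j][j] for j in range(len(PAIRS))]
```

Placing the single-vowel rows first mirrors the caption's "first 4 rows", so fused responses and correct pairs can be read off separately.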
Fig. 9
Example individual dichotic vowel confusion matrices for ΔF0 = 11 semitones, for NH and HI listeners with narrow, medium, and broad binaural tone fusion. The NH listener with narrow fusion (top left panel) has a high proportion of responses with both vowels correct, indicated by responses on the diagonal. NH and HI listeners with medium binaural tone fusion (right panels) have confusion matrices with similar proportions of correct responses for both vowels (on the diagonal) and fused single-vowel responses (top 4 rows). One HI listener with broad binaural tone fusion mainly fused all vowel pairs into single vowels (top 4 rows), even with a ΔF0 of 11 semitones.
Fig. 10
Comparison of dichotic versus monaural concurrent vowel confusion matrices for NH62, a listener with narrow tone fusion, at ΔF0 = 0 semitones. This listener had a high proportion of single-vowel responses (top 4 rows) in the dichotic condition (top) as well as in the monaural left-ear and monaural right-ear conditions (left and right, respectively). However, note that the specific vowel confusions differed between the dichotic condition and both monaural conditions (cells highlighted with thick red borders).
Fig. 11
Comparison of dichotic versus monaural concurrent vowel confusion matrices for NH78, a listener with somewhat narrow tone fusion, at ΔF0 = 0 semitones. Plotted as in Fig. 10. Again, note that specific vowel confusions differed between the dichotic and both monaural conditions.
Fig. 12
Comparison of dichotic versus monaural concurrent vowel confusion matrices for HI25, a listener with broad tone fusion, at ΔF0 = 0 semitones. Again, note that specific vowel confusions differed between the dichotic condition (top) and the left-ear monaural condition (bottom; the right ear was not tested in this subject).
Fig. 13
Simple model of how the perception of /u/ and /ae/ could arise from monaural and dichotic /a/ + /i/, respectively. a–d The filtered spectra of the original vowels /a/, /ae/, /i/, and /u/, with circles indicating the formant peaks selected automatically as the first two local maxima. e Monaural perception of /a/ + /i/ is modeled as the addition of the two original signals in the time domain, followed by the same spectral filtering as the original vowels. This spectrum predicts two formant peaks most similar to those observed for /u/ (d). f Dichotic perception of /a/ + /i/ is modeled as the linear averaging of the spectra of /a/ (a) and /i/ (c), with wider smoothing windows. The resulting spectrum predicts two formant peaks most similar to those observed for /ae/ (b).
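The two model branches in this figure can be sketched with NumPy: the monaural branch sums the waveforms before taking a spectrum, while the dichotic branch averages the two spectra after wider smoothing, and formant peaks are taken as the first two local maxima. The source does not specify the spectral filtering, so a moving-average smoothing of the magnitude spectrum is used here as a stand-in; the window widths (in FFT bins) and function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def magnitude_spectrum(x, fs):
    """One-sided magnitude spectrum of a real signal and its frequency axis (Hz)."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, spec


def smooth(spectrum, width):
    """Moving-average smoothing (stand-in for the unspecified spectral filtering)."""
    kernel = np.ones(width) / width
    return np.convolve(spectrum, kernel, mode="same")


def first_two_peaks(freqs, spectrum):
    """Frequencies of the first two local maxima, taken as F1 and F2 estimates."""
    s = spectrum
    idx = [k for k in range(1, len(s) - 1) if s[k - 1] < s[k] >= s[k + 1]]
    if len(idx) < 2:
        raise ValueError("fewer than two local maxima found")
    return float(freqs[idx[0]]), float(freqs[idx[1]])


def monaural_model(x1, x2, fs, width=9):
    """Monaural branch: add waveforms in the time domain, then smooth the spectrum."""
    freqs, spec = magnitude_spectrum(x1 + x2, fs)
    return freqs, smooth(spec, width)


def dichotic_model(x1, x2, fs, width=25):
    """Dichotic branch: smooth each spectrum with a wider window, then average linearly."""
    freqs, s1 = magnitude_spectrum(x1, fs)
    _, s2 = magnitude_spectrum(x2, fs)
    return freqs, 0.5 * (smooth(s1, width) + smooth(s2, width))
```

In practice the smoothing width would need to span several harmonics of F0 so that the local maxima reflect formants rather than individual harmonics.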
