Sci Rep. 2022 Feb 25;12(1):3206. doi: 10.1038/s41598-022-06855-8.

Effects of training and using an audio-tactile sensory substitution device on speech-in-noise understanding


K Cieśla et al. Sci Rep.

Abstract

Understanding speech in background noise is challenging, and wearing face masks, as required during the COVID-19 pandemic, makes it even harder. We developed a multi-sensory setup including a sensory substitution device (SSD) that delivers speech simultaneously through audition and as vibrations on the fingertips. The vibrations correspond to low frequencies extracted from the speech input. We trained two groups of non-native English speakers to understand distorted speech in noise. After a short session (30-45 min) of repeating sentences, with or without concurrent matching vibrations, both groups showed a comparable mean improvement of 14-16 dB in Speech Reception Threshold (SRT) in two test conditions: when participants repeated sentences from hearing alone, and when matching vibrations on the fingertips were also present. This is a very strong effect, considering that a 10 dB difference corresponds to a doubling of perceived loudness. The number of sentence repetitions needed to complete either type of training was comparable. However, the mean group SNR for the audio-tactile training (14.7 ± 8.7) was significantly lower (i.e., harder) than for the auditory training (23.9 ± 11.8), indicating a potential facilitating effect of the added vibrations. In addition, both before and after training, most participants (70-80%) understood speech in noise better (by 4-6 dB on average) when the audio sentences were accompanied by matching vibrations. This is the same magnitude of multisensory benefit that we reported, with no training at all, in our previous study using the same experimental procedures. After training, performance in this test condition was also best in both groups (SRT ~ 2 dB). Both types of training had the least effect in the third test condition, i.e., when participants repeated sentences accompanied by non-matching tactile vibrations; performance in this condition was also poorest after training. The results indicate that both types of training may reduce some of the difficulty of sound perception, which might enable more effective use of speech inputs delivered via vibrotactile stimulation. We discuss the implications of these novel findings for basic science. In particular, we show that even in adulthood, i.e., long after the classical "critical periods" of development have passed, a new pairing between a given computation (here, speech processing) and an atypical sensory modality (here, touch) can be established and trained, and that this process can be rapid and intuitive. We further present possible applications of our training program and the SSD for auditory rehabilitation in patients with hearing (and sight) deficits, as well as for healthy individuals in suboptimal acoustic conditions.
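The signal path described above (low frequencies extracted from the speech input and presented as fingertip vibrations) can be sketched as a simple low-pass filtering step. The cutoff frequency, filter order, and function below are illustrative assumptions for a minimal sketch, not the authors' actual implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_vibration_signal(speech, fs, cutoff_hz=250.0, order=4):
    """Keep only the low-frequency band of a speech waveform (roughly the
    voice fundamental-frequency range) for delivery to a vibrotactile
    actuator. cutoff_hz = 250 Hz is an assumed, illustrative value."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    # Zero-phase filtering so the vibration stays time-aligned with the audio.
    return sosfiltfilt(sos, speech)

# Demo: a 120 Hz "voicing" component plus a 2 kHz component standing in
# for higher speech formants.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
vib = extract_vibration_signal(speech, fs)  # 120 Hz retained, 2 kHz removed
```

In a real-time device the filtering would run on short buffers rather than a whole recording, but the principle is the same: the tactile channel receives a band the skin can resolve while the ears receive the full signal.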

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
(A) The vibrating interface of the SSD and (B) the MATLAB GUI.
Figure 2
The timeline of the experiment. AT: Audio-Tactile; SRT: an individually established Speech Reception Threshold that is then used throughout the training session.
Figure 3
Speech reception thresholds in three test conditions before and after training, shown for the two groups separately; bars correspond to standard errors of the mean; p values were Bonferroni corrected, *indicates p < 0.017, **indicates p < 0.003, ***indicates p < 0.0003.
Figure 4
Speech reception thresholds in two sessions separately; bars correspond to standard errors of the mean; p values were Bonferroni corrected, *indicates p < 0.017, **indicates p < 0.003, ***indicates p < 0.0003.
Figure 5
Scatterplots showing a positive relationship between SRT values obtained in three tests before training and the amount of improvement in each of them; shadowing corresponds to a 95% confidence interval.
