Visual Enhancement of Relevant Speech in a 'Cocktail Party'

Niti Jaha et al.

Multisens Res. 2020 Feb 18;33(3):277-294. doi: 10.1163/22134808-20191423. Print 2020 Feb 28.

Abstract

Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a 'cocktail party' by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded as participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence spoken by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects were tasked with transcribing the AV-attended sentence on randomly selected trials. In the auditory-only trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice matching the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white-noise onsets was significantly more inhibited for the AV-attended sentences than for the auditorily-attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline the complex auditory scene, partly by reinforcing the neural representations of the visually attended stream, heightening the perception of continuity and improving comprehension.

Keywords: Audiovisual integration; auditory-evoked potentials; phonemic restoration; ‘cocktail party’.


Conflict of interest statement

The authors declare no conflicts of interest, financial or otherwise.

Figures

Figure 1.
Audiovisual task experimental design. Participants watched and listened to a human speaker uttering a sentence while also hearing a concurrent sentence from a speaker (no video) of the opposite gender. A 200-ms segment of each acoustic sentence was replaced by white noise, beginning at 25% of the duration following sound onset in one sentence and at 75% in the other. Audiovisual pairing and noise placements were counterbalanced across trials to rule out stimulus differences. Individuals transcribed what they heard during randomly chosen trials throughout the experiment. A similar task without visual mouth movements (auditory-only task) served as a control condition.
Figure 2.
Individual transcription accuracy for AV-attended and A-attended speech sentences.
Figure 3.
(A) Auditory evoked potential (AEP) waveforms at channel Cz, time-locked to noise onsets of the AV-attended and AV-unattended sentences (left panel) and the A-attended and A-unattended sentences (right panel). The gray rectangular area represents the window of significance (133–185 ms) distinguishing the AEP waveforms of the AV-attended and AV-unattended sentences. (B) Left panel: topographies of the mean AEP activity within the window of significance (133–185 ms) for the AV-unattended and AV-attended conditions, and the mean t-value topography of the significant window distinguishing the two AEP waveforms. Right panel: same as the left panel, but for the A-unattended and A-attended conditions. (C) Box plots of mean amplitude activity within the window of significance (133–185 ms) at channel Cz for all conditions.
Figure 4.
(A) Auditory evoked potential (AEP) waveforms at channel FCz, time-locked to noise onsets of the AV-attended and A-attended conditions (left panel) and the AV-unattended and A-unattended conditions (right panel). The gray rectangular area represents the window of significance (133–174 ms) distinguishing the AEP waveforms of the AV-attended and A-attended conditions. (B) Left panel: topographies of the average activity within the window of significance for the A-attended and AV-attended conditions, and the mean t-value topography of the significant window (133–174 ms) distinguishing the two AEP waveforms. Right panel: same as the left panel, but for the A-unattended and AV-unattended conditions. (C) Box plots of mean amplitude activity within the window of significance (133–174 ms) for all conditions.

