Visual Enhancement of Relevant Speech in a 'Cocktail Party'

Niti Jaha et al.

Multisens Res. 2020 Feb 18;33(3):277-294. doi: 10.1163/22134808-20191423. Print 2020 Feb 28.

Abstract

Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a 'cocktail party' by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded as participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence spoken by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects were tasked with transcribing the AV-attended sentence on randomly selected trials. In the auditory-only trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice matching the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white-noise onsets was significantly more inhibited for the AV-attended sentences than for the auditorily-attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline the complex auditory scene, partly by reinforcing the neural representations of the visually attended stream, heightening the perception of continuity and improving comprehension.

Keywords: Audiovisual integration; auditory-evoked potentials; phonemic restoration; ‘cocktail party’.


Conflict of interest statement

The authors declare no conflicts of interest, financial or otherwise.

Figures

Figure 1.
Audiovisual task experimental design. Participants watched and listened to a human speaker uttering a sentence while also hearing a concurrent sentence from a speaker (no video) of the opposite gender. A 200-ms segment of each acoustic sentence was replaced by white noise, beginning at 25% of the duration following sound onset in one sentence and at 75% in the other. Audiovisual pairing and noise placements were counterbalanced across trials to rule out stimulus differences. Individuals transcribed what they heard during randomly chosen trials throughout the experiment. A similar task without visual mouth movements (auditory-only task) served as a control condition.
Figure 2.
Individual transcription accuracy for AV-attended and A-attended speech sentences.
Figure 3.
(A) Auditory evoked potential (AEP) waveforms at channel Cz, time-locked to noise onsets of the AV-attended and AV-unattended sentences (left panel) and the A-attended and A-unattended sentences (right panel). The gray rectangular area represents the window of significance (133–185 ms) distinguishing the AEP waveforms of the AV-attended and AV-unattended sentences. (B) Left panel: topographies of the mean AEP activity within the window of significance (133–185 ms) for the AV-unattended and AV-attended conditions, and the mean t-value topography of the significant window distinguishing the two AEP waveforms. Right panel: same as the left panel, but for the A-unattended and A-attended conditions. (C) Box plots of mean amplitude activity within the window of significance (133–185 ms) at channel Cz for all conditions.
Figure 4.
(A) Auditory evoked potential (AEP) waveforms at channel FCz, time-locked to noise onsets of the AV-attended and A-attended conditions (left panel) and the AV-unattended and A-unattended conditions (right panel). The gray rectangular area represents the window of significance (133–174 ms) distinguishing the AEP waveforms of the AV-attended and A-attended conditions. (B) Left panel: topographies of the average activity within the window of significance for the A-attended and AV-attended conditions, and the mean t-value topography of the significant window (133–174 ms) distinguishing the two AEP waveforms. Right panel: same as the left panel, but for the A-unattended and AV-unattended conditions. (C) Box plots of mean amplitude activity within the window of significance (133–174 ms) for all conditions.

