PLoS One. 2019 Jul 16;14(7):e0219744. doi: 10.1371/journal.pone.0219744. eCollection 2019.

Speech-specific audiovisual integration modulates induced theta-band oscillations


Alma Lindborg et al.

Abstract

Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion, where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual-speech-specific mismatch processing, and that general mismatch processing modulates induced theta-band (4-8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies of audiovisual sine-wave speech (SWS), a spectrally degraded speech signal that sounds nonsensical to naïve perceivers but is perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve, subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, where infrequent McGurk stimuli were embedded in a sequence of frequent audiovisually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violate both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. Nor did we find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
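The abstract's key measure, induced theta-band power, is the non-phase-locked part of the oscillatory response: the trial-averaged evoked response is subtracted from each trial before power is computed. The sketch below illustrates this general approach on synthetic single-sensor epochs; the sampling rate, filter order, band edges, and data are illustrative assumptions, not the study's actual pipeline (which used time-frequency decomposition and cluster-based statistics).

```python
# Hedged sketch: induced (non-phase-locked) theta-band power from epoched EEG.
# All parameters here are illustrative assumptions, not taken from the study.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def induced_theta_power(epochs, fs, band=(4.0, 8.0)):
    """epochs: (n_trials, n_samples) array for one sensor.
    Returns per-sample induced power: the trial average of the squared
    band-limited amplitude envelope after removing the evoked response."""
    evoked = epochs.mean(axis=0)          # phase-locked (evoked) component
    induced = epochs - evoked             # remove it from every trial
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, induced, axis=1)
    envelope = np.abs(hilbert(filtered, axis=1))
    return (envelope ** 2).mean(axis=0)

# Usage: 50 synthetic trials with a 6 Hz burst whose phase varies trial to
# trial, so its power is induced rather than evoked.
rng = np.random.default_rng(0)
fs = 250
t = np.arange(0, 1.0, 1 / fs)
trials = np.stack([
    np.sin(2 * np.pi * 6 * t + rng.uniform(0, 2 * np.pi)) * (t > 0.3) * (t < 0.7)
    + 0.1 * rng.standard_normal(t.size)
    for _ in range(50)
])
power = induced_theta_power(trials, fs)
# Induced theta power should be larger inside the burst window than before it.
print(power[(t > 0.35) & (t < 0.65)].mean() > power[t < 0.25].mean())
```

Because the burst's phase is randomized across trials, it survives the evoked-response subtraction, which is exactly the property that distinguishes induced from evoked activity.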


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Time evolution of the negative cluster (p = 0.0200) in the 4–8 Hz band for the congruent AV condition.
Sensors belonging to the cluster are shown in bold.
Fig 2
Fig 2. Grand average power by time (x-axis) and frequency (y-axis) for speech mode (upper row) and non-speech mode (lower row) groups in the N1/P2 dataset, at sensor level.
In the non-speech mode group, enhanced theta-band activity is observed from around 100 ms to 400 ms. This effect is largely absent in the speech mode group for the audiovisual conditions, with the largest between-group difference for Audiovisual Congruent trials.
Fig 3
Fig 3. Topographic distribution of grand average 4–8 Hz power for speech mode (top) and non-speech mode (bottom) at 0–300 ms.
Fig 4
Fig 4. Mean power over the SM < NSM cluster found for the pooled audiovisual conditions.
Whiskers represent the standard error of the mean over participants.
Fig 5
Fig 5. Grand average power at a central sensor for the MMN dataset, by group and condition.
For the Speech mode group, there are no clear differences between the conditions, contrary to the mismatch hypothesis. For the Non-speech mode group, there seems to be a deviant < standard difference in the alpha and upper theta band, which cannot be explained by any of our hypotheses.
Fig 6
Fig 6. Topographic distribution of grand average 4–8 Hz power for speech mode (left) and non-speech mode (right) at 200–500 ms.

