Review. Trends Cogn Sci. 2008 Mar;12(3):106-13. doi: 10.1016/j.tics.2008.01.002. Epub 2008 Feb 15.

Neuronal oscillations and visual amplification of speech


Charles E Schroeder et al. Trends Cogn Sci. 2008 Mar.

Abstract

It is widely recognized that viewing a speaker's face enhances vocal communication, although the neural substrates of this phenomenon remain unknown. We propose that the enhancement effect uses the ongoing oscillatory activity of local neuronal ensembles in the primary auditory cortex. Neuronal oscillations reflect rhythmic shifting of neuronal ensembles between high and low excitability states. Our hypothesis holds that oscillations are 'predictively' modulated by visual input, so that related auditory input arrives during a high excitability phase and is thus amplified. We discuss the anatomical substrates and key timing parameters that enable and constrain this effect. Our hypothesis makes testable predictions for future studies and emphasizes the idea that 'background' oscillatory activity is instrumental to cortical sensory processing.
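The core of the hypothesis — that input arriving during a high-excitability phase is amplified, while input at the opposite phase is not — can be illustrated with a toy sketch. This is not a model from the paper; the gain function, modulation depth, and phase convention below are all illustrative assumptions:

```python
import numpy as np

def response_gain(phi, depth=0.8):
    """Toy excitability function: maximal at an assumed 'high excitability'
    phase (phi = 0) and minimal at the opposite phase (phi = pi)."""
    return 1.0 + depth * np.cos(phi)

input_strength = 1.0
# The same input arriving at the optimal vs. the non-optimal phase
amplified = input_strength * response_gain(0.0)      # high-excitability phase
suppressed = input_strength * response_gain(np.pi)   # low-excitability phase
print(amplified, suppressed)
```

Under these assumptions, identical inputs produce very different responses depending solely on the oscillation phase at arrival time, which is the amplification effect the hypothesis attributes to visual phase-resetting.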


Figures

Figure 1
Figure 1. The relationship of oscillation phase to neuronal excitability
The three waveforms depict multiunit activity (MUA) amplitude as a function of the phase of spontaneous delta (1–4 Hz, black), theta (5–7 Hz, green) and gamma (25–50 Hz, red) oscillations, measured at a supragranular layer recording site in an individual experiment that sampled from the primary auditory cortex (A1) in an awake macaque monkey. MUA reflects the net action potential activity from neurons surrounding the recording site. In the absence of sensory input, MUA variations thus reflect net increases and decreases in the excitability of the local neuronal ensemble. MUA amplitude variations over three oscillation cycles are shown in each case. ‘Firing phase’ is the phase of the spontaneous oscillation currents during which neurons are most excitable and therefore most likely to generate action potentials (this corresponds to the largest MUA signals). Overlaid box and whisker plots show firing phase data pooled across all experiments (lines depict lower quartile, median and upper quartile values; whiskers depict the range of the observations). There is a clear phase-related modulation of MUA amplitude: the difference in MUA between the phase with maximal MUA (‘firing phase’) and the opposite phase was significant for all frequencies, in all cortical layers (Wilcoxon signed rank test, p < 0.01). Adapted, with permission, from Ref. [16].
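The phase–excitability relationship in the caption can be mimicked with a toy point-process simulation: spikes are drawn with a probability that depends on the phase of an ongoing oscillation, then binned by phase as in the MUA-vs-phase plots. The rates, frequency, and modulation depth below are invented for illustration and are not taken from the recordings:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                         # samples per second
t = np.arange(0, 60, 1 / fs)      # 60 s of simulated activity
f_theta = 6.0                     # a theta-band oscillation (Hz)
phase = (2 * np.pi * f_theta * t) % (2 * np.pi)

# Spiking probability is highest at the assumed 'firing phase' (phase = 0)
rate_hz = 10.0 * (1 + 0.9 * np.cos(phase))     # instantaneous rate (Hz)
spikes = rng.random(t.size) < rate_hz / fs     # Bernoulli spike train

# Bin spike counts by oscillation phase, analogous to the MUA-vs-phase plots
edges = np.linspace(0, 2 * np.pi, 9)           # eight phase bins
counts, _ = np.histogram(phase[spikes], bins=edges)
firing_bin = int(np.argmax(counts))            # expect a bin near phase 0
```

Bins adjacent to the assumed firing phase collect far more spikes than bins near the opposite phase, reproducing the phase-related MUA modulation described above in simplified form.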
Figure 2
Figure 2. Functional consequences of oscillation phase
(a) The relationship between excitability, as indexed by the action potential firing rate (red), and the phase of oscillation in the local neuronal ensemble, as indexed by a local field potential (blue). From this type of experimental observation [11,16] we have proposed that ongoing neuronal oscillations have optimal (high excitability) and non-optimal (low excitability) phases. (b) A series of simulated single-trial responses representing activity in A1 as affected by visual inputs. When the system is at rest and unengaged (baseline pre-stimulus period to the left of zero), oscillations within a given frequency have a high degree of phase variability across trials (gray dashed line). Presentation of a visual stimulus at time zero (arrow) can cause a phase reset of the ongoing oscillations, such that the oscillation develops strong phase coherence between trials; under these conditions, the optimal phases (red lines) and non-optimal phases (blue lines) align separately. (c) Sensory inputs arriving during the baseline (gray) generate highly variable response amplitudes. Inputs arriving during the optimal phase (red) are amplified, whereas those arriving during the non-optimal phase (blue) are suppressed. Over time, the cross-trial coherence dissipates, and the system goes back to its resting (random phase) state. (d) The top (green) trace illustrates the typical observation: oscillations recorded in the brain are normally complex mixtures of components at different frequencies. The traces below illustrate the individual oscillatory components in the delta (1.5 Hz), theta (7 Hz) and gamma (40 Hz) bands that comprise the composite waveform. We and others have noted (see text) that in normal systems there is strong phase–amplitude coupling between frequencies, and it has a hierarchical organization.
Gamma oscillatory amplitude varies with the phase of the underlying theta oscillation, and theta oscillatory amplitude varies with the phase of the underlying delta oscillation. As explained in the text, we propose that this ‘nesting’ of higher-in-lower frequencies might optimize the processing of conspecific vocalizations, which have similar temporal structure.
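The hierarchical nesting described in panel (d) can be made concrete by synthesizing such a composite waveform. The band frequencies come from the caption; the amplitudes and coupling depths are illustrative choices, and this is a constructed signal, not recorded data:

```python
import numpy as np

fs = 1000                           # samples per second
t = np.arange(0, 2.0, 1 / fs)       # 2 s of signal

delta_phase = 2 * np.pi * 1.5 * t   # 1.5 Hz delta
theta_phase = 2 * np.pi * 7.0 * t   # 7 Hz theta
gamma_phase = 2 * np.pi * 40.0 * t  # 40 Hz gamma

delta = np.cos(delta_phase)
# Theta amplitude is tied to delta phase (maximal at the delta peak) ...
theta = 0.5 * (1 + np.cos(delta_phase)) * np.cos(theta_phase)
# ... and gamma amplitude is tied to theta phase, giving the hierarchy
gamma = 0.25 * (1 + np.cos(theta_phase)) * np.cos(gamma_phase)

composite = delta + theta + gamma   # the mixed waveform one would record
```

In this construction the gamma envelope waxes and wanes with each theta cycle, and the theta envelope with each delta cycle, which is the higher-in-lower nesting the text proposes as a match to the temporal structure of vocalizations.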
Figure 3
Figure 3. For visual input to modulate primary auditory processing, it should ideally arrive in auditory cortex before the onset of the auditory response
One factor that allows this to occur is the typical delay between visual articulatory gestures and the accompanying vocalizations. Examples of this visual–auditory offset are illustrated here, using a monkey making a ‘coo’ call (top), a human imitating this monkey coo (middle), and a human making a similar human vocalization (‘hello’, bottom). In each case, the auditory amplitude envelope of the call (sampled at 44.1 kHz) is displayed above a series of simultaneous video frames; these were acquired at 30 Hz (33.3 ms per frame), but only a key subset of the frames are shown, linked by arrows to the appropriate point in the auditory time line. The lag between the first detectable opening of the mouth and the onset of the auditory envelope function is displayed for each case (arrows and corresponding values in ms).
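Measuring the visual–auditory offset described here amounts to comparing two onset times: the first video frame showing a mouth opening and the first audio sample where the amplitude envelope rises above threshold. The sampling rate and frame rate below come from the caption; the envelope, the threshold, and the mouth-opening frame are invented for illustration:

```python
import numpy as np

audio_fs = 44100        # audio sampling rate from the caption (Hz)
video_fps = 30          # video frame rate from the caption (33.3 ms/frame)

# Invented example data: silence, then a vocalization starting at 0.25 s
t = np.arange(0, 1.0, 1 / audio_fs)
envelope = np.where(t >= 0.25, np.abs(np.sin(2 * np.pi * 4 * (t - 0.25))), 0.0)

mouth_open_frame = 4    # assumed first frame with a detectable mouth opening

# Auditory onset: first sample where the envelope crosses a small threshold
onset_sample = int(np.argmax(envelope > 0.01))
auditory_onset_ms = onset_sample * 1000.0 / audio_fs

visual_onset_ms = mouth_open_frame * 1000.0 / video_fps
lag_ms = auditory_onset_ms - visual_onset_ms   # positive: the mouth moves first
```

A positive lag on this toy input reflects the point of the figure: the articulatory gesture leads the sound, leaving a window in which visual input could reset auditory-cortex oscillations before the vocalization arrives.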
Figure I
A schematic summary of visual projections that terminate in and near A1, including feedback (red solid lines) from the superior temporal polysensory (STP) area, prefrontal (Pf) cortex and intraparietal (IP) areas; lateral projections (green dashed lines) from primary and secondary visual cortices (V1/V2); and feedforward (purple dashed line) inputs from nonspecific and higher-order thalamic regions (yellow shading), such as the suprageniculate, posterior, anterior dorsal and magnocellular divisions of the medial geniculate complex, as well as portions of the pulvinar complex.

References

    1. Sumby WH, Pollack I. Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am. 1954;26:212–215.
    2. Calvert GA, et al. Activation of auditory cortex during silent lipreading. Science. 1997;276:593–596.
    3. Bruce C, et al. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J. Neurophysiol. 1981;46:369–384.
    4. Besle J, et al. Bimodal speech: early suppressive visual effects in human auditory cortex. Eur. J. Neurosci. 2004;20:2225–2234.
    5. Pekkola J, et al. Primary auditory cortex driven by visual speech: an fMRI study at 3T. Neuroreport. 2005;16:125–128.
