Comparative Study

Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys

Asif A Ghazanfar et al. J Neurosci. 2008 Apr 23;28(17):4457-69. doi: 10.1523/JNEUROSCI.0541-08.2008.

Abstract

The existence of multiple nodes in the cortical network that integrate faces and voices suggests that they may be interacting and influencing each other during communication. To test the hypothesis that multisensory responses in auditory cortex are influenced by visual inputs from the superior temporal sulcus (STS), an association area, we recorded local field potentials and single neurons from both structures concurrently in monkeys. The functional interactions between auditory cortex and the STS, as measured by spectral analyses, increased in strength during presentations of dynamic faces and voices relative to either communication signal alone. These interactions were not solely modulations of response strength: the phase relationships were also significantly less variable in the multisensory condition. An identical analysis of functional interactions within auditory cortex revealed no such modulation as a function of stimulus condition, and neither did a control condition in which the dynamic face was replaced with a dynamic disk mimicking mouth movements. Single-neuron data revealed that these intercortical interactions were reflected in the spiking output of auditory cortex and that this spiking output was coordinated with oscillations in the STS. The vast majority of single neurons that were responsive to voices showed integrative responses when faces, but not control stimuli, were presented in conjunction. Our data suggest that the integration of faces and voices is mediated at least in part by neuronal cooperation between auditory cortex and the STS, and that interactions between these structures are a fast and efficient way of dealing with multisensory communication signals.
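To make the spectral-analysis idea concrete, here is a minimal Python sketch of a cross-spectral density computation between two simultaneously recorded LFP traces, of the general kind described above. It is an illustration only, not the authors' pipeline: the sampling rate, trial length, and random stand-in signals are all assumptions.

```python
import numpy as np
from scipy.signal import csd

fs = 1000.0  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
lfp_auditory = rng.standard_normal(2000)  # stand-in for one auditory cortex LFP trial
lfp_sts = rng.standard_normal(2000)       # stand-in for the simultaneous STS LFP trial

# Cross-spectral density between the two sites; the magnitude of Pxy at a
# given frequency indexes how strongly the two signals co-vary there.
freqs, Pxy = csd(lfp_auditory, lfp_sts, fs=fs, nperseg=256)

gamma = (freqs >= 55) & (freqs <= 95)  # gamma band reported in the figures
print("mean gamma-band cross-spectral magnitude:", np.abs(Pxy[gamma]).mean())
```

In practice one would compute such cross-spectra per trial, average across trials, and normalize by a pre-stimulus baseline, as the figure legends below describe.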


Figures

Figure 1.
Exemplars of the multisensory vocalization and control stimuli. A, Example of a coo call with disk control used in the study. The top panel shows frames at five intervals from the start of the video (the onset of mouth movement) until the end of mouth movement. Beneath the face frames are the disk frames used as control visual stimuli. x-axes depict time in milliseconds. The bottom panels display the time waveform and spectrogram of the vocalization, where the blue lines indicate the temporally corresponding video frames. B, Examples of other face- and disk-voice stimuli used in the study.
Figure 2.
Cross-spectra between local field potentials in auditory cortex and the superior temporal sulcus from a single pair of cortical sites. A, Peristimulus time histograms show the responses of STS neurons to two different vocalizations, a grunt and a coo. B, Time–frequency plots (cross-spectrograms) show the average phase-locked cross-spectral power for a single pair of cortical sites. Cross-spectra are averages of 80 trials across all calls, aligned to the onset of the auditory signal. x-axes depict time in milliseconds relative to the onset of the auditory signal (solid black line). y-axes depict the frequency of the oscillations in hertz. The color bar indicates the amplitude of these signals normalized by the baseline mean. C, The top panel shows the normalized cross-spectra as a function of frequency for the corresponding responses shown in B. x-axes depict frequency in hertz. y-axes depict the average cross-spectral power from 0 to 200 ms, normalized by the baseline mean. Shaded regions denote the SEM computed by a bootstrap method. The bottom panel shows the average normalized cross-spectra across all calls and electrode pairs in the gamma band for the four conditions from 0 to 200 ms after auditory onset. All values are normalized by the baseline mean cross-spectra for different frequency bands.
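The legend mentions two recurring ingredients, normalization by the baseline mean and a bootstrap estimate of the SEM. A rough sketch of both, assuming per-trial cross-spectral power has already been computed as a trials-by-frequencies array (the shapes and toy data are illustrative):

```python
import numpy as np

def bootstrap_sem(per_trial, n_boot=1000, seed=0):
    """SEM of the across-trial mean, estimated by resampling trials with replacement."""
    rng = np.random.default_rng(seed)
    n_trials = per_trial.shape[0]
    boot_means = np.array([per_trial[rng.integers(0, n_trials, n_trials)].mean(axis=0)
                           for _ in range(n_boot)])
    return boot_means.std(axis=0)

rng = np.random.default_rng(1)
power = np.abs(rng.standard_normal((80, 64)))  # toy per-trial power: 80 trials x 64 freq bins
baseline = power.mean(axis=0)    # stand-in for the pre-stimulus baseline mean per frequency
normalized = power / baseline    # values expressed relative to baseline, as in the plots
sem = bootstrap_sem(normalized)  # per-frequency half-width of the shaded region
```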
Figure 3.
Auditory cortical-STS interactions across the population. A, Population cross-spectrogram for all auditory cortical-STS pairs for the four conditions. Conventions are as in Figure 2B. B, Difference masks between the cross-spectra for the Face+Voice versus Voice condition and the Face+Voice versus Disk+Voice condition. x-axes depict time in milliseconds. y-axes depict frequency in hertz. The color bar shows the magnitude of the difference. C, Population cross-spectra for different frequencies from 0 to 300 ms after voice onset. x-axes depict frequency in hertz. y-axes depict the average normalized cross-spectral power over that window. Shaded regions denote the SEM across all electrode pairs and calls. All values are normalized by the baseline mean for different frequency bands. The right panel shows the average normalized cross-spectra across all calls and electrode pairs in the gamma band (55–95 Hz). D, Population coherence from 0 to 300 ms after voice onset. x-axes depict frequency in hertz. y-axes depict the average normalized coherence. Shaded regions denote the SEM across all electrode pairs and calls. All values are normalized by the baseline mean for different frequency bands. The right panel shows the average normalized coherence across all calls and electrode pairs in the gamma band. E, Population phase concentration from 0 to 300 ms after voice onset. x-axes depict frequency in hertz. y-axes depict the average normalized phase concentration. Shaded regions denote the SEM across all electrode pairs and calls. All values are normalized by the baseline mean for different frequency bands. The right panel shows the phase concentration across all calls and electrode pairs in the gamma band for the four conditions.
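The phase concentration in panel E can be read as the consistency of the trial-wise phase difference between the two sites. One standard way to compute such a measure is sketched below using a Hilbert-transform phase estimate; the paper's exact estimator may differ, and the filter settings and toy signals are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def gamma_phase(x, fs, lo=55.0, hi=95.0):
    """Instantaneous phase of the gamma-band component of a signal."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

fs = 1000.0
rng = np.random.default_rng(2)
trials_aud = rng.standard_normal((80, 1000))  # toy auditory cortex LFP, 80 trials
trials_sts = rng.standard_normal((80, 1000))  # toy STS LFP, same trials

dphi = np.array([gamma_phase(a, fs) - gamma_phase(s, fs)
                 for a, s in zip(trials_aud, trials_sts)])
# Mean resultant length of the phase differences across trials at each time
# point: 1 means a perfectly consistent phase relationship, 0 means uniformly
# scattered phases.
concentration = np.abs(np.exp(1j * dphi).mean(axis=0))
```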
Figure 4.
Interactions within auditory cortex for the population of cortical sites. A, Population cross-spectrogram for all auditory–auditory cortical pairs for the four conditions. Conventions are as in Figure 2B. B, Difference masks between the cross-spectra for the Face+Voice versus Voice condition and the Face+Voice versus Disk+Voice condition. Conventions are as in Figure 3B. C, Population cross-spectra from 0 to 300 ms after voice onset. Shaded regions denote the SEM across all electrode pairs and calls. D, Population phase concentration. Conventions are as in Figure 3E.
Figure 5.
Single neurons integrate faces and voices at different response latencies. A, Examples of multisensory integration in auditory lateral belt neurons. Peristimulus time histograms and rasters for a grunt vocalization (top left panel), a coo vocalization (top right panel), and another grunt (bottom left panel) under the Face+Voice (F+V), Voice alone (V), and Face alone (F) conditions. x-axes show time aligned to the onset of the face (solid line). Dashed lines indicate the onset and offset of the voice signal. y-axes depict the firing rate of the neuron in spikes per second. Shaded regions denote the SEM. The bottom half of each panel shows the spike raster for the three stimulus conditions. B, Auditory belt neurons show a distribution of peak response latencies to multisensory stimuli. Latencies are relative to the onset of the voice signal. The histogram shows the percentage of responses across all calls and neurons (y-axes) as a function of response latency (x-axes).
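Extracting the peak response latencies summarized in panel B amounts to locating the maximum of each PSTH after voice onset. A minimal sketch follows, with the bin width, smoothing kernel, and onset bin all assumed for illustration rather than taken from the paper.

```python
import numpy as np

bin_ms = 10           # PSTH bin width in ms (assumed)
voice_onset_bin = 30  # bin index of voice onset (assumed)
psth = np.random.default_rng(3).poisson(5.0, size=100).astype(float)  # toy PSTH

# Light smoothing, then the post-onset peak gives the latency.
smoothed = np.convolve(psth, np.ones(3) / 3.0, mode="same")
peak_bin = voice_onset_bin + int(np.argmax(smoothed[voice_onset_bin:]))
latency_ms = (peak_bin - voice_onset_bin) * bin_ms
print(f"peak response latency: {latency_ms} ms after voice onset")
```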
Figure 6.
Visual signals modulate the selectivity of auditory neurons. A, Responses of a single auditory neuron to three different vocalizations. The top panel shows the peristimulus time histogram of the neuron's response to one of the grunt exemplars in the stimulus set. The Face+Voice response to this grunt is significantly suppressed relative to the Voice alone response. The bottom panels show the responses of the same neuron to two other vocalizations, a different grunt and a coo. The auditory response is not significantly different from the multisensory response for these two calls. Figure conventions are the same as in Figure 5A. B, Visual signals change the selectivity of auditory neurons. The histogram shows the ratio of the number of multisensory responses of a neuron to the number of auditory responses from the same neuron. x-axes denote this ratio (ranging from 0 to 1). y-axes denote the percentage of neurons. C, Examples of multisensory integration of Face+Voice stimuli compared with Disk+Voice stimuli in auditory neurons. The left panels show enhanced responses when voices are coupled with faces but no similar modulation when they are coupled with disks. The right panels show similar effects for suppressed responses. The insets show frames from the Face+Voice stimulus and the temporally corresponding Disk+Voice stimulus. Conventions for the peristimulus time histograms follow Figure 5A.
Figure 7.
Relationship between the spiking activity of auditory cortical neurons and the STS local field potential. A, An example spike-field cross-spectrogram between an STS LFP signal and the firing of an auditory neuron for the three stimulus conditions for a single call type. x-axes depict time in milliseconds relative to the onset of the multisensory response in the auditory neuron (solid black line). y-axes depict frequency in hertz. The color bar denotes the cross-spectral power normalized by the baseline mean for different frequencies. B, Average cross-spectral power in the local field potential for the 40–100 ms window before the onset of the multisensory response, outlined in A. x-axes depict frequency in hertz. y-axes depict the normalized cross-spectral power. C, Population cross-spectrogram for interactions between auditory cortical neurons and the STS LFP signal for the three stimulus conditions, plotted as a function of time from the onset of integration. The color bar indicates the power normalized by the baseline mean for different frequency bands. D, Difference masks between the cross-spectra for the Face+Voice versus Voice condition and the Face+Voice versus Face condition. Conventions are as in Figure 3B. E, Population cross-spectra for the three stimulus conditions for the period (outlined by the dotted boxes; 40–100 ms) before the onset of multisensory integration. x-axes depict frequency in hertz. y-axes depict the normalized amplitude of the oscillations in the STS. F, Population phase concentration in the gamma-band local field potential (bandpass filtered from 55 to 95 Hz) for the three stimulus conditions. Error bars denote SEM.
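The spike-field coordination in panel F can be pictured as asking whether spikes fall at a consistent phase of the STS gamma oscillation. A sketch of that computation follows, with the filter design, toy LFP, and spike times all assumed rather than drawn from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0
rng = np.random.default_rng(4)
sts_lfp = rng.standard_normal(3000)         # toy STS LFP trace
spike_samples = rng.integers(0, 3000, 150)  # toy spike times, as sample indices

# Gamma-band (55-95 Hz) phase of the LFP, matching the band in the legend.
b, a = butter(4, [55 / (fs / 2), 95 / (fs / 2)], btype="band")
phase = np.angle(hilbert(filtfilt(b, a, sts_lfp)))

# Concentration of LFP phases at spike times: higher values mean spiking is
# coordinated with a consistent phase of the STS gamma oscillation.
concentration = np.abs(np.exp(1j * phase[spike_samples]).mean())
print(f"spike-field phase concentration: {concentration:.3f}")
```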

