Comparative Study

Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex

Asif A Ghazanfar et al. J Neurosci. 2005 May 18;25(20):5004-12. doi: 10.1523/JNEUROSCI.0799-05.2005.

Abstract

In the social world, multiple sensory channels are used concurrently to facilitate communication. Among human and nonhuman primates, faces and voices are the primary means of transmitting social signals (Adolphs, 2003; Ghazanfar and Santos, 2004). Primates recognize the correspondence between species-specific facial and vocal expressions (Massaro, 1998; Ghazanfar and Logothetis, 2003; Izumi and Kojima, 2004), and these visual and auditory channels can be integrated into unified percepts to enhance detection and discrimination. Where and how such communication signals are integrated at the neural level are poorly understood. In particular, it is unclear what role "unimodal" sensory areas, such as the auditory cortex, may play. We recorded local field potential activity, the signal that best correlates with human imaging and event-related potential signals, in both the core and lateral belt regions of the auditory cortex in awake behaving rhesus monkeys while they viewed vocalizing conspecifics. We demonstrate unequivocally that the primate auditory cortex integrates facial and vocal signals through enhancement and suppression of field potentials in both the core and lateral belt regions. The majority of these multisensory responses were specific to face/voice integration, and the lateral belt region shows a greater frequency of multisensory integration than the core region. These multisensory processes in the auditory cortex likely occur via reciprocal interactions with the superior temporal sulcus.

Figures

Figure 1.
Exemplars of the visual and auditory components of the two types of vocalizations used in this study. Top panels show representative frames at five intervals from the start of the video (the onset of mouth movement) until the end of mouth movement. Middle panels display the time waveform of the auditory component of the vocalization, in which the blue lines indicate the temporally corresponding video frames. Bottom panels show the spectrogram for the same vocalization. A, The coo vocalization. Coos are long-duration, tonal calls produced with protruded lips. B, The grunt vocalization. Grunts are short-duration, noisy calls produced with a subtle mouth opening relative to coos. For both vocalizations, the mouth-movement onset precedes the auditory component.
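
For readers who want to visualize stimuli in this way, the short Python sketch below plots a time waveform and spectrogram for a vocalization audio file, in the spirit of the middle and bottom panels. The file name and the ~10 ms analysis window are illustrative assumptions, not the study's actual stimuli or settings.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    # Hypothetical file name; the study's stimuli are not distributed here.
    fs, voice = wavfile.read("coo_vocalization.wav")
    voice = voice.astype(float)
    t = np.arange(len(voice)) / fs

    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, sharex=True)
    ax_wave.plot(t, voice)  # time waveform, as in the middle panels
    ax_wave.set_ylabel("Amplitude")

    # An ~10 ms analysis window is an assumed display choice.
    f, tt, Sxx = spectrogram(voice, fs=fs, nperseg=max(int(0.010 * fs), 8))
    ax_spec.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12))  # power in dB
    ax_spec.set_xlabel("Time (s)")
    ax_spec.set_ylabel("Frequency (Hz)")
    plt.show()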
Figure 2.
Auditory cortical responses to multimodal vocalizations. Rectified local field potential responses to face plus voice (F+V), voice alone (V), and face alone (F) components of coos and grunts were compared. The solid vertical line indicates the onset of the face signal. Dotted vertical lines indicate the onset and offset of the voice signal. Graphs represent the mean of 10 repetitions with the mean baseline activity subtracted on a trial-by-trial basis. Bar graphs show the mean and SEM of the maximum response (face plus voice or voice alone using a 20 ms window; see Materials and Methods) between the voice onset and offset. This response was then compared statistically with the responses for the other conditions. A multisensory integration (MSI) index was computed using these responses and is indicated at the top right of each bar graph. A, B, One enhanced response and one suppressed response from the auditory core region. C, D, One enhanced response and one suppressed response from the lateral belt region.
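
The analysis steps named in this caption (trial-by-trial baseline subtraction, rectification, a 20 ms window between voice onset and offset, and an integration index) can be sketched in Python as below. The index formula, (F+V - V) / V × 100, is the convention common in the multisensory literature; the function names, the sliding-window implementation, and all parameters are assumptions, not the paper's exact Materials and Methods.

    import numpy as np

    def peak_response(lfp_trials, fs, baseline_end, v_on, v_off, win=0.020):
        """Rectified, baseline-corrected peak response in a 20 ms window.

        lfp_trials: (n_trials, n_samples) array of raw LFP traces.
        baseline_end, v_on, v_off: times in seconds from trial start.
        """
        # Subtract each trial's mean baseline, then rectify.
        base = lfp_trials[:, :int(baseline_end * fs)].mean(axis=1, keepdims=True)
        rect = np.abs(lfp_trials - base)
        mean_resp = rect.mean(axis=0)  # average across repetitions

        # Maximum of a sliding 20 ms mean between voice onset and offset.
        k = max(int(win * fs), 1)
        smoothed = np.convolve(mean_resp, np.ones(k) / k, mode="same")
        return smoothed[int(v_on * fs):int(v_off * fs)].max()

    def msi_index(fv_peak, v_peak):
        # Assumed convention: percent change of face+voice vs voice alone.
        return (fv_peak - v_peak) / v_peak * 100.0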
Figure 3.
Multisensory integration across two auditory cortical regions. A, The relative amounts of multisensory integration seen across cortical sites. The percentages represent the fraction of the total number of sites in the auditory core region (n = 46 sites) and the lateral belt region (n = 35 sites). The lateral belt had significantly more sites demonstrating multisensory integration. B, The distribution of peak latencies for the core and lateral belt regions. These data represent the peak amplitude of statistically significant multisensory responses in the LFP signals. Responses were assessed between the onset and offset of the auditory component of the vocalizations; thus latencies are relative to the auditory onset.
Figure 4.
The average frequency of multisensory integration seen across all sites. For both cortical regions, there were more instances of enhancement than suppression, and grunts more frequently elicited enhancement than did coos. Error bars represent SEM.
Figure 5.
Relationship between voice-onset time and multisensory integration. A, B, Median (black lines) and interquartile ranges (gray boxes) of enhancement and suppression relative to voice-onset time. The x-axis represents voice-onset time; the y-axis represents the multisensory integration index value, expressed as a log base 10 percentage. gt, Grunts; co, coos. Note that in A, there was only one enhancement response in the “256 ms/gt” category, whereas in B, there was no response in the “85 ms/co” category and only one response in the “97 ms/gt” category. The magnitude of multisensory effects was not related to voice-onset time. C, D, Proportion of enhanced (n = 93) and suppressed (n = 40) responses across the different voice-onset time categories. Note that enhancement was more frequently observed for short voice-onset times, whereas suppression was more common at longer voice-onset times.
Figure 6.
Auditory cortical LFP enhancements obey the law of inverse effectiveness, whereby the degree of multisensory enhancement is inversely related to the magnitude of the unimodal response. The y-axes depict the percentage enhancement, on a log base 10 scale, calculated from the multisensory integration index (see Materials and Methods). The x-axes depict the corresponding response magnitude of the auditory alone response. Gray dots represent coo responses; black dots represent grunt responses. A, Responses from the auditory core region. B, Responses from the lateral belt region.
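
A minimal way to test inverse effectiveness on data of this form is to correlate log-transformed enhancement with the auditory-alone response magnitude; a negative correlation is the signature plotted here. The synthetic values and the choice of Spearman correlation in the Python sketch below are illustrative assumptions, not the paper's statistics.

    import numpy as np
    from scipy.stats import spearmanr

    # Synthetic stand-ins for per-response measurements.
    rng = np.random.default_rng(0)
    auditory_alone = rng.uniform(1.0, 10.0, size=50)  # voice-alone magnitude
    enhancement_pct = 200.0 / auditory_alone * rng.lognormal(0.0, 0.3, size=50)

    log_enh = np.log10(enhancement_pct)  # log10 percentage, as on the y-axes
    rho, p = spearmanr(auditory_alone, log_enh)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
    # Inverse effectiveness predicts rho < 0: weaker unimodal responses
    # yield proportionally larger multisensory enhancement.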
Figure 7.
Face specificity in the multisensory responses of the auditory cortex. A, One frame each of a grunt face and a coo face at maximal mouth opening for one stimulus monkey, with the corresponding frames from the disk control videos. B, Examples of rectified LFP responses to face plus voice, voice alone, and disk plus voice conditions corresponding to the stimuli in A. Conventions are as in Figure 2. C, Bar graphs of peak responses corresponding to B. F+V, Face plus voice; V, voice alone; D+V, disk plus voice. Error bars represent SEM.
Figure 8.
A, Examples of responses integrating both face plus voice and disk plus voice. In both examples, the face plus voice and disk plus voice responses were significantly different from the voice alone condition (p < 0.05). B, Example showing integration of disk plus voice only. The disk plus voice response was significantly different from the other two response conditions (p < 0.05).
Figure 9.
For a given cortical site, the frequency of face plus voice (F+V) multisensory responses exceeds that of disk plus voice (D+V) responses. The y-axis indicates the frequency of observing multisensory responses for a given cortical site. The maximum number of possible responses is eight (the number of stimuli). Dark gray bars represent the core region of the auditory cortex, whereas light gray bars represent the lateral belt. Bars show means; error bars represent SE.

References

    1. Abry C, Lallouache M-T, Cathiard M-A (1996) How can coarticulation models account for speech sensitivity in audio-visual desynchronization? In: Speechreading by humans and machines: models, systems and applications (Stork D, Henneke M, eds), pp 247-255. Berlin: Springer.
    2. Adolphs R (2003) The cognitive neuroscience of human social behaviour. Nat Rev Neurosci 4:165-178.
    3. Barbour DL, Wang X (2003) Contrast tuning in auditory cortex. Science 299:1073-1075.
    4. Barraclough NE, Xiao D, Baker CI, Oram MW, Perrett DI (2005) Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci 17:377-391.
    5. Beauchamp MS, Lee KE, Argall BD, Martin A (2004) Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41:809-823.
