Comparative Study

Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex

Asif A Ghazanfar et al. J Neurosci. 2005 May 18;25(20):5004-12. doi: 10.1523/JNEUROSCI.0799-05.2005.

Abstract

In the social world, multiple sensory channels are used concurrently to facilitate communication. Among human and nonhuman primates, faces and voices are the primary means of transmitting social signals (Adolphs, 2003; Ghazanfar and Santos, 2004). Primates recognize the correspondence between species-specific facial and vocal expressions (Massaro, 1998; Ghazanfar and Logothetis, 2003; Izumi and Kojima, 2004), and these visual and auditory channels can be integrated into unified percepts to enhance detection and discrimination. Where and how such communication signals are integrated at the neural level are poorly understood. In particular, it is unclear what role "unimodal" sensory areas, such as the auditory cortex, may play. We recorded local field potential activity, the signal that best correlates with human imaging and event-related potential signals, in both the core and lateral belt regions of the auditory cortex in awake behaving rhesus monkeys while they viewed vocalizing conspecifics. We demonstrate unequivocally that the primate auditory cortex integrates facial and vocal signals through enhancement and suppression of field potentials in both the core and lateral belt regions. The majority of these multisensory responses were specific to face/voice integration, and the lateral belt region shows a greater frequency of multisensory integration than the core region. These multisensory processes in the auditory cortex likely occur via reciprocal interactions with the superior temporal sulcus.

Figures

Figure 1.
Exemplars of the visual and auditory components of the two types of vocalizations used in this study. Top panels show representative frames at five intervals from the start of the video (the onset of mouth movement) until the end of mouth movement. Middle panels display the time waveform of the auditory component of the vocalization, in which the blue lines indicate the temporally corresponding video frames. Bottom panels show the spectrogram for the same vocalization. A, The coo vocalization. Coos are long-duration, tonal calls produced with protruded lips. B, The grunt vocalization. Grunts are short-duration, noisy calls produced with a subtle mouth opening relative to coos. For both vocalizations, the mouth-movement onset precedes the auditory component.
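
For readers who want to visualize stimuli in this way, the short Python sketch below plots a time waveform and spectrogram for a vocalization audio file, in the spirit of the middle and bottom panels. The file name and the ~10 ms analysis window are illustrative assumptions, not the study's actual stimuli or settings.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    # Hypothetical file name; the study's stimuli are not distributed here.
    fs, voice = wavfile.read("coo_vocalization.wav")
    voice = voice.astype(float)
    t = np.arange(len(voice)) / fs

    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, sharex=True)
    ax_wave.plot(t, voice)  # time waveform, as in the middle panels
    ax_wave.set_ylabel("Amplitude")

    # An ~10 ms analysis window is an assumed display choice.
    f, tt, Sxx = spectrogram(voice, fs=fs, nperseg=max(int(0.010 * fs), 8))
    ax_spec.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12))  # power in dB
    ax_spec.set_xlabel("Time (s)")
    ax_spec.set_ylabel("Frequency (Hz)")
    plt.show()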
Figure 2.
Auditory cortical responses to multimodal vocalizations. Rectified local field potential responses to face plus voice (F+V), voice alone (V), and face alone (F) components of coos and grunts were compared. The solid vertical line indicates the onset of the face signal. Dotted vertical lines indicate the onset and offset of the voice signal. Graphs represent the mean of 10 repetitions with the mean baseline activity subtracted on a trial-by-trial basis. Bar graphs show the mean and SEM of the maximum response (face plus voice or voice alone using a 20 ms window; see Materials and Methods) between the voice onset and offset. This response was then compared statistically with the responses for the other conditions. A multisensory integration (MSI) index was computed using these responses and is indicated at the top right of each bar graph. A, B, One enhanced response and one suppressed response from the auditory core region. C, D, One enhanced response and one suppressed response from the lateral belt region.
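
The analysis steps named in this caption (trial-by-trial baseline subtraction, rectification, a 20 ms window between voice onset and offset, and an integration index) can be sketched in Python as below. The index formula, (F+V - V) / V × 100, is the convention common in the multisensory literature; the function names, the sliding-window implementation, and all parameters are assumptions, not the paper's exact Materials and Methods.

    import numpy as np

    def peak_response(lfp_trials, fs, baseline_end, v_on, v_off, win=0.020):
        """Rectified, baseline-corrected peak response in a 20 ms window.

        lfp_trials: (n_trials, n_samples) array of raw LFP traces.
        baseline_end, v_on, v_off: times in seconds from trial start.
        """
        # Subtract each trial's mean baseline, then rectify.
        base = lfp_trials[:, :int(baseline_end * fs)].mean(axis=1, keepdims=True)
        rect = np.abs(lfp_trials - base)
        mean_resp = rect.mean(axis=0)  # average across repetitions

        # Maximum of a sliding 20 ms mean between voice onset and offset.
        k = max(int(win * fs), 1)
        smoothed = np.convolve(mean_resp, np.ones(k) / k, mode="same")
        return smoothed[int(v_on * fs):int(v_off * fs)].max()

    def msi_index(fv_peak, v_peak):
        # Assumed convention: percent change of face+voice vs voice alone.
        return (fv_peak - v_peak) / v_peak * 100.0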
Figure 3.
Multisensory integration across two auditory cortical regions. A, The relative amounts of multisensory integration seen across cortical sites. The percentages represent the fraction of the total number of sites in the auditory core region (n = 46 sites) and the lateral belt region (n = 35 sites). The lateral belt had significantly more sites demonstrating multisensory integration. B, The distribution of peak latencies for the core and lateral belt regions. These data represent the peak amplitude of statistically significant multisensory responses in the LFP signals. Responses were assessed between the onset and offset of the auditory component of the vocalizations; thus latencies are relative to the auditory onset.
Figure 4.
The average frequency of multisensory integration seen across all sites. For both cortical regions, there were more instances of enhancement than suppression, and grunts more frequently elicited enhancement than did coos. Error bars represent SEM.
Figure 5.
Relationship between voice-onset time and multisensory integration. A, B, Median (black lines) and interquartile ranges (gray boxes) of enhancement and suppression relative to voice-onset time. The x-axis represents voice-onset time; the y-axis represents the multisensory integration index value, expressed as a log base 10 percentage. gt, Grunts; co, coos. Note that in A, there was only one enhancement response in the “256 ms/gt” category, whereas in B, there was no response in the “85 ms/co” category and only one response in the “97 ms/gt” category. The magnitude of multisensory effects was not related to voice-onset time. C, D, Proportion of enhanced (n = 93) and suppressed (n = 40) responses across the different voice-onset time categories. Note that enhancement was more frequently observed for short voice-onset times, whereas suppression was more common at longer voice-onset times.
Figure 6.
Auditory cortical LFP enhancements obey the law of inverse effectiveness, whereby the degree of multisensory enhancement is inversely related to the magnitude of the unimodal response. The y-axes depict the percentage enhancement, on a log base 10 scale, calculated from the multisensory integration index (see Materials and Methods). The x-axes depict the corresponding response magnitude of the auditory alone response. Gray dots represent coo responses; black dots represent grunt responses. A, Responses from the auditory core region. B, Responses from the lateral belt region.
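
A minimal way to test inverse effectiveness on data of this form is to correlate log-transformed enhancement with the auditory-alone response magnitude; a negative correlation is the signature plotted here. The synthetic values and the choice of Spearman correlation in the Python sketch below are illustrative assumptions, not the paper's statistics.

    import numpy as np
    from scipy.stats import spearmanr

    # Synthetic stand-ins for per-response measurements.
    rng = np.random.default_rng(0)
    auditory_alone = rng.uniform(1.0, 10.0, size=50)  # voice-alone magnitude
    enhancement_pct = 200.0 / auditory_alone * rng.lognormal(0.0, 0.3, size=50)

    log_enh = np.log10(enhancement_pct)  # log10 percentage, as on the y-axes
    rho, p = spearmanr(auditory_alone, log_enh)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
    # Inverse effectiveness predicts rho < 0: weaker unimodal responses
    # yield proportionally larger multisensory enhancement.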
Figure 7.
Face specificity in the multisensory responses of the auditory cortex. A, One frame each of a grunt face and a coo face at maximal mouth opening for one stimulus monkey, with the corresponding frames from the disk control videos. B, Examples of rectified LFP responses to face plus voice, voice alone, and disk plus voice conditions corresponding to the stimuli in A. Conventions are as in Figure 2. C, Bar graphs of peak responses corresponding to B. F+V, Face plus voice; V, voice alone; D+V, disk plus voice. Error bars represent SEM.
Figure 8.
A, Examples of responses integrating both face plus voice and disk plus voice. In both examples, the face plus voice and disk plus voice responses were significantly different from the voice alone condition (p < 0.05). B, Example showing integration of disk plus voice only. The disk plus voice response was significantly different from the other two response conditions (p < 0.05).
Figure 9.
For a given cortical site, the frequency of face plus voice (F+V) multisensory responses exceeds that of disk plus voice (D+V) responses. The y-axis indicates the frequency of observing multisensory responses for a given cortical site. The maximum number of possible responses is eight (the number of stimuli). Dark gray bars represent the core region of the auditory cortex, whereas light gray bars represent the lateral belt. Bars show means; error bars represent SE.

References

    1. Abry C, Lallouache M-T, Cathiard M-A (1996) How can coarticulation models account for speech sensitivity in audio-visual desynchronization? In: Speechreading by humans and machines: models, systems and applications (Stork D, Henneke M, eds), pp 247-255. Berlin: Springer.
    2. Adolphs R (2003) The cognitive neuroscience of human social behaviour. Nat Rev Neurosci 4:165-178.
    3. Barbour DL, Wang X (2003) Contrast tuning in auditory cortex. Science 299:1073-1075.
    4. Barraclough NE, Xiao D, Baker CI, Oram MW, Perrett DI (2005) Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci 17:377-391.
    5. Beauchamp MS, Lee KE, Argall BD, Martin A (2004) Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41:809-823.
