Sci Rep. 2023 Apr 4;13(1):5507.
doi: 10.1038/s41598-023-32133-2.

Pupil dilation reflects the dynamic integration of audiovisual emotional speech

Pablo Arias Sarah et al.

Abstract

Emotional speech perception is a multisensory process. When speaking with someone, we concurrently integrate information from their voice and face to decode, for example, their feelings, moods, and emotions. However, the physiological reactions associated with these processes, such as the reflexive dilation of the pupil, remain mostly unknown. The aim of the current article is to investigate whether pupillary reactions can index the processes underlying the audiovisual integration of emotional signals. To investigate this question, we used an algorithm that can increase or decrease the smiles seen in a person's face or heard in their voice, while preserving the temporal synchrony between the visual and auditory channels. Using this algorithm, we created congruent and incongruent audiovisual smiles and investigated participants' gaze and pupillary reactions to the manipulated stimuli. We found that pupil reactions can reflect emotional information mismatch in audiovisual speech. In our data, when participants were explicitly asked to extract emotional information from the stimuli, the first fixation within emotionally mismatching areas (i.e., the mouth) triggered pupil dilation. These results reveal that pupil dilation can reflect the dynamic integration of audiovisual emotional speech and provide insights into how these reactions are triggered during stimulus perception.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
(a) Examples of the visual smile transformation. Left: decreased-smile transformation example; centre: increased-smile transformation example; right: pixel-to-pixel Boolean difference between the increased and decreased smile conditions, showing which pixels have the same (black) or different (white) values in the two conditions. Note that the differences between conditions are located inside the mouth area. (b) Change in the mean first-formant (left) and second-formant (right) frequencies of the auditory stimuli, grouped by audio transformation (increased or decreased smiles). Stimuli transformed with the increased-smile effect have higher formant frequencies than stimuli transformed with the decreased-smile effect. Formants were mean-normalised by the formants of the non-manipulated stimulus. Error bars are 95% confidence intervals on the mean; *statistically significant differences between distributions (paired t-tests, p < 0.05).
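The mean-normalisation step described in the caption can be sketched as follows. This is a minimal illustration, assuming per-stimulus mean formant values in Hz; the function name, array layout, and example values are hypothetical and not the authors' code.

```python
import numpy as np

def mean_normalise_formants(transformed_hz, reference_hz):
    """Illustrative sketch: express each manipulated stimulus's mean
    formant frequency relative to the non-manipulated reference.

    transformed_hz: mean formant values (Hz) for manipulated stimuli.
    reference_hz: mean formant values (Hz) for the matching
        non-manipulated stimuli.
    """
    transformed_hz = np.asarray(transformed_hz, dtype=float)
    reference_hz = np.asarray(reference_hz, dtype=float)
    # Subtracting the reference centres each value on the
    # non-manipulated stimulus, so a positive result means the
    # transformation raised the formant frequency.
    return transformed_hz - reference_hz

# Hypothetical F1 values (Hz) for three utterances.
increased = mean_normalise_formants([620, 650, 610], [600, 630, 605])
decreased = mean_normalise_formants([580, 615, 595], [600, 630, 605])
print(increased, decreased)  # increased-smile shifts come out positive
```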
Figure 2
Gaze results. (a) Mean number of fixations for each AOI and for each task. (b) Mean number of fixations for each task, for each AOI, and for both the congruent and incongruent conditions. Error bars are 95% confidence intervals on the mean; asterisks indicate statistically significant differences (p < Bonferroni-α = 0.0125); '.' indicates marginally significant differences (p < 0.05).
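For reference, the corrected threshold in this caption is consistent with a Bonferroni correction of a nominal α = 0.05 over a family of four comparisons (the family size is inferred from the reported value, not stated here):

$$\alpha_{\text{Bonferroni}} = \frac{\alpha}{m} = \frac{0.05}{4} = 0.0125$$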
Figure 3
Pupil results. (a) Mean pupil-size time series for the emotion task (left) and the passive task (right), for both the congruent (blue) and incongruent (orange) conditions; shaded areas represent the SEM. (b) Mean pupil size for both the emotion and the passive task, for both the congruent (blue) and incongruent (orange) conditions, and for each AOI. (c) Mean pupil dilation over time for both the congruent and incongruent conditions, for each AOI. (d) Mean pupil size after the first fixation to each AOI, for both the congruent and incongruent conditions. Error bars are 95% confidence intervals on the mean; '*': statistically significant differences between distributions (p < 0.05); '.': marginally significant differences between distributions (p < 0.1).
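A minimal sketch of how a mean pupil-size trace with an SEM band (as in panel a) is typically computed. The data shape, units, and variable names are assumptions for illustration, not the authors' analysis pipeline.

```python
import numpy as np

# Hypothetical data: trials x time samples of pupil size (mm) for one
# condition, e.g., incongruent trials in the emotion task.
rng = np.random.default_rng(0)
pupil = rng.normal(loc=3.0, scale=0.2, size=(40, 500))  # 40 trials

mean_trace = pupil.mean(axis=0)                            # mean over trials
sem_trace = pupil.std(axis=0, ddof=1) / np.sqrt(pupil.shape[0])

# The shaded band around the mean trace is usually mean +/- SEM:
upper = mean_trace + sem_trace
lower = mean_trace - sem_trace
```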
