Front Neurorobot. 2021 Mar 30;15:639999.
doi: 10.3389/fnbot.2021.639999. eCollection 2021.

Gazing at Social Interactions Between Foraging and Decision Theory

Alessandro D'Amelio et al. Front Neurorobot. 2021.

Abstract

Finding the underlying principles of social attention in humans seems to be essential for the design of the interaction between natural and artificial agents. Here, we focus on the computational modeling of gaze dynamics as exhibited by humans when perceiving socially relevant multimodal information. The audio-visual landscape of social interactions is distilled into a number of multimodal patches that convey different social value, and we work under the general frame of foraging as a tradeoff between local patch exploitation and landscape exploration. We show that the spatio-temporal dynamics of gaze shifts can be parsimoniously described by Langevin-type stochastic differential equations triggering a decision equation over time. In particular, value-based patch choice and handling is reduced to a simple multi-alternative perceptual decision making that relies on a race-to-threshold between independent continuous-time perceptual evidence integrators, each integrator being associated with a patch.
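The race-to-threshold idea can be illustrated with a minimal sketch. It assumes each patch has a constant drift rate (a stand-in for its social value), that the integrators share one noise level and one threshold, and that the dynamics are discretized with a simple Euler-Maruyama step. The function name `race_to_threshold` and all parameter values are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math
import random


def race_to_threshold(drifts, threshold=1.0, sigma=0.1, dt=0.01,
                      max_steps=100_000, rng=None):
    """Simulate independent evidence integrators, one per patch.

    Returns (winner_index, decision_time), or (None, elapsed_time)
    if no racer reaches the threshold within max_steps.
    All parameters are illustrative assumptions.
    """
    rng = rng or random.Random()
    evidence = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, mu in enumerate(drifts):
            # Euler-Maruyama step of dx = mu*dt + sigma*dW
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            evidence[i] += mu * dt + noise
        # The racer with the most evidence wins once it crosses the threshold
        best = max(range(len(drifts)), key=lambda i: evidence[i])
        if evidence[best] >= threshold:
            return best, step * dt
    return None, max_steps * dt


if __name__ == "__main__":
    # Three hypothetical patches; the first carries the highest value
    winner, t = race_to_threshold([2.0, 0.1, 0.1], rng=random.Random(0))
    print(f"winner: patch {winner}, decision time: {t:.2f}")
```

In this sketch the winning index would set the next gaze attractor, and the decision time plays the role of the time spent on the current patch.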

Keywords: audio-visual attention; decision theory; drift-diffusion model; gaze models; multimodal perception; perceptual decisions; social interaction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The handling Editor declared a past collaboration with one of the authors, AD'A.

Figures

Figure 1
Overall view of the patch cycle at the basis of the proposed model. (Top left) At any time t the perceiver captures the multimodal landscape of social interactions as a set of audio-visual patches that convey different social value (speakers, faces, gestures, etc.); patches are shown as colored Gaussian blobs that overlay the original video frame. (Bottom left) The simulated 2D spatial random walk (O-U process) is displayed starting from the frame center up to current gaze location within the red patch (speaker's face). (Top right) The decision making dynamics instantiated as the stochastic evolution (1D random walk with drift) of independent racers, one for each patch (patches and racers are coded by corresponding colors); the current patch (red blob) is scrutinized until one of the racers (winner) hits the threshold; the winner sets the next gaze attractor on the corresponding patch; in this case the light blue patch is the winner (non-speaking face); (Bottom right) The simulated gaze trajectory within the new chosen patch after between-patch relocation has been performed. See text for details.
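The 2D random walk mentioned in the caption can be sketched as an Ornstein-Uhlenbeck (O-U) process that pulls the gaze point toward a patch center while adding noise. This is a minimal illustration under stated assumptions: an isotropic mean-reversion rate `theta`, an isotropic noise level `sigma`, normalized frame coordinates, and an Euler-Maruyama discretization. The function name `simulate_ou_gaze` and the parameter values are hypothetical and do not come from the paper.

```python
import math
import random


def simulate_ou_gaze(start, attractor, theta=5.0, sigma=0.05, dt=0.01,
                     n_steps=500, rng=None):
    """Simulate a 2D O-U gaze trajectory drawn toward an attractor point.

    Discretizes dx = theta * (attractor - x) * dt + sigma * dW
    independently on each axis. Parameters are illustrative assumptions.
    """
    rng = rng or random.Random()
    x, y = start
    ax, ay = attractor
    path = [(x, y)]
    for _ in range(n_steps):
        # Mean-reverting drift toward the patch center plus Gaussian noise
        x += theta * (ax - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y += theta * (ay - y) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Start at the frame center and move toward a hypothetical patch center
    trajectory = simulate_ou_gaze((0.5, 0.5), (0.2, 0.8), rng=random.Random(1))
    print("final gaze position:", trajectory[-1])
```

With a larger `theta` the walk stays closer to the patch center (local exploitation), while changing the attractor after a decision produces the between-patch relocation.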
Figure 2
The estimated empirical densities f(score) for the considered models (via Kernel Density estimation). (A) Shows the distributions for the ScanMatch score; (B–F) show the distributions related to the five MultiMatch dimensions.
Figure 3
Critical Difference (CD) diagrams of the post-hoc Nemenyi test (α = 0.05) for the ScanMatch (A) and MultiMatch scores (B–F) when comparing the proposed model with the GazeDeploy procedure, the gold standard and a baseline random model. Diagrams can be read as follows: the difference between two models is significant if the difference in their ranks is larger than the CD. Models that are not significantly different from one another are connected by a black CD line. Friedman's test statistic (t) and p-value (p) are reported in brackets.
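For readers unfamiliar with CD diagrams, the critical difference of the Nemenyi test is commonly computed as CD = q_α · sqrt(k(k+1) / (6N)), where k is the number of compared models and N the number of samples they are ranked on. The sketch below assumes k = 4 (proposed model, GazeDeploy, gold standard, random baseline) and uses the standard tabulated q values for α = 0.05. The sample count `n = 30` is purely hypothetical, since this page does not state how many samples were compared.

```python
import math

# Tabulated critical values q_alpha for the two-tailed Nemenyi test
# at alpha = 0.05, indexed by the number of compared models k
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def nemenyi_cd(k, n):
    """Critical difference in average rank for k models over n samples."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n))


def significantly_different(rank_a, rank_b, k, n):
    """Two models differ significantly if their rank gap exceeds the CD."""
    return abs(rank_a - rank_b) > nemenyi_cd(k, n)


if __name__ == "__main__":
    # n = 30 is a hypothetical sample count used only for illustration
    print(f"CD for k=4, n=30: {nemenyi_cd(4, 30):.3f}")
```

Under these assumed values, two models whose average ranks differ by less than the CD would be joined by a black line in the diagram.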
