Front Robot AI. 2022 Mar 7;9:770165. doi: 10.3389/frobt.2022.770165. eCollection 2022.

Toward an Attentive Robotic Architecture: Learning-Based Mutual Gaze Estimation in Human-Robot Interaction


Maria Lombardi et al. Front Robot AI. 2022.

Abstract

Social robotics is an emerging field that is expected to grow rapidly in the near future. Indeed, robots increasingly operate in close proximity to humans and even collaborate with them on joint tasks. In this context, how to endow a humanoid robot with the social behavioral skills typical of human-human interactions remains an open problem. Among the many social cues needed to establish natural social attunement, this article reports our research toward a mechanism for estimating gaze direction, focusing in particular on mutual gaze as a fundamental social cue in face-to-face interactions. We propose a learning-based framework to automatically detect eye contact events in online interactions with human partners. The proposed solution achieved high performance both in silico and in experimental scenarios. We expect this work to be a first step toward an attentive architecture that supports scenarios in which robots are perceived as social partners.

Keywords: attentive architecture; computer vision; experimental psychology; humanoid robot; human–robot interaction; joint attention; mutual gaze.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Dataset collection. (A) Overall setup. The participant was seated at a desk in front of iCub, which had a RealSense camera mounted on its head. (B) Sample frames recorded with both iCub's camera (first row) and the RealSense camera (second row). Different frames capture different human positions (rotation of the torso/head) and conditions (eye contact and no eye contact).
FIGURE 2
Learning architecture. The acquired image is first passed to OpenPose to extract the facial keypoints and build the feature vector for the individual in the scene. This feature vector is then fed to the mutual gaze classifier, whose output is the pair (r, c), where r is the binary classification result (eye contact/no eye contact) and c is the confidence level.
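
To make the caption's data flow concrete, here is a minimal sketch of the feature-vector and classification stages, assuming OpenPose has already returned the facial keypoints as a (70, 3) array of (x, y, confidence) values. The helper names (build_feature_vector, classify_mutual_gaze), the nose-tip/inter-ocular normalization, and the choice of an SVM with probability estimates are illustrative assumptions, not the paper's exact design.

    import numpy as np
    from sklearn.svm import SVC

    def build_feature_vector(face_keypoints: np.ndarray) -> np.ndarray:
        """Flatten OpenPose face keypoints into a feature vector.

        `face_keypoints` is assumed to be a (70, 3) array of (x, y, confidence)
        values from OpenPose's face detector. Coordinates are re-centered on the
        nose tip and scaled by the distance between the outer eye corners so the
        features are translation and scale invariant; the paper's exact
        normalization may differ.
        """
        xy = face_keypoints[:, :2].astype(float)
        conf = face_keypoints[:, 2]
        nose_tip = xy[30]                                # nose tip in the 68-point layout
        scale = np.linalg.norm(xy[36] - xy[45]) + 1e-6   # outer eye corners
        return np.concatenate([((xy - nose_tip) / scale).ravel(), conf])

    def classify_mutual_gaze(clf: SVC, face_keypoints: np.ndarray):
        """Return (r, c): binary eye-contact result and classifier confidence."""
        features = build_feature_vector(face_keypoints).reshape(1, -1)
        r = int(clf.predict(features)[0])
        c = float(clf.predict_proba(features)[0].max())
        return r, c

    # Training is only sketched: each row of X_train is a feature vector from a
    # labeled frame. probability=True is required for predict_proba.
    # clf = SVC(probability=True).fit(X_train, y_train)
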
FIGURE 3
Feature importance. (A) Bar plot showing the SHAP feature importance on the x-axis, expressed as a percentage and measured as the mean absolute Shapley value; only the 20 most important features are listed on the y-axis. (B) Numbered face keypoints of the feature vector.
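
The quantity in panel (A) can be reproduced in spirit with the shap library: average the absolute Shapley values per feature and normalize to percentages. The sketch below uses placeholder data and a random forest as a stand-in model, since this excerpt does not name the paper's classifier or feature dimensions.

    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder data: the real X would be the keypoint feature vectors and
    # y the eye-contact labels from the collected dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 140))
    y = rng.integers(0, 2, size=200)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    sv = explainer.shap_values(X)
    # Older shap versions return a list of per-class arrays; newer versions
    # return a single (n_samples, n_features, n_classes) array.
    sv = sv[1] if isinstance(sv, list) else sv[..., 1]

    # Mean absolute Shapley value per feature, as a percentage of the total,
    # which is the quantity panel (A) is described as plotting.
    importance = np.abs(sv).mean(axis=0)
    importance_pct = 100 * importance / importance.sum()
    for i in np.argsort(importance_pct)[::-1][:20]:
        print(f"feature {i}: {importance_pct[i]:.2f}%")
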
FIGURE 4
Experimental setup. (A) The iCub is positioned between two lateral screens, face to face with the participant on the opposite side of a desk 125 cm wide. (B) Sample frames acquired during the experiment, in which the participant first looks at the robot to make eye contact and then simulates a distraction by looking at a lateral screen. The prediction (eye contact yes/no) and the confidence value c are reported on each frame.
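
For an online run like the one in this figure, one plausible way to turn per-frame (r, c) predictions into a stable eye-contact signal is to gate on confidence and take a majority vote over a short window. The threshold, window size, and function names below are our assumptions, not the paper's protocol; extract_keypoints stands in for the OpenPose call and classify for the trained classifier.

    from collections import deque

    CONFIDENCE_THRESHOLD = 0.8   # assumed; the paper reports c but no gating threshold here
    WINDOW = 5                   # assumed vote window to reduce frame-to-frame flicker

    def eye_contact_stream(frames, extract_keypoints, classify):
        """Yield a smoothed per-frame eye-contact decision.

        `frames` is any iterable of camera images; `classify` maps keypoints
        to (r, c) as in Figure 2 (r: binary eye-contact result, c: confidence).
        """
        history = deque(maxlen=WINDOW)
        for frame in frames:
            r, c = classify(extract_keypoints(frame))
            if c >= CONFIDENCE_THRESHOLD:   # only let confident predictions vote
                history.append(r)
            # Majority vote over recent confident frames; default to "no contact".
            yield (sum(history) > len(history) / 2), c

Gating on confidence keeps low-quality detections (for example, profile views where the eye keypoints are unreliable) from flipping the robot's decision from one frame to the next.
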

References

    1. Boucher J.-D., Pattacini U., Lelong A., Bailly G., Elisei F., Fagel S., et al. (2012). I Reach Faster when I See You Look: Gaze Effects in Human-Human and Human-Robot Face-To-Face Cooperation. Front. neurorobotics 6, 3. 10.3389/fnbot.2012.00003 - DOI - PMC - PubMed
    1. Cao Z., Hidalgo G., Simon T., Wei S. E., Sheikh Y. (2019). Openpose: Realtime Multi-Person 2d Pose Estimation Using Part Affinity fields. IEEE Trans. Pattern Anal. Mach Intell. 43, 172–186. 10.1109/TPAMI.2019.2929257 - DOI - PubMed
    1. Chong E., Clark-Whitney E., Southerland A., Stubbs E., Miller C., Ajodan E. L., et al. (2020). Detection of Eye Contact with Deep Neural Networks Is as Accurate as Human Experts. Nat. Commun. 11, 1–10. 10.1038/s41467-020-19712-x - DOI - PMC - PubMed
    1. Coelho E., George N., Conty L., Hugueville L., Tijus C. (2006). Searching for Asymmetries in the Detection of Gaze Contact Versus Averted Gaze under Different Head Views: A Behavioural Study. Spat. Vis 19, 529–545. 10.1163/156856806779194026 - DOI - PubMed
    1. Dalmaso M., Castelli L., Galfano G. (2017a). Attention Holding Elicited by Direct-Gaze Faces Is Reflected in Saccadic Peak Velocity. Exp. Brain Res. 235, 3319–3332. 10.1007/s00221-017-5059-4 - DOI - PubMed
