Front Robot AI. 2022 Mar 7;9:770165. doi: 10.3389/frobt.2022.770165. eCollection 2022.

Toward an Attentive Robotic Architecture: Learning-Based Mutual Gaze Estimation in Human-Robot Interaction


Maria Lombardi et al. Front Robot AI. 2022.

Abstract

Social robotics is an emerging field that is expected to grow rapidly in the near future. Indeed, robots increasingly operate in close proximity to humans and even collaborate with them on joint tasks. In this context, how to endow a humanoid robot with the social behavioral skills typical of human-human interactions remains an open problem. Among the many social cues needed to establish natural social attunement, this article reports our research toward a mechanism for estimating gaze direction, focusing in particular on mutual gaze as a fundamental social cue in face-to-face interactions. We propose a learning-based framework to automatically detect eye contact events in online interactions with human partners. The proposed solution achieved high performance both in silico and in experimental scenarios. We expect this work to be a first step toward an attentive architecture that supports scenarios in which robots are perceived as social partners.

Keywords: attentive architecture; computer vision; experimental psychology; humanoid robot; human–robot interaction; joint attention; mutual gaze.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Dataset collection. (A) Overall setup. The participant was seated at a desk in front of iCub, which had a RealSense camera mounted on its head. (B) Sample frames recorded with both iCub's camera (first row) and the RealSense camera (second row). Different frames capture different human positions (rotation of the torso/head) and conditions (eye contact and no eye contact).
FIGURE 2
Learning architecture. The acquired image is first passed to OpenPose to extract the facial keypoints and build the feature vector for the individual in the scene. This feature vector is then fed to the mutual gaze classifier, whose output is the pair (r, c), where r is the binary classification result (eye contact/no eye contact) and c is the confidence level.
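
To make the caption's data flow concrete, here is a minimal sketch of the feature-vector and classification stages, assuming OpenPose has already returned the facial keypoints as a (70, 3) array of (x, y, confidence) values. The helper names (build_feature_vector, classify_mutual_gaze), the nose-tip/inter-ocular normalization, and the choice of an SVM with probability estimates are illustrative assumptions, not the paper's exact design.

    import numpy as np
    from sklearn.svm import SVC

    def build_feature_vector(face_keypoints: np.ndarray) -> np.ndarray:
        """Flatten OpenPose face keypoints into a feature vector.

        `face_keypoints` is assumed to be a (70, 3) array of (x, y, confidence)
        values from OpenPose's face detector. Coordinates are re-centered on the
        nose tip and scaled by the distance between the outer eye corners so the
        features are translation and scale invariant; the paper's exact
        normalization may differ.
        """
        xy = face_keypoints[:, :2].astype(float)
        conf = face_keypoints[:, 2]
        nose_tip = xy[30]                                # nose tip in the 68-point layout
        scale = np.linalg.norm(xy[36] - xy[45]) + 1e-6   # outer eye corners
        return np.concatenate([((xy - nose_tip) / scale).ravel(), conf])

    def classify_mutual_gaze(clf: SVC, face_keypoints: np.ndarray):
        """Return (r, c): binary eye-contact result and classifier confidence."""
        features = build_feature_vector(face_keypoints).reshape(1, -1)
        r = int(clf.predict(features)[0])
        c = float(clf.predict_proba(features)[0].max())
        return r, c

    # Training is only sketched: each row of X_train is a feature vector from a
    # labeled frame. probability=True is required for predict_proba.
    # clf = SVC(probability=True).fit(X_train, y_train)
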
FIGURE 3
Feature importance. (A) Bar plot showing the SHAP feature importance on the x-axis, expressed as a percentage and measured as the mean absolute Shapley value; only the 20 most important features are listed on the y-axis. (B) Numbered face keypoints of the feature vector.
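
The quantity in panel (A) can be reproduced in spirit with the shap library: average the absolute Shapley values per feature and normalize to percentages. The sketch below uses placeholder data and a random forest as a stand-in model, since this excerpt does not name the paper's classifier or feature dimensions.

    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder data: the real X would be the keypoint feature vectors and
    # y the eye-contact labels from the collected dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 140))
    y = rng.integers(0, 2, size=200)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    sv = explainer.shap_values(X)
    # Older shap versions return a list of per-class arrays; newer versions
    # return a single (n_samples, n_features, n_classes) array.
    sv = sv[1] if isinstance(sv, list) else sv[..., 1]

    # Mean absolute Shapley value per feature, as a percentage of the total,
    # which is the quantity panel (A) is described as plotting.
    importance = np.abs(sv).mean(axis=0)
    importance_pct = 100 * importance / importance.sum()
    for i in np.argsort(importance_pct)[::-1][:20]:
        print(f"feature {i}: {importance_pct[i]:.2f}%")
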
FIGURE 4
Experimental setup. (A) The iCub is positioned between two lateral screens, face to face with the participant on the opposite side of a desk 125 cm wide. (B) Sample frames acquired during the experiment, in which the participant first looks at the robot to make eye contact and then simulates a distraction by looking at a lateral screen. The prediction (eye contact yes/no) and the confidence value c are reported on each frame.
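
For an online run like the one in this figure, one plausible way to turn per-frame (r, c) predictions into a stable eye-contact signal is to gate on confidence and take a majority vote over a short window. The threshold, window size, and function names below are our assumptions, not the paper's protocol; extract_keypoints stands in for the OpenPose call and classify for the trained classifier.

    from collections import deque

    CONFIDENCE_THRESHOLD = 0.8   # assumed; the paper reports c but no gating threshold here
    WINDOW = 5                   # assumed vote window to reduce frame-to-frame flicker

    def eye_contact_stream(frames, extract_keypoints, classify):
        """Yield a smoothed per-frame eye-contact decision.

        `frames` is any iterable of camera images; `classify` maps keypoints
        to (r, c) as in Figure 2 (r: binary eye-contact result, c: confidence).
        """
        history = deque(maxlen=WINDOW)
        for frame in frames:
            r, c = classify(extract_keypoints(frame))
            if c >= CONFIDENCE_THRESHOLD:   # only let confident predictions vote
                history.append(r)
            # Majority vote over recent confident frames; default to "no contact".
            yield (sum(history) > len(history) / 2), c

Gating on confidence keeps low-quality detections (for example, profile views where the eye keypoints are unreliable) from flipping the robot's decision from one frame to the next.
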

References

    1. Boucher J.-D., Pattacini U., Lelong A., Bailly G., Elisei F., Fagel S., et al. (2012). I Reach Faster when I See You Look: Gaze Effects in Human-Human and Human-Robot Face-To-Face Cooperation. Front. neurorobotics 6, 3. 10.3389/fnbot.2012.00003 - DOI - PMC - PubMed
    1. Cao Z., Hidalgo G., Simon T., Wei S. E., Sheikh Y. (2019). Openpose: Realtime Multi-Person 2d Pose Estimation Using Part Affinity fields. IEEE Trans. Pattern Anal. Mach Intell. 43, 172–186. 10.1109/TPAMI.2019.2929257 - DOI - PubMed
    1. Chong E., Clark-Whitney E., Southerland A., Stubbs E., Miller C., Ajodan E. L., et al. (2020). Detection of Eye Contact with Deep Neural Networks Is as Accurate as Human Experts. Nat. Commun. 11, 1–10. 10.1038/s41467-020-19712-x - DOI - PMC - PubMed
    1. Coelho E., George N., Conty L., Hugueville L., Tijus C. (2006). Searching for Asymmetries in the Detection of Gaze Contact Versus Averted Gaze under Different Head Views: A Behavioural Study. Spat. Vis 19, 529–545. 10.1163/156856806779194026 - DOI - PubMed
    1. Dalmaso M., Castelli L., Galfano G. (2017a). Attention Holding Elicited by Direct-Gaze Faces Is Reflected in Saccadic Peak Velocity. Exp. Brain Res. 235, 3319–3332. 10.1007/s00221-017-5059-4 - DOI - PubMed
