ACM Trans Interact Intell Syst. 2016 May;6(1):2.
doi: 10.1145/2882970.

See You See Me: the Role of Eye Contact in Multimodal Human-Robot Interaction

Tian Linger Xu et al. ACM Trans Interact Intell Syst. 2016 May.

Abstract

We focus on a fundamental looking behavior in human-robot interactions: gazing at each other's face. Eye contact and mutual gaze between two social partners are critical in smooth human-human interactions. Therefore, investigating at what moments and in what ways a robot should look at a human user's face in response to the human's gaze behavior is an important topic. Toward this goal, we developed a gaze-contingent human-robot interaction system, which relied on momentary gaze behaviors from a human user to control an interacting robot in real time. Using this system, we conducted an experiment in which human participants interacted with the robot in a joint attention task. In the experiment, we systematically manipulated the robot's gaze toward the human partner's face in real time and then analyzed the human's gaze behavior in response to the robot's gaze behavior. We found that more face looks from the robot led to more look-backs (to the robot's face) from human participants and consequently created more mutual gaze and eye contact between the two. Moreover, participants demonstrated more coordinated and synchronized multimodal behaviors between speech and gaze when more eye contact was successfully established and maintained.

Keywords: Gaze-Based Interaction; Human-Robot Interaction; Multimodal Interface.


Figures

Figure 1
An overview of the real-time human-robot interaction system. Left: a real-time human attention recognition system based on processing first-person-view video and human gaze data. The system detected the human user's attentional target moment by moment and passed that information to the robot control system. The robot then detected the same target from its own view and turned its head directly toward that target. Right: we recorded and analyzed multimodal data, including video, audio, and eye and head movement data, from both the human and the robot sides.
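As a rough illustration of the data flow in Figure 1, the sketch below shows how a per-moment attention estimate might be handed from the recognition module to the robot controller. All names here (AttentionEstimate, robot.look_at) are hypothetical; the original system's interfaces are not described on this page.

```python
from dataclasses import dataclass

# Hypothetical message passed from the attention-recognition module to the
# robot control module; the field and method names are illustrative, not
# the authors' published API.
@dataclass
class AttentionEstimate:
    timestamp: float   # seconds since the session started
    target_id: str     # e.g. "red_object", "face", or "none"

def control_step(estimate: AttentionEstimate, robot) -> None:
    """One tick of the gaze-contingent loop: orient the robot's head toward
    whatever target the human is currently attending to."""
    if estimate.target_id != "none":
        # The robot re-detects the same target in its own camera view and
        # turns its head toward it (robot.look_at is a hypothetical method).
        robot.look_at(estimate.target_id)
```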
Figure 2
The complete procedure of object segmentation and detection on both the participant’s (a) and the robot’s (b) sides. More details concerning each step are given in the Appendix.
Figure 3
The detailed timeline of human attention detection and robot gaze-contingent action control. Only after more than 50% of the data points indicated the same region of interest (ROI; in this example, the red object) did the control system conclude that this ROI was the new target and execute the control command to follow the target currently attended by the human participant.
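A minimal sketch of the decision rule described in Figure 3, assuming gaze samples arrive as a stream of ROI labels and that the controller evaluates a short buffer of recent samples. Only the more-than-50% criterion comes from the caption; the buffer handling and function names are assumptions.

```python
from collections import Counter
from typing import Optional

def update_target(recent_rois: list, current_target: Optional[str]) -> Optional[str]:
    """Return a new attended target only if more than 50% of the recent gaze
    samples agree on the same ROI; otherwise keep the current target.
    `recent_rois` holds ROI labels (e.g. "red_object"), with None for samples
    that hit no ROI."""
    labeled = [r for r in recent_rois if r is not None]
    if not labeled:
        return current_target
    roi, count = Counter(labeled).most_common(1)[0]
    if count > 0.5 * len(recent_rois):   # strict majority of all samples
        return roi                        # conclude a new target; the robot follows it
    return current_target
```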
Figure 4
Four joint attentional states in the interaction from the participant’s first-person view (human gaze indicated by an orange crosshair): (a) the human was looking at a target object while the robot was looking at the human’s face; (b) mutual gaze: both the human and the robot looked at each other’s face; (c) both the robot and the human jointly attended to the same object; and (d) the robot was attending to an object in the human’s hands while the human gazed at the robot’s face.
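One way to read Figure 4 is as a function of where each partner is looking at a given moment. The sketch below labels a pair of gaze targets accordingly; the state names are illustrative, and the final catch-all case is not one of the four states shown in the figure.

```python
def joint_state(human_target: str, robot_target: str) -> str:
    """Classify one moment of the interaction from the pair of gaze targets.
    A target is either "face" (the partner's face) or an object label."""
    if human_target == "face" and robot_target == "face":
        return "mutual_gaze"            # (b) both look at each other's face
    if human_target == "face":
        return "human_watches_robot"    # (d) human on the robot's face, robot on an object
    if robot_target == "face":
        return "robot_watches_human"    # (a) human on an object, robot on the human's face
    if human_target == robot_target:
        return "joint_attention"        # (c) both attend to the same object
    return "unaligned"                  # other combinations (not shown in the figure)
```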
Figure 5
Examples of the robot’s and the human’s gaze data streams from the three experimental conditions. (a) Responsive looks: the robot ignored brief face looks from the human and otherwise copied the human’s gaze behaviors exactly, with a short delay. (b) Extended responsive looks: the robot responded to the human’s face looks by looking back at the human’s face and continued looking at the face for another 1.5 seconds even after the human looked away; the human may or may not generate a second face look in response. (c) Responsive & eliciting looks: the robot not only followed the human’s face looks, as in the other two conditions, but also attempted to initiate eye contact by looking at the human’s face when the human’s attention was not on the robot’s face.
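A hedged sketch of the three gaze policies as described in this caption. Only the 1.5-second extension is stated in the text; the function and target names, the eliciting trigger, and the control-tick structure are assumptions, and the brief-look filtering and short response delay mentioned in (a) are omitted for brevity.

```python
import time

HOLD_SEC = 1.5  # extra face-look time in the extended responsive condition

def robot_gaze_policy(condition: str, human_on_face: bool, state: dict, robot) -> None:
    """One control tick of the robot's face-look behavior under the three
    conditions. `state` persists across ticks; `robot.look_at` and the
    eliciting trigger are hypothetical names."""
    now = time.monotonic()
    if human_on_face:
        # In all conditions the robot responds to a face look by looking back.
        if condition == "extended_responsive":
            state["hold_until"] = now + HOLD_SEC
        robot.look_at("human_face")
    elif condition == "extended_responsive" and now < state.get("hold_until", 0.0):
        # Keep looking at the face for HOLD_SEC after the human looks away.
        robot.look_at("human_face")
    elif condition == "responsive_and_eliciting" and state.get("elicit_now", False):
        # The robot also initiates face looks while the human attends elsewhere
        # (how the trigger is scheduled is an assumption here).
        robot.look_at("human_face")
    else:
        # Otherwise follow whatever object the human is currently attending to.
        robot.look_at(state.get("human_attended_object", "none"))
```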
Figure 6
Different measures of the human participants’ gaze behaviors across the three conditions. (a) Proportion of face-looking time across the three experimental conditions. (b) Gaze duration on the robot’s face. (c) Number of face looks per minute.
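The three measures in Figure 6 can be derived from a binary "human gaze on the robot's face" stream. A minimal sketch, assuming gaze is sampled at a fixed rate; the function name, sampling-rate parameter, and use of the mean for (b) are assumptions.

```python
def face_look_measures(on_face: list, hz: float) -> dict:
    """Compute the three measures from a boolean sample stream `on_face`
    (True when the human's gaze is on the robot's face), sampled at `hz` Hz."""
    total_sec = len(on_face) / hz if hz else 0.0
    # (a) proportion of time spent looking at the robot's face
    proportion = sum(on_face) / len(on_face) if on_face else 0.0
    # segment the stream into contiguous face looks
    looks, current = [], 0
    for sample in on_face:
        if sample:
            current += 1
        elif current:
            looks.append(current / hz)
            current = 0
    if current:
        looks.append(current / hz)
    return {
        "proportion_face_time": proportion,                                     # (a)
        "mean_look_duration_sec": sum(looks) / len(looks) if looks else 0.0,    # (b)
        "face_looks_per_minute": len(looks) / (total_sec / 60.0) if total_sec else 0.0,  # (c)
    }
```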
Figure 7
The proportions of mutual gaze time in which both the robot and the participant looked at each other’s face.
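Mutual gaze, as plotted in Figure 7, is the time during which the two partners' face looks overlap. A small sketch, assuming the two boolean streams are time-aligned and sampled on the same clock:

```python
def mutual_gaze_proportion(human_on_face: list, robot_on_face: list) -> float:
    """Fraction of samples in which the human and the robot are looking at
    each other's face at the same moment (streams assumed time-aligned)."""
    pairs = list(zip(human_on_face, robot_on_face))
    if not pairs:
        return 0.0
    return sum(h and r for h, r in pairs) / len(pairs)
```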
Figure 8
Given multiple temporal instances, the algorithm computes the most probable sequential prototype by comparing and matching the individual instances.
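The caption only outlines the prototype-extraction idea. As one possible reading (not the authors' published algorithm), the sketch below compares instances pairwise and returns the one most similar to all others as the prototype; the similarity function is a deliberately simple stand-in.

```python
from difflib import SequenceMatcher

def extract_prototype(instances: list) -> list:
    """Pick the instance whose event sequence is, on average, most similar to
    all others (a medoid). Each instance is a list of event labels, e.g.
    ["face_look", "naming", "object_look"]. This is only an illustrative
    stand-in for the matching procedure sketched in the figure."""
    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()
    best, best_score = None, -1.0
    for candidate in instances:
        score = sum(similarity(candidate, other) for other in instances)
        if score > best_score:
            best, best_score = candidate, score
    return best
```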
Figure 9
Four speech-gaze temporal patterns: object/naming – looking at the target object while naming it; object/describing – looking at the target object while verbally describing it; face/naming – looking at the robot learner’s face while naming an object; and face/describing – looking at the robot’s face while describing an object. Note that these patterns capture not only the temporal order of these multimodal events but also the timings and durations between and within them.
Figure 10
A comparison of the proportions of instances in which participants in the three experimental conditions exhibited the four coordinated sequential patterns. Compared with the responsive condition, participants in both the extended responsive and the responsive & eliciting conditions generated more synchronized gaze-speech patterns toward the target object when naming and describing that object. In all three conditions, participants attended to and checked the robot’s face during naming and describing events, with no difference among the conditions.
