Modelling the role of task in the control of gaze

Dana H Ballard et al. Vis Cogn. 2009 Aug 1;17(6-7):1185-1204.
doi: 10.1080/13506280902978477.

Abstract

Gaze changes and the resultant fixations that orchestrate the sequential acquisition of information from the visual environment are the central feature of primate vision. How are we to understand their function? For the most part, theories of fixation targets have been image based: The hypothesis being that the eye is drawn to places in the scene that contain discontinuities in image features such as motion, colour, and texture. But are these features the cause of the fixations, or merely the result of fixations that have been planned to serve some visual function? This paper examines the issue and reviews evidence from various image-based and task-based sources. Our conclusion is that the evidence is overwhelmingly in favour of fixation control being essentially task based.


Figures

Figure 1
Separate visual routines. When subjects have had a preview of a scene, they can identify a search target's location from memory (A), but without the preview they use a correlation-based technique (B) that takes longer. One could attempt to convert the remembered target's location to saliency coordinates, but not without addressing the more complicated question of how the brain manages different dynamic frames of reference.
Figure 2
(A) A frame from the human embedded vision system simulation showing the avatar negotiating a sidewalk strewn with purple litter and blue obstacles, each of which must be dealt with. The insets show the use of vision to guide the avatar through a complex environment. The upper inset shows the visual routine that is running at any instant; here, the detection of the edges of the sidewalk that are used in navigation. The lower inset shows the visual field in a head-centred viewing frame. (B) By wearing a Head Mounted Display (HMD), humans can walk in the same environment as the avatar. (C) A basic visually guided behaviour showing steps in the use of the learnt litter-cleanup Q-table. The input is a processed colour image with a filled circle on the extreme right-hand side indicating the nearest litter object as a heading angle θ and distance d. This state information, indicated by the circular symbol in the policy table on the lower left, is used to retrieve the appropriate action from the Q-table's policy immediately below. Light regions: Turn = +45°; grey regions: Turn = 0°; dark regions: Turn = −45°. In this case the selected action is Turn = −45°. The assumption is that neural circuitry translates this abstract heading into complex walking movements; this holds for the human avatar, which has a "walk" command that takes a heading parameter. State information can also be used to retrieve the expected return associated with the optimal action, its learnt Q-value, as illustrated on the lower right.
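The lookup described in panel (C) can be sketched in a few lines: a discretized (θ, d) state indexes a table whose entries hold the learnt value of each turn action, and the policy simply takes the argmax. This is a minimal sketch; the bin counts and the random Q-values stand in for the learnt table and are illustrative assumptions.

```python
import numpy as np

# Illustrative discretization: 9 heading bins over [-90, 90] degrees,
# 10 distance bins over [0, 10] m. These sizes are assumptions.
N_THETA, N_D = 9, 10
ACTIONS = (-45, 0, 45)  # candidate turn angles in degrees

rng = np.random.default_rng(0)
Q = rng.random((N_THETA, N_D, len(ACTIONS)))  # Q[state] -> value per action

def state_index(theta_deg, d_m, max_d=10.0):
    """Map a continuous (theta, d) litter observation to table indices."""
    ti = min(int((theta_deg + 90.0) / 180.0 * N_THETA), N_THETA - 1)
    di = min(int(d_m / max_d * N_D), N_D - 1)
    return max(ti, 0), max(di, 0)

def greedy_action(theta_deg, d_m):
    """Policy lookup: the turn with the highest learnt Q-value, returned
    together with that value (the expected return of the optimal action)."""
    ti, di = state_index(theta_deg, d_m)
    a = int(np.argmax(Q[ti, di]))
    return ACTIONS[a], float(Q[ti, di, a])
```

As in the caption, the same table lookup yields both the action (the policy, lower left) and its expected return (the Q-value, lower right).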
Figure 3
Behaviours compete for gaze in order to update their measurements. (A) A caricature of the basic method. The trajectory through the avatar's state space is estimated using a Kalman filter that allows estimates to propagate in the absence of measurements, building up uncertainty (light grey area). If the behaviour succeeds in obtaining a fixation, uncertainty is reduced (dark grey region). The reinforcement learning model allows the value of reducing uncertainty to be calculated. (B) The top panel shows seven time steps in walking and the associated uncertainties of the state vectors for obstacle avoidance (OA), sidewalk finding (SF), and litter cleanup (LC). The corresponding boxes below show the state spaces, with the a priori uncertainty indicated in light grey and the a posteriori uncertainty in darker grey. Uncertainty grows in the absence of measurements because the internal model is noisy; making a measurement with a visual routine that uses gaze reduces it. For example, for litter cleanup (LC), Panel 5 shows a large amount of accumulated uncertainty that is greatly reduced by a visual measurement. Overall, obstacle avoidance wins the first three competitions, then sidewalk finding, and litter cleanup wins the last three. (C) Tests of the Sprague algorithm (dark) against the standard robotics round-robin algorithm (light) and random gaze allocation (white) show a significant advantage over both.
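The competition in panels (A) and (B) can be caricatured with scalar Kalman variances: each behaviour's uncertainty grows by a process-noise term on every step, and the behaviour that wins the gaze collapses its variance through a measurement update. This sketch uses the grown variance itself as a stand-in for the value of information; the paper's model computes that value through reinforcement learning, and the noise constants here are illustrative.

```python
Q_NOISE = 0.5   # process noise: uncertainty added per step without a fixation
R_NOISE = 0.2   # measurement noise of a fixation-based visual routine

class Behaviour:
    def __init__(self, name):
        self.name = name
        self.P = 1.0  # variance of the behaviour's state estimate

    def predict(self):
        self.P += Q_NOISE  # a priori uncertainty grows (light grey region)

    def update(self):
        # a posteriori variance after a visual measurement (dark grey region)
        self.P = self.P * R_NOISE / (self.P + R_NOISE)

def allocate_gaze(behaviours):
    """One time step: all behaviours propagate; the most uncertain one
    wins the fixation and reduces its variance."""
    for b in behaviours:
        b.predict()
    winner = max(behaviours, key=lambda b: b.P)  # proxy for value of information
    winner.update()
    return winner
```

Running OA, SF, and LC through several steps produces the alternating pattern of panel (B): whichever behaviour has gone longest without a fixation accumulates the most variance and wins the next competition.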
Figure 4
Comparing human gaze locations to those found by the Itti saliency detector. (A) Key. The small insets show the saliency maps that are overlaid as transparencies on the lower versions of the images. (B) Match example. (C) No-match example. (D) In a representative sample of 18 frames, only 8 were labelled as matches: more than half contain fixation locations that are not detected by the maps. The saliency program was provided by Dr. Laurent Itti at the University of Southern California.
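One simple way to score such comparisons is to ask whether any pixel near a human fixation falls in the top few percent of the saliency map. The neighbourhood radius and the 5% threshold below are assumptions for this sketch, not the labelling criterion used in the figure.

```python
import numpy as np

def fixation_matches(saliency, fixation, radius=2, top_fraction=0.05):
    """Return True if any pixel within `radius` of the (row, col) fixation
    lies in the top `top_fraction` of the saliency map's values."""
    thresh = np.quantile(saliency, 1.0 - top_fraction)
    y, x = fixation
    patch = saliency[max(y - radius, 0): y + radius + 1,
                     max(x - radius, 0): x + radius + 1]
    return bool(patch.size) and bool((patch >= thresh).any())
```

Counting matches over a set of frames with a rule like this gives the kind of tally reported in panel (D).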
Figure 5
Using the DBN to recognize steps in sandwich making. (A) Two fixations from different points in the task, (top) bread with peanut butter and (bottom) the peanut butter jar, appear very similar but do not confuse the Dynamic Bayes Network (DBN), which uses task information to disambiguate them. (B) A frame from the video of a human subject in the process of making a sandwich, showing that the DBN has correctly identified the subtask as "knife-in-hand". (C) A trace of the entire sandwich-making process showing perfect subtask recognition by the DBN.
Figure 6
The basic structure of the Dynamic Bayes Net (DBN) used to model sandwich making, shown as two time slices. Visual and hand measurements provide input to the shaded nodes, which at any time t comprise the measurement vector Ot. The remaining nodes comprise the set St, whose probabilities must be estimated. The sequencing probabilities between subtasks are provided by a task model that is in turn based on human subject data.
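The two-slice update can be sketched as standard forward filtering: the belief over the hidden subtask St is pushed through the task model's sequencing probabilities, then reweighted by the likelihood of the observation Ot. The subtask names, the transition matrix, and the observation model below are all illustrative assumptions, not the values fitted from subject data.

```python
import numpy as np

SUBTASKS = ["reach-jar", "knife-in-hand", "spread"]

# Illustrative sequencing probabilities between subtasks (the task model).
T = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.7, 0.3],
              [0.1, 0.0, 0.9]])

# Illustrative P(observation symbol | subtask) for 2 discretized
# visual/hand measurement symbols.
E = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])

def filter_step(belief, obs):
    """One two-slice DBN update: predict through T, weight by the
    observation likelihood E[:, obs], and renormalise."""
    predicted = belief @ T
    posterior = predicted * E[:, obs]
    return posterior / posterior.sum()

belief = np.array([1.0, 0.0, 0.0])  # start in the first subtask
for obs in [0, 1, 1]:
    belief = filter_step(belief, obs)
recognised = SUBTASKS[int(np.argmax(belief))]
```

After the two later observations favour the second symbol, the belief concentrates on "knife-in-hand", the kind of subtask label reported in Figure 5B.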
