Review. Philos Trans R Soc Lond B Biol Sci. 2014 Nov 5;369(1655):20130483. doi: 10.1098/rstb.2013.0483.

The why, what, where, when and how of goal-directed choice: neuronal and computational principles


Paul F M J Verschure et al. Philos Trans R Soc Lond B Biol Sci. 2014.

Abstract

The central problems that goal-directed animals must solve are: 'What do I need and Why, Where and When can this be obtained, and How do I get it?' or the H4W problem. Here, we elucidate the principles underlying the neuronal solutions to H4W using a combination of neurobiological and neurorobotic approaches. First, we analyse H4W from a system-level perspective by mapping its objectives onto the Distributed Adaptive Control (DAC) embodied cognitive architecture, which treats the generation of adaptive action in the real world as the primary task of the brain rather than the optimal solution of abstract problems. We next map this functional decomposition onto the architecture of the rodent brain to test its consistency. Following this approach, we propose that the mammalian brain solves the H4W problem on the basis of multiple kinds of outcome predictions, integrating central representations of needs and drives (e.g. hypothalamus), valence (e.g. amygdala), world, self and task state spaces (e.g. neocortex, hippocampus and prefrontal cortex, respectively) combined with multi-modal selection (e.g. basal ganglia). In our analysis, goal-directed behaviour results from a well-structured architecture in which goals are bootstrapped on the basis of predefined needs, valence and multiple learning, memory and planning mechanisms rather than being generated by a singular computation.

Keywords: computational modelling; decision-making; distributed adaptive control; embodied cognition; goal-directed behaviour; reward.


Figures

Figure 1.
The DAC theory of mind and brain (see [10] for a review). Left: highly abstract representation of the DAC architecture. DAC proposes that the brain is organized as a three-layered control structure with tight coupling within and between these layers, distinguishing the soma (SL) and the reactive (RL), adaptive (AL) and contextual (CL) layers. Across these layers, a columnar organization exists that deals with the processing of states of the world or exteroception (left, red), the self or interoception (middle, blue) and action (right, green). See text for further explanation. The reactive layer: the RL comprises dedicated behaviour systems (BS) that combine predefined sensorimotor mappings with drive reduction mechanisms that are predicated on the needs of the body (SL). Right lower panel: each BS follows homeostatic principles supporting the self-essential functions (SEF) of the body (SL). In order to map needs into behaviours, the essential variables served by the BSs, the SEFs, have a specific distribution in task space called an 'affordance gradient'. In this example, we consider the (internally represented) 'attractive force' of the home position supporting the security SEF, or of open space defining the exploration SEF. The values of the respective SEFs are defined by the difference between the sensed value of the affordance gradient (red) and its desired value given the prevailing needs (blue). The regulator of each BS defines the next action so as to perform gradient ascent on the SEF. An integration and action selection process across the different BSs forces a strict winner-take-all decision that defines the specific behaviour emitted. The allostatic controller of the RL regulates the internal homeostatic dynamic of the BSs to set priorities defined by needs and environmental opportunities through the modulation of the affordance gradients, the desired values of SEFs and/or the integration process.
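The reactive-layer scheme described above can be sketched in code. This is an illustrative toy, not an implementation from the DAC literature: all function and variable names (`sef_error`, `select_behaviour`, `gains`) are hypothetical, and the two behaviour systems (security vs. exploration) follow the caption's example.

```python
import numpy as np

def sef_error(sensed, desired):
    """SEF value: discrepancy between the sensed affordance-gradient
    value and its desired value given the prevailing needs."""
    return desired - sensed

def select_behaviour(sensed, desired, gains):
    """Strict winner-take-all across behaviour systems (BSs).
    `gains` stands in for allostatic modulation of BS priorities."""
    urgency = gains * np.abs(sef_error(sensed, desired))
    winner = int(np.argmax(urgency))  # only one BS emits behaviour
    return winner, urgency

# Two BSs: index 0 = security (home position), index 1 = exploration.
sensed  = np.array([0.2, 0.9])   # sensed affordance gradient values
desired = np.array([0.8, 0.5])   # desired values given current needs
gains   = np.array([1.0, 1.0])   # allostatic priority weights

winner, urgency = select_behaviour(sensed, desired, gains)
# Security wins here: its sensed/desired discrepancy (0.6) exceeds
# that of exploration (0.4), so the agent heads home.
```

In a fuller model, each BS's regulator would then step along its affordance gradient (gradient ascent on the SEF), and the allostatic controller would adjust `gains` or `desired` as needs change.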
The adaptive layer: the AL acquires a state space of the agent–environment interaction and shapes action. The learning dynamic of the AL is constrained by the SEFs of the RL, which define value. The AL crucially contributes to exosensing by allowing the processing of states of distal sensors, e.g. vision and audition, which are not predefined but rather are tuned in somatic time to properties of the interaction with the environment. Acquired sensor and motor states are in turn associated through the valence states signalled by the RL. The contextual layer: the core processes of the CL are divided between a task-model and a self-model. The CL expands the time horizon in which the agent can operate through the use of sequential short-term and long-term memory (STM and LTM, respectively) systems. These memory systems operate on integrated sensorimotor representations that are generated by the AL and acquire, retain and express goal-oriented action regulated by the RL. The CL comprises a number of processes (right upper panel): (a) when the discrepancy between predicted and encountered sensory states falls below an STM acquisition threshold, the perceptual predictions (red circle) and motor activity (green rectangle) generated by the AL are stored in STM as a so-called segment. The STM acquisition threshold is defined by the time-averaged reconstruction error of the perceptual learning system of the AL. (b) If a goal state (blue flag) is reached, e.g. reward or punishment, the content of STM is retained in LTM as a sequence conserving its order, goal state and valence marker, e.g. aversive or appetitive, and STM is reset. Every sequence is thus labelled with respect to the specific goal it pertains to and its valence marker. (c) If the outputs generated by the RL and AL to action selection are sub-threshold, the AL perceptual predictions are matched against those stored in LTM. (d) The CL-selected action is defined as a weighted sum over the segments of LTM.
(e) The contribution of LTM segments to decision-making depends on four factors: perceptual evidence, memory chaining, the distance to the goal state and valence. Working memory (WM) of the CL is defined by the memory dynamics that represent these factors. Active segments that contributed to the selected action are associated with those that were previously active, establishing rules for future chaining. The self-model component of the CL monitors task performance and develops (re)descriptions of task dynamics anchored in the self. In this way, the system generates meta-representational knowledge that forms autobiographical memory. This aspect of the DAC CL is not further considered in this paper.
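The contextual-layer memory processes (a)–(d) above can be sketched as follows. This is a minimal toy under simplifying assumptions: perceptual predictions are collapsed to scalars, only two of the four weighting factors from (e) (perceptual evidence and distance to the goal) are modelled, and all class and method names (`ContextualMemory`, `goal_reached`, `recall`) are hypothetical, not from any DAC codebase.

```python
from collections import deque

class ContextualMemory:
    """Toy sketch of DAC contextual-layer (CL) memory dynamics."""

    def __init__(self, acquisition_threshold=0.1):
        self.threshold = acquisition_threshold  # STM acquisition threshold
        self.stm = deque()  # ordered (prediction, action) segments
        self.ltm = []       # goal- and valence-labelled sequences

    def step(self, prediction, action, prediction_error):
        # (a) store a segment only when the AL prediction error falls
        # below the acquisition threshold
        if prediction_error < self.threshold:
            self.stm.append((prediction, action))

    def goal_reached(self, goal, valence):
        # (b) on reaching a goal state, retain the ordered STM sequence
        # in LTM with its goal and valence marker, then reset STM
        if self.stm:
            self.ltm.append({"sequence": list(self.stm),
                             "goal": goal, "valence": valence})
        self.stm.clear()

    def recall(self, prediction, tol=0.2):
        # (c) match the current AL prediction against LTM segments;
        # (d) select the action with the largest weighted vote, here
        # weighting each matching segment by its proximity to the goal
        votes = {}
        for seq in self.ltm:
            n = len(seq["sequence"])
            for i, (p, a) in enumerate(seq["sequence"]):
                if abs(p - prediction) < tol:
                    votes[a] = votes.get(a, 0.0) + (i + 1) / n
        return max(votes, key=votes.get) if votes else None

mem = ContextualMemory(acquisition_threshold=0.1)
mem.step(0.5, "turn_left", prediction_error=0.05)  # stored in STM
mem.step(0.6, "turn_right", prediction_error=0.5)  # rejected: error too high
mem.goal_reached(goal="food", valence="appetitive")
action = mem.recall(0.55)  # matches the stored segment -> "turn_left"
```

A fuller model would add the remaining two factors from (e), memory chaining and valence, as multiplicative weights in `recall`.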
Figure 2.
Proposed brain architecture representing the neuronal substrate of goal-directed behaviour and its relation to the neurorobotic DAC architecture. The hippocampal formation is proposed to code the organism's world state space (red), the prefrontal cortex (especially its medial and orbitofrontal aspects) to represent task space (i.e. rules, constraints, goals and values of cues and action options, purple) and the striatum (and downstream structures of the basal ganglia) to mediate action selection. In this scheme, the hypothalamus and brain stem contain sensor systems monitoring homeostatic variables and providing information about the motivational needs of the organism that define the pursuit of needs and goals (blue). The arrow from striatum to thalamus represents an indirect projection. The hypothalamic efferents are modelled after those traced for the lateral hypothalamus in relation to feeding behaviour and do not apply to hypothalamic areas in general. For the sake of clarity, the scheme's anatomic connections are by no means complete. For instance, outputs from prefrontal cortex and basal ganglia to the brain stem, or several afferent inputs to amygdala and VTA, have not been included, while several hypothalamic nuclei project directly to ventromedial prefrontal areas. Sensory inputs reach the hippocampus via intermediate stations (parahippocampal areas; not shown) and are supplemented with frontal cortical inputs converging on these intermediate areas. Furthermore, the motor cortices are meant to include premotor, supplementary motor and frontal oculomotor areas (based on [–42]).

References

    1. Tolman EC. 1932. Purposive behavior in animals and man. New York, NY: Century Co.
    2. Newell A. 1990. Unified theories of cognition. Cambridge, MA: Harvard University Press.
    3. Verschure PFMJ, Althaus P. 2003. A real-world rational agent: unifying old and new AI. Cogn. Sci. 27, 561–590. (doi:10.1207/s15516709cog2704_1)
    4. Levine J. 1983. Materialism and qualia: the explanatory gap. Pac. Phil. Q. 64, 354–361.
    5. Chalmers D. 1995. Facing up to the problem of consciousness. J. Conscious. Stud. 2, 200–219.
