Front Psychol. 2013 Oct 22;4:771.
doi: 10.3389/fpsyg.2013.00771. eCollection 2013.

Neural model for learning-to-learn of novel task sets in the motor domain

Alexandre Pitti et al. Front Psychol. 2013.

Abstract

During development, infants learn to differentiate their motor behaviors relative to various contexts by exploring and identifying the correct structures of causes and effects that they can perform; these structures of actions are called task sets or internal models. The ability to detect the structure of new actions, to learn them, and to select on the fly the proper one given the current task set is a great leap in infant cognition. This behavior is an important component of the child's capacity for learning-to-learn, a mechanism akin to intrinsic motivation that is argued to drive cognitive development. Accordingly, we propose to model a dual system based on (1) the learning of new task sets and (2) their evaluation relative to their uncertainty and prediction error. The architecture is designed as a two-level neural system for context-dependent behavior (the first system) and task exploration and exploitation (the second system). In our model, the task sets are learned separately by reinforcement learning in the first network after their evaluation and selection in the second one. We perform two different experimental setups to show sensorimotor mapping and switching between tasks: a first one in a neural simulation for modeling cognitive tasks, and a second one with a robot arm for motor task learning and switching. We show that the interplay of several intrinsic mechanisms drives the rapid formation of neural populations with respect to novel task sets.
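
As a purely illustrative reading of this dual-system idea, the minimal Python sketch below pairs a set of task-specific mappings (trained by a simple reinforcement rule) with an evaluation signal built from recent prediction errors. All variable names, dimensions, and update rules here are assumptions made for illustration, not the model's actual equations.

    import numpy as np

    rng = np.random.default_rng(0)
    n_tasks, n_inputs, n_actions = 4, 8, 3

    # System 1: one sensorimotor mapping (task set) per context.
    task_sets = [rng.normal(0.0, 0.1, (n_actions, n_inputs)) for _ in range(n_tasks)]
    # System 2: running estimate of each task set's prediction error (uncertainty).
    uncertainty = np.ones(n_tasks)

    def step(x, reward_fn, lr=0.1, decay=0.9, epsilon=0.1):
        # Selection: exploit the task set with the lowest recent error,
        # with occasional exploration of the others.
        k = rng.integers(n_tasks) if rng.random() < epsilon else int(np.argmin(uncertainty))
        action = int(np.argmax(task_sets[k] @ x))
        reward = reward_fn(k, action)                 # 1.0 if the action fits the context
        error = 1.0 - reward
        # Learning (system 1): reinforce only the selected task set.
        task_sets[k][action] += lr * (reward - 0.5) * x
        # Evaluation (system 2): update the uncertainty of the selected task set.
        uncertainty[k] = decay * uncertainty[k] + (1.0 - decay) * error
        return k, action, reward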

Keywords: cortical plasticity; decision making; error-reward processing; fronto-parietal system; gain-field mechanism; incremental learning; task sets; tool-use.


Figures

Figure 1
Framework for task-set selection. The whole system is composed of three distinct neural networks, inspired by Khamassi et al. (2011). The PPC network is an associative network: it binds the afferent sensory inputs to each other and maps them to different motor outputs with respect to a task set. The ACC system is an error-based working memory that processes the incoming PPC signals and feeds back an error to them with respect to the current task. This modulated signal is used to tune the population of neurons in the PPC by reinforcement learning; it is also conveyed to the PFC map, a recurrent network that dynamically learns the spatio-temporal patterns of the ongoing episodes with respect to the task.
Figure 2
Task-set mapping, the gain-field mechanism. (A) Gain-field neurons are units used for sensorimotor transformation. They transform the input activity into another basis, which is then fed forward to various outputs with respect to their task. Gain fields can be seen as meta-parameters that reduce the complexity of the sensorimotor problem to a linear one. (B) Example of GF neurons' sensorimotor transformation for two modalities projecting to three different task sets; each GF neuron contributes to one particular feature of the tasks (Pouget and Snyder; Orban and Wolpert, 2011).
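
As a rough illustration of the gain-field idea described in this caption (multiplicative combination of two modalities, followed by task-specific linear read-outs), here is a minimal sketch; the dimensions, random read-out weights, and modality interpretations are assumptions, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    def gain_field(mod_a, mod_b):
        # Each GF unit multiplies one component of modality A with one of modality B,
        # giving a new basis in which task-specific mappings become linear read-outs.
        return np.outer(mod_a, mod_b).ravel()

    mod_a = rng.random(5)          # e.g., visual target position (assumed encoding)
    mod_b = rng.random(4)          # e.g., arm proprioception (assumed encoding)
    gf = gain_field(mod_a, mod_b)  # 20 gain-field units

    # Three task sets read the same GF basis with different linear weights.
    readouts = [rng.normal(size=(3, gf.size)) for _ in range(3)]
    outputs = [W @ gf for W in readouts]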
Figure 3
Rank-order coding principle (Thorpe et al., 2001). This type of neuron encodes the rank code of an input signal: the input amplitudes are translated into an ordered sequence, and the neuron's synaptic weights are associated with this sequence. The neural activity is salient for this particular order only, see (A), and not otherwise, see (B).
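
A minimal sketch of a rank-order coding unit in the spirit of Thorpe et al. (2001): amplitudes are converted to a firing order and the read-out is shunted geometrically with rank, so only the stored order gives a salient response. The shunting factor of 0.5 and the example pattern are illustrative assumptions, not values from the paper.

    import numpy as np

    def rank_order_response(signal, weights, mod=0.5):
        # Largest amplitude fires first; later ranks are shunted geometrically.
        order = np.argsort(-np.asarray(signal))
        return sum(mod ** r * weights[i] for r, i in enumerate(order))

    # Store the preferred order of a reference pattern in the synaptic weights.
    pattern = np.array([0.9, 0.1, 0.6, 0.3])
    pref_order = np.argsort(-pattern)
    weights = np.empty(len(pattern))
    weights[pref_order] = 0.5 ** np.arange(len(pattern))  # highest weight to earliest rank

    print(rank_order_response(pattern, weights))        # salient response (case A)
    print(rank_order_response(pattern[::-1], weights))  # weaker response (case B)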
Figure 4
Protocol setup for task-set learning. This simple protocol shows how the experimental setup acquires different contexts incrementally and selects among them.
Figure 5
Raster plot of the PPC output map and plasticity vs. stability within the map. (A) The graph displays the neural dynamics during task switches among four different contexts. (B) Convergence rate of the PPC network with respect to each task. (C) The degree of plasticity and stability within the PPC output map, represented as the probability distribution of the neurons' membership to the cluster associated with a context. This histogram shows two behaviors within the system. On the one hand, one third of the neurons present very stable dynamics, with membership to one context only. On the other hand, two thirds of the neurons are part of several clusters and therefore of several contexts. The latter neurons follow a power-law distribution, showing very plastic dynamics.
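
The plasticity/stability histogram in (C) counts, for each PPC neuron, how many context clusters it belongs to. A minimal sketch of that count, using a made-up membership matrix rather than the paper's data, could be:

    import numpy as np

    rng = np.random.default_rng(4)

    # membership[i, c] = 1 if neuron i belongs to the cluster of context c (placeholder data)
    membership = (rng.random((30, 4)) < 0.4).astype(int)

    contexts_per_neuron = membership.sum(axis=1)
    counts = np.bincount(contexts_per_neuron, minlength=5)
    stable = counts[1]            # neurons belonging to exactly one context (stable)
    plastic = counts[2:].sum()    # neurons shared across several contexts (plastic)
    print(counts, stable, plastic)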
Figure 6
Cluster dynamics at the time of the switch. (A) Neural dynamics of the active clusters before and after the switch, in blue and red respectively. (B) Histogram of the neural population at the time of the switch with respect to the active clusters before and after the switch.
Figure 7
Experiment on two-choice decision making and task switching. (A) Neural dynamics of PPC neurons and the ACC error system during task switches. The top chart plots the temporal interval for each task; below it, the neural dynamics of the PPC maps; and in the middle, their erroneous activity transcribed in the ACC system. The ACC works as a working memory that keeps track of the erroneous outputs, which are used during the learning stage; it is reset each time the PPC system gives a correct answer. Through reinforcement learning, the PPC maps converge gradually to the correct probability distribution. (B) Snapshot of the PPC maps (blue) modulated negatively by the ACC (red).
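
The caption describes the ACC as an error working memory that accumulates the PPC's wrong outputs, is reset on a correct answer, and modulates reinforcement learning in the PPC. A minimal sketch of this interaction, under assumed variable names and a simplified update rule, could look like this:

    import numpy as np

    rng = np.random.default_rng(2)

    n_states, n_actions = 4, 2
    ppc = np.full((n_states, n_actions), 0.5)   # PPC map: action values per state
    acc = np.zeros(n_states)                    # ACC: accumulated error per state

    def trial(state, correct_action, lr=0.2):
        action = int(np.argmax(ppc[state] + 0.01 * rng.normal(size=n_actions)))
        if action == correct_action:
            reward, acc[state] = 1.0, 0.0       # a correct answer resets the ACC trace
        else:
            reward = 0.0
            acc[state] += 1.0                   # ACC keeps track of erroneous outputs
        # The ACC trace scales the reinforcement update on the PPC map.
        gain = lr * (1.0 + acc[state])
        ppc[state, action] += gain * (reward - ppc[state, action])
        return action, reward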
Figure 8
Confidence level of the PPC maps during task switches: dynamics and histogram. (A) The confidence level is the difference between the amplitude of the most activated neuron and that of the second one within each map. After one thousand iterations, the two maps rapidly specialize their dynamics to their associated tasks. This behavior is due to the ACC error-based learning. (B) Histogram of the probability distribution of the confidence level with and without the ACC. With the ACC, we observe a clear separation into two distributions, which corresponds to a decrease of uncertainty with respect to the task. In comparison, the confidence level in an associative network without error feedback gives a uniform distribution.
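
The confidence measure in (A), the gap between the most activated unit and the runner-up, can be computed with a trivial helper like the one below; the example activity vector is made up for illustration.

    import numpy as np

    def confidence_level(activity):
        # Difference between the most activated neuron and the second one.
        top2 = np.sort(np.asarray(activity))[-2:]
        return float(top2[1] - top2[0])

    print(confidence_level([0.1, 0.85, 0.4, 0.2]))  # 0.45: low uncertainty for this map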
Figure 9
Raster plot for PFC neurons. In (A), the PFC learns the particular temporal sequence from the PPC outputs and is sensitive to the temporal order of each unit in the sequence. In (B), the top chart plots the confidence level of the incoming signals from the PPC. The middle chart displays the neural activity of two neurons from the two distinct clusters: neuron #10 in black (cluster #1) and neuron #14 in red (cluster #2). The raster plot of the whole system is plotted in the chart below.
Figure 10
PFC vs. PPC dynamics. (A) The snapshot of the PPC/PFC dynamics at time t = 1653 shows conflicting choices between the two maps, which corresponds to a bifurcation point. After temporal integration, the PFC settles its decision on a winner neuron different from the PPC choice. (B) Three types of PPC/PFC interaction occur: the PPC overwrites the values of the PFC units, the PFC elicits its own values with respect to the PPC, or both agree on the current prediction. The PPC-PFC system works in coherence for 60% of the time (green bar), but in situations of conflict the PPC overwrites the dynamics of the PFC network (blue bar) twice as often as the reverse (red bar).
Figure 11
PFC neurons' integration at times t = 604 and t = 2400. (A) Depending on the current situation, a neuron will be more selective to one part of the sequence or to another. The earlier a sequence is detected, the farther the prediction of the trajectory extends. (B) At bifurcation points, the trajectories are fuzzier and several patterns are elicited.
Figure 12
PFC network analysis. (A) Connectivity circle for the neurons of the PFC map; neurons belonging to cluster 1 are displayed in blue and those belonging to cluster 2 in red. The number of links within each cluster (intra-map connectivity) is higher than the number of links between them (inter-map connectivity). Moreover, the number of highly connected neurons is small. These characteristics replicate those of complex systems, and of small-world networks in particular. (B) Task switching is done through these hub-like neurons, which can direct the trajectory toward one task or the other. (C) The connectivity level per neuron within the network follows a logarithmic curve typical of complex networks, where the most connected neurons are also the fewest and the most critical, with 4 distant connections. (D) The PFC network enhances the decision-making process in comparison with the PPC-ACC system, owing to its learning of the temporal sequence and its better organization.
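
The connectivity statistics in (A) and (C), intra- vs. inter-cluster links and the connectivity level per neuron, can be computed from an adjacency matrix as in the sketch below; the adjacency matrix and cluster labels are random placeholders, not data from the paper.

    import numpy as np

    rng = np.random.default_rng(3)

    n = 20
    adj = (rng.random((n, n)) < 0.15).astype(int)
    np.fill_diagonal(adj, 0)
    cluster = np.array([1] * 10 + [2] * 10)       # cluster label per PFC neuron

    same = cluster[:, None] == cluster[None, :]
    intra = int(adj[same].sum())                  # links within a cluster
    inter = int(adj[~same].sum())                 # links between clusters (hub-like routes)
    degree = adj.sum(axis=0) + adj.sum(axis=1)    # connectivity level per neuron

    print(intra, inter)
    print(np.sort(degree)[::-1])                  # few highly connected (critical) neurons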
Figure 13
Kinova robot arm used for task-set selection. The two task sets correspond to (A) the situation in which the robot moves its hand alone, with the red target on its hand, and (B) the situation in which it moves the stick held in its hand, with the red target on the tip of the tool.
Figure 14
Dynamics of the gain-field neurons relative to the task. (A–D) In blue, the robot moves its hand freely; in red, the robot is handling the tool. Depending on what the GF neurons have learned, their peak level will diminish or increase when the task changes (i.e., when using a tool).
Figure 15
PFC attention decision during contextual change, hand-free or tool-use. (A) We expose the PFC dynamics to incomplete patterns for a short period of 20 iterations, every 500 iterations. The PFC is capable of switching to reconstruct the missing part of the spatio-temporal sequence; in blue for hand-free and in red for tool-use. (B) Neural activity for one neuron when one of the two contexts is set.
