Sci Rep. 2018 Jul 4;8(1):10110. doi: 10.1038/s41598-018-28241-z.

Neural signatures of reinforcement learning correlate with strategy adoption during spatial navigation


Dian Anggraini et al.

Abstract

Human navigation is generally believed to rely on two types of strategy adoption: route-based and map-based strategies. Both types of navigation require making spatial decisions along the traversed path, although formal computational and neural links between navigational strategies and mechanisms of value-based decision making have so far been underexplored in humans. Here we employed functional magnetic resonance imaging (fMRI) while subjects located different objects in a virtual environment. We then modelled their paths using reinforcement learning (RL) algorithms, which successfully explained decision behavior and its neural correlates. Our results show that subjects used a mixture of route-based and map-based navigation and that their paths could be well explained by model-free and model-based RL algorithms. Furthermore, the value signals of model-free choices during route-based navigation modulated the BOLD signals in the ventromedial prefrontal cortex (vmPFC), whereas the BOLD signals in parahippocampal and hippocampal regions pertained to model-based value signals during map-based navigation. Our findings suggest that the brain might share computational mechanisms and neural substrates for navigation and value-based decisions, such that model-free choice guides route-based navigation and model-based choice directs map-based navigation. These findings open new avenues for computational modelling of wayfinding by directing attention to value-based decision making, differing from common direction- and distance-based approaches.
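As a rough illustration of the modelling approach described above, the following sketch shows how model-free and model-based action values can be combined into choice probabilities under a weighting parameter ω, as in standard hybrid RL models; the exact parameterization used by the authors is not given in the abstract, and all names here are hypothetical.

import numpy as np

def hybrid_choice_probs(q_mf, q_mb, omega, beta):
    """Combine model-free and model-based action values (hypothetical
    names) and map them to choice probabilities with a softmax.
    omega: weight on model-based values (0 = purely model-free).
    beta:  inverse temperature controlling choice determinism."""
    q_net = omega * q_mb + (1.0 - omega) * q_mf
    exp_q = np.exp(beta * (q_net - q_net.max()))  # subtract max for numerical stability
    return exp_q / exp_q.sum()

# Example: three available directions at a decision point.
probs = hybrid_choice_probs(np.array([0.2, 0.5, 0.1]),
                            np.array([0.4, 0.3, 0.6]),
                            omega=0.7, beta=3.0)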


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Wayfinding task and behavioral results. (A) Layout of the grid world. The virtual reality (VR) environment consisted of a 5 × 5 grid of rooms. Each square represents a room containing distinct furniture and objects that distinguish it from the others. The black square marks the starting position, colored squares mark reward locations, and the numbers indicate the order in which the rewards had to be found. The wayfinding task consisted of three phases: encoding, retrieval, and search. During the search phase, subjects had to locate one randomly chosen reward on each trial, each time starting from a different position. (B) Screenshots of the virtual reality environment. Each room is furnished with distinct objects that allow subjects to distinguish and recognize individual rooms. At each room (decision point), subjects could choose among up to three directions (corner rooms offered only one or two). After a choice was made, an animation led to the room in the selected direction; this movement lasted 2.5–3 seconds, jittered uniformly. The next room and, if applicable, the reward were then presented. (C) Path of a representative participant (subject no. 1) who exhibited a tendency towards a route-based strategy. During the encoding phase, the subject established a fixed route from one reward to the next by repetition. During the search phase, the subject still followed the established route to reach the reward when starting from a location on that route. However, when starting from a position that was not part of the original route, she could locate the reward room via the shortest possible path.
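For concreteness, the 5 × 5 grid world in panel (A) can be represented as a small deterministic RL environment. This is a minimal sketch under the stated layout; the room indexing, the step function, and the reward coding are illustrative assumptions, not the authors' implementation.

GRID = 5  # 5 x 5 grid of rooms

def legal_actions(state):
    """Moves available from a room, encoded as (row, column) offsets.
    Corner and edge rooms offer fewer directions; in the task at most
    three directions were selectable at a decision point."""
    r, c = divmod(state, GRID)
    moves = []
    if r > 0:
        moves.append((-1, 0))   # north
    if r < GRID - 1:
        moves.append((1, 0))    # south
    if c > 0:
        moves.append((0, -1))   # west
    if c < GRID - 1:
        moves.append((0, 1))    # east
    return moves

def step(state, action, reward_state):
    """Deterministic transition; reward 1 on entering the reward room."""
    r, c = divmod(state, GRID)
    next_state = (r + action[0]) * GRID + (c + action[1])
    return next_state, (1.0 if next_state == reward_state else 0.0)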
Figure 2
Reinforcement learning models and model fits. The top panel displays action values, showing how valuable it is to move along the route in a certain state; the bottom panel shows the probability of taking certain actions in those states based on the action values. Black numbers are state values, blue numbers are the probabilities of the chosen action, and green numbers are the probabilities of other, non-chosen actions. Note that not all probabilities for non-preferred actions are shown. (A) Model-free valuation based on the SARSA(λ) algorithm. After reaching a reward, this algorithm updates values only along the traversed path. (B) Model-based valuation derived from dynamic programming. The model-based algorithm updates values not only along the taken path but across the entire grid world.
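Below is a minimal sketch of the two update rules named in this caption, assuming standard tabular SARSA(λ) with eligibility traces and value iteration as the dynamic-programming step; the learning rate, discount factor, and data layout are illustrative assumptions.

import numpy as np

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.95, lam=0.9):
    """One SARSA(lambda) step: credit flows back only along the
    traversed path via the eligibility trace E (same shape as Q)."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
    E[s, a] += 1.0            # mark the visited state-action pair
    Q += alpha * delta * E    # update all recently visited pairs
    E *= gamma * lam          # decay traces toward zero
    return Q, E

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Model-based valuation by dynamic programming: sweeps the entire
    grid world, not just the taken path. P[s, a] is the successor state
    (deterministic transitions), R[s, a] the immediate reward."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * V[P]      # one-step backup through the known model
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q
        V = V_new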
Figure 3
Correlation between navigation indices and ω parameters (n = 27). (A) Significant positive correlation between IPATH and ω. (B) Significant positive correlation between ISTEP and ω. (C) Significant negative correlation between IROUTE and ω for the fMRI experiment. IPATH, ISTEP, IROUTE, and ω are averaged per subject across the three phases of the wayfinding task.
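The ω parameter is the relative weight of model-based over model-free valuation in the hybrid model. The abstract and captions do not spell out the equation; assuming the standard hybrid formulation, the net value and choice rule would be

\[
Q_{\text{net}}(s,a) = \omega\, Q_{\text{MB}}(s,a) + (1-\omega)\, Q_{\text{MF}}(s,a),
\qquad
P(a \mid s) = \frac{\exp\big(\beta\, Q_{\text{net}}(s,a)\big)}{\sum_{a'} \exp\big(\beta\, Q_{\text{net}}(s,a')\big)},
\]

so ω = 1 corresponds to fully model-based (map-like) and ω = 0 to fully model-free (route-like) behavior, consistent with the signs of the correlations in the three panels.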
Figure 4
Correlations of model-predicted values with BOLD signals. (A) Correlates of model-free valuation in medial PFC/vmPFC, striatum, and retrosplenial cortex. (B) Correlates of model-based valuation in the parahippocampal and medial temporal lobe regions as well as the left retrosplenial cortex. Displayed results are significant at P < 0.05, whole-brain FWE-corrected at the cluster level.
Figure 5
Correlation between subjects’ relative degree of model-based behavior (ω) and β-parameter estimates of the value regressors. (A) In vmPFC, we found a significant negative correlation between the ω parameter of the behavioral hybrid model and the GLM β-estimates for the parametric regressor of model-free values. That is, across subjects, the larger a subject’s relative degree of model-free choice behavior, the stronger the representation of model-free values in her vmPFC BOLD signal. No such relationship was found for the model-based value regressor. (B) In the parahippocampal gyrus, we found a significant positive correlation between ω and the β-estimates for the parametric regressor of model-based values: the larger a subject’s relative degree of model-based choice behavior, the stronger the representation of model-based values in her parahippocampal BOLD signal. No such correlation was found for the model-free value regressor.
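As a sketch of the across-subjects test in this figure, the correlation between ω and per-subject β-estimates can be computed with a standard Pearson test; the arrays below are random placeholders, not the study's data.

import numpy as np
from scipy.stats import pearsonr

omega = np.random.rand(27)        # relative degree of model-based behavior, n = 27
beta_vmpfc = np.random.randn(27)  # GLM beta for the model-free value regressor in vmPFC

r, p = pearsonr(omega, beta_vmpfc)  # panel (A) reports a negative r here
print(f"r = {r:.2f}, p = {p:.3f}")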
