Interplay of approximate planning strategies

Quentin J M Huys et al.

Proc Natl Acad Sci U S A. 2015 Mar 10;112(10):3098-3103. doi: 10.1073/pnas.1414219112. Epub 2015 Feb 9.

Abstract

Humans routinely formulate plans in domains so complex that even the most powerful computers are taxed. To do so, they seem to avail themselves of many strategies and heuristics that efficiently simplify, approximate, and hierarchically decompose hard tasks into simpler subtasks. Theoretical and cognitive research has revealed several such strategies; however, little is known about their establishment, interaction, and efficiency. Here, we use model-based behavioral analysis to provide a detailed examination of the performance of human subjects in a moderately deep planning task. We find that subjects exploit the structure of the domain to establish subgoals in a way that achieves a nearly maximal reduction in the cost of computing values of choices, but then combine partial searches with greedy local steps to solve subtasks, and maladaptively prune the decision trees of subtasks in a reflexive manner upon encountering salient losses. Subjects come idiosyncratically to favor particular sequences of actions to achieve subgoals, creating novel complex actions or "options."

Keywords: hierarchical reinforcement learning; memoization; planning; pruning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Task. (A) Task display. On each trial, subjects saw six boxes. The bright box indicated the randomly chosen starting location. The number of moves to plan was displayed at the top. During the decision time of 9 s, subjects had to plan between three and five moves. Then, during the input time of 2.5 s, they had to enter their plan as a single sequence of right/left button presses in one go, without immediate feedback as to what state they were currently in or what rewards they had earned in the choice sequence so far. After the entire sequence had been entered, the chosen sequence and the rewards earned were displayed in the order in which they had been entered. Failure to enter a button-press sequence of the right length in the given time resulted in a penalty of −200 pence. (B) Task structure. Subjects were placed in one of the six boxes (“states”) at the beginning of each trial and had to plan a path through the maze that maximized the total outcome earned. From each state, two successor states could be reached deterministically by pressing either the right (dashed lines) or left (solid lines) key. For example, from state 1, state 4 could be reached by pressing left–left–right. Each transition resulted in a deterministic reward or loss; red arrows, for instance, denote large salient losses of −70 points. The possible transitions were never displayed on screen. (C) Pruning. The decision tree faced by subjects for a depth-3 problem starting in state 3. When one of the large losses (−70, red arrows in B) is encountered, the search along that subtree is terminated. The blue parts of the tree are thereby not evaluated, reducing the cost of computation. In this case, pruning leads to a suboptimal sequence appearing to be optimal. (D) Hierarchical fragmentation of the same problem. Rather than evaluating the entire depth-3 tree, a 2–1 fragmentation would first search the tree up to depth 2 (large green area), choose a depth-2 sequence (black arrow), and then search the remaining depth-1 tree (bottom-right green area). The blue area of the tree is again not evaluated. Optimal choices in the fragmented tree may miss the overall optimal sequence, which in this case would be on the far left of the tree. If a subject emitted the sequence on the far right, this sequence would be more likely under a 2–1 fragmentation than under a nonfragmented tree of full depth 3. The effective “subgoal” corresponding to the target of the first fragment (the end state of the subsequence resulting from the first part of the fragmentation) is indicated by a red asterisk.
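To make the two cost-saving devices in this caption concrete, the sketch below implements a depth-limited evaluation of button-press sequences with optional pruning at large losses (as in C) and a hierarchical 2–1 style fragmentation (as in D). This is an illustrative toy rather than the authors' fitted model: the maze wiring and reward values are placeholders (the caption does not give the full transition diagram of Fig. 1B), and pruning is applied deterministically at every −70 transition rather than probabilistically.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# PLACEHOLDER maze: the real transition diagram and rewards of Fig. 1B are not
# reproduced here; only the general structure (6 states, deterministic left/right
# moves, a few -70 "large loss" transitions) is kept.
TRANSITIONS: Dict[int, Dict[str, int]] = {
    1: {"left": 2, "right": 3}, 2: {"left": 3, "right": 4},
    3: {"left": 4, "right": 5}, 4: {"left": 5, "right": 6},
    5: {"left": 6, "right": 1}, 6: {"left": 1, "right": 2},
}
REWARDS: Dict[Tuple[int, int], int] = {
    (s, t): 20 for s, moves in TRANSITIONS.items() for t in moves.values()
}
REWARDS[(5, 1)] = REWARDS[(6, 1)] = -70  # placeholder "red arrow" transitions
LARGE_LOSS = -70


def sequence_value(start: int, actions: Sequence[str], prune: bool) -> int:
    """Sum the deterministic outcomes along an action sequence. With prune=True,
    evaluation stops at the first large loss (Fig. 1C): everything deeper in that
    branch is simply never added up."""
    total, state = 0, start
    for a in actions:
        nxt = TRANSITIONS[state][a]
        total += REWARDS[(state, nxt)]
        if prune and REWARDS[(state, nxt)] == LARGE_LOSS:
            break
        state = nxt
    return total


def best_sequence(start: int, depth: int, prune: bool) -> List[str]:
    """Exhaustively score all 2**depth sequences under the (possibly pruned)
    evaluation and return the best-looking one."""
    return list(max(product(("left", "right"), repeat=depth),
                    key=lambda seq: sequence_value(start, seq, prune)))


def fragmented_sequence(start: int, fragments: Sequence[int], prune: bool) -> List[str]:
    """Hierarchical fragmentation (Fig. 1D): e.g. fragments=(2, 1) searches the
    depth-2 tree, commits to that subsequence, then searches the remaining
    depth-1 tree from the resulting subgoal state."""
    plan, state = [], start
    for depth in fragments:
        sub = best_sequence(state, depth, prune)
        plan += sub
        for a in sub:
            state = TRANSITIONS[state][a]
    return plan


# A depth-3 trial from state 3, solved in full vs. as a 2-1 fragmentation:
# print(best_sequence(3, 3, prune=True), fragmented_sequence(3, (2, 1), prune=True))
```

Under this toy setup, the fragmented planner evaluates 2² + 2¹ = 6 sequences instead of 2³ = 8, which is the kind of reduction in computational cost the fragmentation in D is meant to buy.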
Fig. 2.
Fragmentation. (A) Fragment endpoint distributions when including all fragments within an individual choice sequence. Each panel shows the distribution of end states for fragments starting in each of the six states (fragment start states). The left column shows the endpoint distribution when considering all fragments. Fragments starting in state 5 terminated in state 2 or state 6 with high probability. The middle column shows the endpoints of fragments of length one, and the rightmost column the endpoints of fragments of length greater than one. (B) Endpoint distribution for the fragmentation that achieves the optimal choice at the least computational cost.
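As a small aid to reading these panels, the sketch below shows one straightforward way such conditional endpoint distributions could be tabulated. The representation of an inferred fragment as the sequence of states it visits is an assumption made for illustration, not the paper's data format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed representation: a fragment is the sequence of states it visits,
# e.g. (5, 6, 2) is a length-2 fragment that starts in state 5 and ends in state 2.
Fragment = Tuple[int, ...]


def endpoint_distributions(fragments: Iterable[Fragment],
                           min_len: int = 1,
                           max_len: Optional[int] = None) -> Dict[int, Dict[int, float]]:
    """For each fragment start state, return the relative frequencies of end states,
    restricted to fragments whose length (number of moves) lies in [min_len, max_len]."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for frag in fragments:
        length = len(frag) - 1
        if length < min_len or (max_len is not None and length > max_len):
            continue
        counts[frag[0]][frag[-1]] += 1
    return {start: {end: n / sum(c.values()) for end, n in c.items()}
            for start, c in counts.items()}


# Columns of Fig. 2A: all fragments, fragments of length one, fragments longer than one.
# all_fragments = endpoint_distributions(frags)
# length_one    = endpoint_distributions(frags, max_len=1)
# longer        = endpoint_distributions(frags, min_len=2)
```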
Fig. 3.
Fragment characteristics. (A) Distribution over inferred fragment lengths. (B) Overall distribution over fragment endpoints. State 2 is the most frequent endpoint. Blue lines in A and B show the distributions for the optimal fragmentations. (C) Nested model comparison. Each bar shows the group-level iBIC score for one model as additional cognitive processes are added. (D) Over time, only the most frequently used fragment increases in frequency, whereas all others are used progressively less often. (E) The entropy of the distribution over fragments used falls nearly linearly over time. (F) Discount factors (within fragments). An outcome lying x transitions ahead is multiplied by 1 − γ a total of x − 1 times. For outcomes lying distal to large losses (“specific pruning”), 1 − γ_S is substantially smaller than 1, implying robust discounting. In contrast, for outcomes distal to non-large-loss outcomes (“general pruning”), 1 − γ_G is indistinguishable from 1 for every subject, meaning that these are not down-weighted within fragments. Thus, subjects search to the end of the fragment but show a strong tendency to stop the search at large losses even within fragments (1 − γ_S < 1).
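One way to read the discounting described in F is sketched below: the weight on an outcome x transitions ahead within a fragment is a product of x − 1 continuation terms, each equal to 1 − γ_S if the corresponding intermediate outcome is a large loss and 1 − γ_G otherwise. This is a hedged paraphrase of the caption, not the paper's fitted model; the outcome values and parameter settings in the example are invented for illustration.

```python
from typing import Sequence

LARGE_LOSS = -70  # the salient losses of Fig. 1B


def outcome_weight(path_rewards: Sequence[int],
                   gamma_specific: float,
                   gamma_general: float) -> float:
    """Weight applied to the final outcome of `path_rewards`, whose first x - 1
    entries are the outcomes passed on the way to it within the fragment."""
    w = 1.0
    for r in path_rewards[:-1]:
        w *= (1 - gamma_specific) if r == LARGE_LOSS else (1 - gamma_general)
    return w


# With gamma_general near 0 (1 - gamma_G indistinguishable from 1), outcomes on
# large-loss-free paths are not down-weighted, while gamma_specific > 0 strongly
# discounts anything lying beyond a -70 transition (1 - gamma_S < 1):
print(outcome_weight([20, 20, 140], gamma_specific=0.6, gamma_general=0.0))   # -> 1.0
print(outcome_weight([20, -70, 140], gamma_specific=0.6, gamma_general=0.0))  # -> 0.4
```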

Comment in

  • How to divide and conquer the world, one step at a time.
    Daniel R, Schuck NW, Niv Y. Proc Natl Acad Sci U S A. 2015 Mar 10;112(10):2929-2930. doi: 10.1073/pnas.1500975112. Epub 2015 Mar 2. PMID: 25733879. Free PMC article. No abstract available.

