Neuron. 2020 Apr 8;106(1):166-176.e6. doi: 10.1016/j.neuron.2020.01.017. Epub 2020 Feb 11.

Inference-Based Decisions in a Hidden State Foraging Task: Differential Contributions of Prefrontal Cortical Areas


Pietro Vertechi et al. Neuron.

Abstract

Essential features of the world are often hidden and must be inferred by constructing internal models based on indirect evidence. Here, to study the mechanisms of inference, we establish a foraging task that is naturalistic and easily learned yet can distinguish inference from simpler strategies such as the direct integration of sensory data. We show that both mice and humans learn a strategy consistent with optimal inference of a hidden state. However, humans acquire this strategy more than an order of magnitude faster than mice. Using optogenetics in mice, we show that orbitofrontal and anterior cingulate cortex inactivation impacts task performance, but only orbitofrontal inactivation reverts mice from an inference-based to a stimulus-bound decision strategy. These results establish a cross-species paradigm for studying the problem of inference-based decision making and begin to dissect the network of brain regions crucial for its performance.

Keywords: PFC; cross-species task; foraging; inference; state representation.


Conflict of interest statement

Declaration of interests The authors declare no competing interests.

Figures

Figure 1
A Probabilistic Foraging Task Can Dissociate Stimulus-Bound from Inference-Based Evidence Accumulation (A) Formally, the task is a hidden Markov model with LeftActive and RightActive states. It has two parameters: the probability of reward given the state and the probability of a state transition. (B and C) Estimated relative value (left minus right) as a function of trial history (rewards in green, failures in gray) in the stimulus-bound model (B) and inference-based model (C), respectively. Shaded patches indicate the actual state. (D) Effect of rewards on relative value in the stimulus-bound and inference-based models: the two models are simulated in a trial with only rewards on the same site. Relative value increases with reward number in the stimulus-bound but not in the inference-based model. (E) Consecutive failures before leaving (normalized subtractively) as a function of reward number in simulated data from the stimulus-bound and inference-based models: reward number has an effect on consecutive failures in the stimulus-bound but not in the inference-based model.
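The contrast between the two models in panels (B)-(E) can be made concrete with a minimal sketch. The update rules below are illustrative assumptions, not the paper's fitted models (in particular, the learning rate alpha and the single leaky integrator are invented for this example): a stimulus-bound agent integrates raw outcomes, so its value grows with consecutive rewards, whereas a Bayesian agent's belief that the current site is active resets after every reward, because a reward certifies the active state, which may then switch with probability pSW.

```python
def stimulus_bound_update(value, rewarded, alpha=0.5):
    """Leaky integration of raw outcomes (+1 reward, -1 failure).
    alpha is a hypothetical learning rate, not a parameter from the paper."""
    outcome = 1.0 if rewarded else -1.0
    return (1.0 - alpha) * value + alpha * outcome

def inference_update(belief, rewarded, p_rwd, p_sw):
    """Bayesian belief that the current site is active.
    A reward certifies the site was active, so belief resets to 1 - p_sw
    regardless of history; a failure discounts the belief by Bayes' rule."""
    if rewarded:
        return 1.0 - p_sw
    stay = belief * (1.0 - p_rwd)          # active site, unlucky poke
    return stay / (stay + (1.0 - belief))  # vs. inactive site, certain failure

# One reward vs. five consecutive rewards at the same site:
v1 = stimulus_bound_update(0.0, True)
v5, b5 = 0.0, 0.5
for _ in range(5):
    v5 = stimulus_bound_update(v5, True)
    b5 = inference_update(b5, True, p_rwd=0.9, p_sw=0.3)
# v5 > v1: stimulus-bound value keeps growing with reward number (panel D),
# while b5 is pinned at 1 - p_sw after any reward, so reward number
# carries no extra evidence in the inference-based model.
```

Panel (E) then follows directly: a leave-threshold on the stimulus-bound value takes more failures to cross after more rewards, while a threshold on the belief does not.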
Figure 2
Mice Accumulate Inferred Evidence for State Switches and Not Site Value (A) Schematic of rodent task. Mice shuttle back and forth between two reward sites to obtain water rewards. (B) Example sequence of pokes. Pokes in the correct site can be rewarded or not, whereas pokes in the incorrect site are never rewarded. Following a state switch, the animals need to travel to the other site to obtain more rewards. (C) Example behavior: sequence of poke bouts (i.e., trials) with rewards in green and failures in gray. (D) Consecutive failures before leaving as a function of reward number in early training (days 1 to 3, purple) compared with late training (days 10 to 12, black). Solid line and shaded area represent mean and SEM across animals. (E) Slope coefficient in ConsecutiveFailures ~ 1 + RewardNumber for early training and late training. Slope coefficient is higher in early trials, likelihood ratio test on linear mixed-effect model ConsecutiveFailures ~ 1 + RewardNumber + Early + RewardNumber&Early + (1|MouseID) versus a null model with no interaction: p < 1e−10, n = 18 mice (see STAR Methods for a description of the formula notation). (F) Evolution of reward number coefficient across days. Solid line and shaded area represent mean and SEM across animals. (G) Probability of leaving as a function of number of rewards and consecutive failures in late training. (H) Failures after reward as a function of failures before reward in trials with only one reward in a more difficult protocol. Solid line and shaded area represent mean and SEM across animals. See also Video S1.
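The regressions in (D)-(F) fit consecutive failures before leaving against reward number, and the key quantity is the slope of that fit. As a minimal illustration with hypothetical numbers (plain least squares rather than the paper's mixed-effects model, and made-up data points), the slope can be computed as covariance over variance:

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (covariance over variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

reward_number = [1, 2, 3, 4, 5]
# Hypothetical data: early in training, failures grow with reward number
# (stimulus-bound-like); late in training the curve is flat (inference-like).
failures_early = [2.0, 2.5, 3.1, 3.4, 4.0]
failures_late  = [3.0, 3.1, 2.9, 3.0, 3.0]

slope_early = ols_slope(reward_number, failures_early)  # clearly positive
slope_late  = ols_slope(reward_number, failures_late)   # near zero
```

A positive slope means the animal waits out more failures after longer reward runs, the signature of value integration; a slope near zero is the signature of state inference. The paper's mixed-effects formula adds a per-animal random intercept, (1|MouseID), and tests the RewardNumber&Early interaction, i.e., whether the slope differs between early and late training.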
Figure 3
Accumulation of Inferred Evidence Is Tuned to Task Parameters (A) Probability of being on the correct site after a failure as a function of reward probability and transition probability. (B) Probability of being on the correct site as a function of trial history for three protocols (Easy environment: pRWD = 0.9 and pSW = 0.9; Medium environment: pRWD = 0.9 and pSW = 0.3; Hard environment: pRWD = 0.3 and pSW = 0.3). Leaving decisions can be modeled by setting a threshold on this probability that changes as a function of the travel cost (black lines). (C) Consecutive failures before leaving as a function of the environment statistics and barrier condition. Error bars represent SEM across animals. (D) Consecutive failures before leaving split by subject and environment statistics, barrier versus no barrier.
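The tuning in panels (A) and (B) can be reproduced with a short calculation. Assuming, for illustration, that an active site pays with probability pRWD per poke and that the state switches with probability pSW after each reward (failures themselves carrying no switch hazard; this reading of the task is an assumption consistent with the caption's parameters), the posterior probability of still being on the correct site after a run of failures is:

```python
def p_correct_after_failures(p_rwd, p_sw, n_failures):
    """Posterior probability that the current site is still active,
    starting just after a reward and then observing n consecutive failures."""
    b = 1.0 - p_sw  # right after a reward the state may already have switched
    for _ in range(n_failures):
        stay = b * (1.0 - p_rwd)       # active site failed to pay
        b = stay / (stay + (1.0 - b))  # vs. inactive site (always fails)
    return b

# The three protocols from panel (B):
easy   = [p_correct_after_failures(0.9, 0.9, k) for k in range(4)]
medium = [p_correct_after_failures(0.9, 0.3, k) for k in range(4)]
hard   = [p_correct_after_failures(0.3, 0.3, k) for k in range(4)]
# Easy collapses almost immediately after one failure, Hard decays slowly:
# a fixed leaving threshold on this posterior therefore yields more
# consecutive failures before leaving in the Hard environment (panel C),
# and raising the travel cost (barrier) lowers the threshold further.
```

This is the sense in which leaving decisions in (B) are "a threshold on this probability": the threshold crossing, not any fixed failure count, determines when to leave.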
Figure 4
Humans Perform Optimal Inference and Tune Behavior to Task Parameters (A) The number of rewards has little effect on the probability of leaving during both early (purple) and late (black) training. Solid line and shaded area represent mean and SEM across subjects. (B) Number of consecutive failures as a function of reward number for humans in early versus late parts of training. Unlike mice, humans learn the statistics of the environment extremely quickly: the slope coefficient is similar (and around 0) in both early and late trials: likelihood ratio test on linear mixed-effect model ConsecutiveFailures ~ 1 + RewardNumber + Early + RewardNumber&Early + (1|SubjectID) versus a null model with no interaction: p = 0.45, n = 20 subjects. (C) Consecutive failures before leaving as a function of the environment statistics and barrier condition. Error bars represent SEM across subjects. (D) Consecutive failures before leaving split by subject and environment statistics, barrier versus no barrier. See also Video S2.
Figure 5
OFC, but Not ACC, Is Necessary for Optimal Inference (A) Schematic of the optic fiber placement. (B) Bilateral photostimulation at 3 mW happened during nose-poking: it was triggered by the first poke in 50% of trials and lasted for 500 ms after the last poke in the trial. (C–E) Consecutive failures before leaving split by environment statistics, barrier condition, and subject, for inactivation versus control trials, for ACC implanted heterozygotes (C), OFC implanted heterozygotes (D), and wild-types (E), respectively. (F–H) Ratio of consecutive failures before leaving, split in the same way as (C)–(E), for ACC implanted heterozygotes (F), OFC implanted heterozygotes (G), and wild-types (H), respectively. When predicting renormalized consecutive failures, Stimulation and Protocol interact for OFC implanted heterozygotes (p < 1e−10, n = 6 mice) but not for ACC implanted heterozygotes (p = 0.15, n = 6 mice) or wild-types (p = 0.77, n = 7 mice). (I–K) An animal-by-animal quantification: the coefficient of the interaction term in ConsecutiveFailures ~ 1 + Stimulation + RewardNumber + RewardNumber&Stimulation for ACC implanted heterozygotes (I), OFC implanted heterozygotes (J), and wild-types (K). (L–N) Number of consecutive failures as a function of reward number in the 30-30 barrier protocol for ACC implanted heterozygotes (L), OFC implanted heterozygotes (M), and wild-types (N). Solid lines and shaded areas represent mean and SEM across animals. See also Figure S2 and Table S1.

