Science. 2014 Oct 17;346(6207):340-3.
doi: 10.1126/science.1256254. Epub 2014 Sep 18.

Neural correlates of strategic reasoning during competitive games

Hyojung Seo et al. Science.

Abstract

Although human and animal behaviors are largely shaped by reinforcement and punishment, choices in social settings are also influenced by information about the knowledge and experience of other decision-makers. During competitive games, monkeys increased their payoffs by systematically deviating from a simple heuristic learning algorithm and thereby countering the predictable exploitation by their computer opponent. Neurons in the dorsomedial prefrontal cortex (dmPFC) signaled the animal's recent choice and reward history that reflected the computer's exploitative strategy. The strength of switching signals in the dmPFC also correlated with the animal's tendency to deviate from the heuristic learning algorithm. Therefore, the dmPFC might provide control signals for overriding simple heuristic learning algorithms based on the inferred strategies of the opponent.
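The "simple heuristic learning algorithm" here is a value-based reinforcement-learning rule whose predictability an opponent can exploit. A minimal sketch of that dynamic, a delta-rule learner playing matching pennies against an opponent that tracks its choice frequencies, is below; the parameters and the opponent model are illustrative assumptions, not the paper's actual algorithms.

```python
import math
import random

def play_matching_pennies(n_trials=1000, alpha=0.3, beta=3.0, seed=0):
    """A simple value-based learner (hypothetical parameters) facing an
    opponent that tracks the learner's choice frequencies and plays to
    exploit them, so predictable play is costly."""
    rng = random.Random(seed)
    q = [0.0, 0.0]        # value estimate for each target
    counts = [1, 1]       # opponent's tally of the learner's past choices
    wins = 0
    for _ in range(n_trials):
        # softmax choice between the two targets
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p1 else 0
        # opponent matches the learner's historically favored target;
        # the learner wins (payoff 1) only when the choices differ
        opponent = 0 if counts[0] >= counts[1] else 1
        reward = 1.0 if choice != opponent else 0.0
        wins += reward
        counts[choice] += 1
        q[choice] += alpha * (reward - q[choice])  # delta-rule update
    return wins / n_trials
```

Against such an opponent, a purely value-driven learner is held near or below the equilibrium payoff; systematically deviating from the rule, as the monkeys did, is what raises the win rate.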


Figures

Fig. 1. Behavioral task and performance
Biased matching pennies (A) and its payoff matrix (B). R, risky target; S, safe target. (C) Behavioral effects of gains and losses. Average regression coefficients (ordinate) quantified the tendency for the animal to choose the same target that produced a particular outcome in each of the last 10 trials. Arrows indicate the attenuation in the immediate effect of loss. Error bars, SEM.
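The regression coefficients in (C) come from a lagged analysis of how past outcomes bias the current choice. A sketch of one such analysis, a logistic regression of the current choice on signed outcome regressors from the preceding trials, is below; the regressor coding and fitting procedure are assumptions for illustration, not the authors' exact model.

```python
import numpy as np

def lagged_choice_regression(choices, rewards, n_lags=10, lr=0.1, n_iter=2000):
    """Logistic regression of the current choice on lagged choice-outcome
    regressors. choices, rewards: 0/1 arrays of equal length.
    A positive weight at lag k means a tendency to repeat a choice that
    was rewarded k trials ago."""
    choices = np.asarray(choices, float)
    rewards = np.asarray(rewards, float)
    # regressor for lag k: +1 if the choice k trials back was target 1 and
    # rewarded, -1 if it was target 0 and rewarded, 0 if unrewarded
    signed = (2 * choices - 1) * rewards
    T = len(choices)
    X = np.column_stack(
        [signed[n_lags - k - 1 : T - k - 1] for k in range(n_lags)]
    )  # column k holds the lag-(k+1) regressor
    y = choices[n_lags:]
    w = np.zeros(n_lags)
    for _ in range(n_iter):           # plain gradient ascent on log-likelihood
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w
```

On behavior generated by a win-stay rule, the lag-1 coefficient dominates; the attenuated immediate effect of loss marked by the arrows in (C) would show up as a reduced lag-1 coefficient for loss outcomes.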
Fig. 2. Systematic deviations from reinforcement learning were beneficial
The color of each box in the decision trees (top) and the position of each circle in the scatter plots (bottom) indicate how much the probability of choosing the safe target deviated from the prediction of the best-fitting reinforcement learning model (abscissa in the bottom scatter plot), given the choices and outcomes in the last two trials, and how this increased or decreased the probability of reward relative to the Nash-equilibrium strategy (ordinate in the scatter plot). Numbers indicate different sequences of choices and outcomes in the two preceding trials. Solid boxes correspond to the sequences included in the best hybrid reinforcement learning model (14). R− and R+ denote loss and gain from the risky target, respectively, whereas S0 and S+ denote the neutral outcome and gain from the safe target.
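The quantity on the abscissa, the gap between the animal's conditional choice probability and the model's prediction for each two-trial history, can be tabulated as follows; the input encodings (0/1 choices, outcome labels, per-trial model predictions) are assumptions for illustration.

```python
from collections import defaultdict

def deviation_by_history(choices, outcomes, p_model):
    """Empirical P(choose safe) minus model-predicted P(safe), grouped by
    the outcome sequence of the two preceding trials.
    choices: 1 = safe, 0 = risky; outcomes: labels like 'R-', 'R+', 'S0', 'S+';
    p_model: per-trial reinforcement-learning predictions of P(safe)."""
    stats = defaultdict(lambda: [0.0, 0.0, 0])  # [sum choices, sum model p, n]
    for t in range(2, len(choices)):
        key = (outcomes[t - 2], outcomes[t - 1])
        s = stats[key]
        s[0] += choices[t]
        s[1] += p_model[t]
        s[2] += 1
    return {k: s[0] / s[2] - s[1] / s[2] for k, s in stats.items()}
```

Plotting these per-history deviations against the resulting change in expected payoff relative to the Nash equilibrium gives the bottom scatter plots.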
Fig. 3. Cortical activity related to the conjunctions of choices and outcomes
(A) Fraction of neurons in each brain region that significantly modulated their activity during the delay period according to high-order conjunctions of choices and outcomes (14). (B) Time course of the signals plotted in (A), using the same color code for the different brain areas. (C) Spike density functions of an example dmPFC neuron, sorted by the animal's choices (R, risky; S, safe) and outcomes (+, 0, and − for gain, neutral outcome, and loss, respectively) as well as by the positions of the chosen targets in the current (t) and previous (t−1) trials. Colored disks indicate different sequences of previous choices and outcomes, and asterisks mark the activity re-plotted in Fig. 4.
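The fraction in (A) is the share of recorded neurons whose delay-period firing depends on the choice-outcome conjunction. A generic permutation test on between-condition variance, sketched below, is one way to score each neuron; the paper's actual statistical criterion (14) is not reproduced here.

```python
import numpy as np

def significant_fraction(pop_rates, labels, alpha=0.05, n_perm=500, seed=0):
    """Fraction of neurons whose trial-by-trial rates differ across
    choice-outcome conjunction conditions, assessed by a one-way
    permutation test on the variance of condition means."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_sig = 0
    for rates in pop_rates:          # one array of trial rates per neuron
        rates = np.asarray(rates, float)

        def stat(lbl):
            return np.var([rates[lbl == c].mean() for c in np.unique(lbl)])

        obs = stat(labels)
        perm = [stat(rng.permutation(labels)) for _ in range(n_perm)]
        p = (1 + sum(s >= obs for s in perm)) / (1 + n_perm)
        n_sig += p < alpha
    return n_sig / len(pop_rates)
```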
Fig. 4. Cortical signals for deviation from simple reinforcement learning
(A) Spike density functions from a dmPFC neuron (shown in Fig. 3C) sorted by the animal’s choices in the current and previous trials for 3 different sequences of outcomes in the last two trials (indicated by the text label and color defined in Fig. 3C). Δ denotes the difference in the accuracy of decoding the animal’s choice in switch vs. stay trials. (B) The difference in the decoding accuracy, ΔDA(switch), plotted as a function of how much more often the animal switched its choices compared to the prediction from the simple RL algorithm. (C) The same results shown in (B) for the entire population of dmPFC neurons (left) and averaged for each outcome sequence (identified by colors defined in Fig. 3C; right). Lines correspond to the best-fitting regression models. (D) The correlation coefficient between ΔDA and the deviation from reinforcement learning model for two different data sets (BMP, biased matching pennies; MP, matching pennies).
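ΔDA(switch) contrasts how well the upcoming choice can be decoded from a neuron's activity on switch versus stay trials. A minimal leave-one-out nearest-class-mean decoder, a stand-in for whatever decoder one might use, makes the quantity concrete; all inputs are illustrative.

```python
import numpy as np

def decoding_accuracy(rates, choices):
    """Leave-one-out nearest-class-mean decoding of a binary choice
    from single-trial firing rates."""
    rates = np.asarray(rates, float)
    choices = np.asarray(choices)
    correct = 0
    for i in range(len(rates)):
        held_out = np.arange(len(rates)) == i
        m0 = rates[~held_out & (choices == 0)].mean()
        m1 = rates[~held_out & (choices == 1)].mean()
        pred = 0 if abs(rates[i] - m0) < abs(rates[i] - m1) else 1
        correct += pred == choices[i]
    return correct / len(rates)

def delta_da_switch(rates, choices, switch):
    """Decoding accuracy on switch trials minus accuracy on stay trials."""
    rates, choices = np.asarray(rates, float), np.asarray(choices)
    switch = np.asarray(switch, bool)
    return (decoding_accuracy(rates[switch], choices[switch])
            - decoding_accuracy(rates[~switch], choices[~switch]))
```

A neuron whose activity predicts the choice only when the animal is about to switch yields a large positive ΔDA(switch), which is the signal correlated with behavioral deviation in (B)-(D).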

References

    1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer; New York: 2001.
    2. Gigerenzer G, Brighton H. Homo heuristicus: why biased minds make better inferences. Top Cogn Sci. 2009;1:107–143.
    3. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
    4. Ito M, Doya K. Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr Opin Neurobiol. 2011;21:368–373.
    5. Lee D, Seo H, Jung MW. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287–308.
