J Neurosci Methods. 2019 Apr 1:317:37-44.

doi: 10.1016/j.jneumeth.2019.01.006. Epub 2019 Jan 18.

Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models

Ian C Ballard¹, Samuel M McClure²

Affiliations

¹ Neurosciences Graduate Training Program, Stanford University, Stanford, CA 94305, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA; Department of Psychology, Arizona State University, Tempe, AZ 85287, USA. Electronic address: iancballard@gmail.com.
² Department of Psychology, Arizona State University, Tempe, AZ 85287, USA.

PMID: 30664916
PMCID: PMC8930195
DOI: 10.1016/j.jneumeth.2019.01.006

Abstract

Background: Reinforcement learning models provide excellent descriptions of learning in multiple species across a variety of tasks. Many researchers are interested in relating parameters of reinforcement learning models to neural measures, psychological variables, or experimental manipulations. We demonstrate that parameter identification is difficult because a range of parameter values provides fits of approximately equal quality to the data. This identification problem has a large impact on power: we show that a researcher who wants to detect a medium-sized correlation (r = .3) with 80% power between a variable and learning rate must collect 60% more subjects than specified by a typical power analysis in order to account for the noise introduced by model fitting.
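
The power argument above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' analysis: it uses the standard Fisher z approximation for the sample size needed to detect a correlation, and it assumes that model-fitting noise is independent of the other variable, so that the observable correlation is the true correlation multiplied by the correlation between true and fitted learning rates. The value `rho_fit = 0.8` is a hypothetical placeholder, not a number reported in the abstract.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation r (two-sided test),
    using the Fisher z approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


r_true = 0.3    # medium-sized correlation, as in the abstract
rho_fit = 0.8   # HYPOTHETICAL correlation between true and fitted learning rates

# Attenuation: independent fitting noise shrinks the observable correlation.
r_observed = r_true * rho_fit

n_naive = required_n(r_true)         # what a typical power analysis would give
n_adjusted = required_n(r_observed)  # after accounting for fitting noise
print(n_naive, n_adjusted, round(n_adjusted / n_naive, 2))
```

Lower values of `rho_fit` (poorer parameter recovery) inflate the required sample size further, which is the mechanism the abstract describes.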

New method: We derive a Bayesian optimal model fitting technique that takes advantage of information contained in choices and reaction times to constrain parameter estimates.
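
The general idea of constraining parameters with both data sources can be sketched as a joint likelihood. The abstract does not specify the reaction-time model, so everything about the RT term below is an assumption made for illustration: log RT is treated as Gaussian with a mean that decreases as the learned values of the two options move apart, and the names `b0`, `b1`, and `sigma` are hypothetical. The choice term is a standard Q-learning model with learning rate `alpha` and a softmax with inverse temperature `m`.

```python
import math


def joint_neg_log_lik(alpha, m, trials, b0, b1, sigma, use_rt=True):
    """Negative log-likelihood of choices (and optionally RTs) on a 2-arm bandit.

    trials: list of (choice, reward, rt) with choice in {0, 1} and rt > 0.
    Rewards are assumed to lie in [0, 1] so that m * (value difference) stays small.
    """
    q = [0.0, 0.0]  # learned action values
    nll = 0.0
    for choice, reward, rt in trials:
        # Choice term: two-option softmax with inverse temperature m.
        diff = q[choice] - q[1 - choice]
        p_choice = 1.0 / (1.0 + math.exp(-m * diff))
        nll -= math.log(p_choice)

        if use_rt:
            # ASSUMED RT term: log RT ~ Normal(b0 - b1 * |Q0 - Q1|, sigma).
            # Because the mean depends on the Q-values, RTs carry
            # information about alpha in addition to the choices.
            mu = b0 - b1 * abs(q[0] - q[1])
            z = (math.log(rt) - mu) / sigma
            nll += 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2 * math.pi)

        # Q-learning update for the chosen option.
        q[choice] += alpha * (reward - q[choice])
    return nll


# Toy data: (choice, reward, rt in seconds); purely illustrative.
example_trials = [(0, 1.0, 0.9), (0, 1.0, 0.7), (1, 0.0, 0.8), (0, 1.0, 0.6)]
choice_only = joint_neg_log_lik(0.3, 2.0, example_trials, 0.0, 1.0, 0.5, use_rt=False)
joint = joint_neg_log_lik(0.3, 2.0, example_trials, 0.0, 1.0, 0.5, use_rt=True)
print(choice_only, joint)
```

Minimizing the joint objective over `alpha`, `m`, and the RT parameters, rather than the choice-only objective, is one way to let reaction times narrow the range of learning rates that fit the data equally well.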

Results: We show using simulation and empirical data that this method substantially improves the ability to recover learning rates.

Comparison with existing methods: We compare this method against the use of Bayesian priors. We show in simulations that the combined use of Bayesian priors and reaction times confers the highest parameter identifiability. However, in real data where the priors may have been misspecified, the use of Bayesian priors interferes with the ability of reaction time data to improve parameter identifiability.

Conclusions: We present a simple technique that takes advantage of readily available data to substantially improve the quality of inferences that can be drawn from parameters of reinforcement learning models.

Keywords: Delay discounting; Intertemporal choice; Parameter estimation; Power; Q-learning; Reproducibility; Striatum.

Conflict of interest statement

Declarations of interest

None.

Figures

**Fig. 1.**
Likelihood surfaces of reinforcement learning and delay discounting models. A) Likelihood surface for a reinforcement learning model of a simulated subject on a 2-arm bandit task (α = .3, m = 2). There is a tradeoff between learning rate and inverse temperature, such that a lower learning rate with more reliable responding provides a fit similar to that of a higher learning rate with more random responding. B) Likelihood surface for a hyperbolic model of a simulated subject in a delay discounting task (k = .011, m = 2). Most of the uncertainty comes from the inverse temperature parameter. Compared to A, there is only a modest tradeoff between discount rate and choice noise. C) Estimates of the parameter correlation between α and m at the maximum likelihood estimate for data simulated from a range of parameter settings. Parameter anticorrelation is high for nearly the full range of parameter settings. D) Estimates of the parameter correlation between k and m at the maximum likelihood estimate for data simulated from a range of parameter settings. Parameter correlation is high for very noisy subjects and values of k near the edges of the choice set, but is generally low for a wide range of typical parameter settings.
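
For readers unfamiliar with the model in panel B, a hyperbolic discounting model with a softmax choice rule can be written out as below. This is a minimal sketch under common textbook assumptions (value = amount / (1 + k × delay), logistic choice between an immediate and a delayed option); the caption does not give the exact parameterization, and the amounts and delays in the example are hypothetical. Evaluating `neg_log_lik` over a grid of `k` and `m` values is one way to trace out a likelihood surface of the kind shown in the figure.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hyperbolic_value(amount, delay, k):
    """Subjective value of `amount` received after `delay`, discount rate k."""
    return amount / (1.0 + k * delay)


def p_choose_delayed(immediate, delayed, delay, k, m):
    """Probability of choosing the delayed option; m is the inverse temperature."""
    diff = hyperbolic_value(delayed, delay, k) - immediate
    return logistic(m * diff)


def neg_log_lik(k, m, choices):
    """choices: list of (immediate, delayed, delay, chose_delayed) tuples."""
    nll = 0.0
    for immediate, delayed, delay, chose_delayed in choices:
        p = p_choose_delayed(immediate, delayed, delay, k, m)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p if chose_delayed else 1.0 - p)
    return nll


k, m = 0.011, 2.0  # parameter values from the Fig. 1B caption
# HYPOTHETICAL choice set: (immediate amount, delayed amount, delay, chose delayed?)
choices = [(5.0, 10.0, 30.0, True), (9.0, 10.0, 30.0, False), (7.0, 10.0, 60.0, False)]
print(neg_log_lik(k, m, choices))
```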

**Fig. 2.**
Simulation of parameter identifiability in RL. A) The correlation between ground truth and fitted learning rates as a function of the number of bandit trials (25 subjects). Increasing the number of trials improves parameter identifiability, and the use of reaction times and Bayesian priors substantially improves parameter identifiability regardless of the number of trials. B) The correlation between ground truth and fitted learning rates as a function of the number of subjects (200 trials). Increasing the number of subjects does not improve parameter identifiability. The use of reaction times and Bayesian priors substantially improves parameter identifiability regardless of the number of subjects.

**Fig. 3.**
Effect of parameter identifiability on experimental power. Green lines depict the power to detect a correlation between two variables for different sample sizes. Purple lines depict the power after accounting for the noise introduced by model-fitting.

**Fig. 4.**
Joint modeling of choice and reaction times improves parameter identifiability in real data. A) The correlation in estimated learning rate between two runs of a bandit task using a standard RL model. B) The correlation in estimated learning rate when estimated using a model of reaction times and choice.

Grants and funding

R01 MH091068/MH/NIMH NIH HHS/United States