Data-driven equation discovery reveals nonlinear reinforcement learning in humans
- PMID: 40743390
- PMCID: PMC12337339
- DOI: 10.1073/pnas.2413441122
Abstract
Computational models of reinforcement learning (RL) have significantly advanced our understanding of human behavior and decision-making. Traditional RL models, however, typically update reward expectations linearly, potentially oversimplifying the nuanced relationship between human behavior and reward. To address this limitation and explore alternative models of RL, we applied equation discovery algorithms. This method, currently used mainly in physics and biology, attempts to capture data by proposing a differential equation assembled from an array of candidate linear and nonlinear functions. Using this method, we identified a model of RL that we term the Quadratic Q-Weighted model. The model suggests that reward prediction errors obey nonlinear dynamics and exhibit negativity biases, resulting in an underweighting of reward when expectations are low and an overweighting of the absence of reward when expectations are high. We tested the generalizability of our model by comparing it to classical models used in nine published studies. Our model surpassed traditional models in predictive accuracy on eight of these nine published datasets, demonstrating not only its generalizability but also its potential to offer insight into the complexities of human learning. This work showcases the integration of a behavioral task with advanced computational methodologies as a potent strategy for uncovering the intricate patterns of human cognition, marking a significant step forward in the development of computational models that are both interpretable and broadly applicable.
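The equation-discovery procedure described above can be sketched as sparse regression over a library of candidate terms. The snippet below is a minimal illustration, not the paper's actual pipeline or model: it simulates data from a classical linear value update, dQ/dt = α(r − Q), builds a small hypothetical polynomial library in Q, and uses sequentially thresholded least squares to recover which terms govern the dynamics. The library contents, threshold, and simulated learning rule are all illustrative assumptions.

```python
import numpy as np

# Simulate a classical (linear) value update: dQ/dt = alpha * (r - Q).
# This stands in for behavioral learning data; the real study fit human data.
alpha, r = 0.3, 1.0
dt, steps = 0.1, 200
Q = np.empty(steps)
Q[0] = 0.0
for t in range(steps - 1):
    Q[t + 1] = Q[t] + dt * alpha * (r - Q[t])

# Estimate the time derivative by finite differences.
dQ = np.gradient(Q, dt)

# Candidate library of linear and nonlinear functions of Q (an assumed,
# illustrative choice; equation discovery methods search such libraries).
library = np.column_stack([np.ones_like(Q), Q, Q**2, Q**3])
names = ["1", "Q", "Q^2", "Q^3"]

# Sequentially thresholded least squares: fit, prune small coefficients, refit.
coef = np.linalg.lstsq(library, dQ, rcond=None)[0]
for _ in range(10):
    small = np.abs(coef) < 0.05
    coef[small] = 0.0
    keep = ~small
    if keep.any():
        coef[keep] = np.linalg.lstsq(library[:, keep], dQ, rcond=None)[0]

# Only the constant and linear terms should survive for this linear rule,
# recovering dQ/dt ~ 0.3 - 0.3*Q.
recovered = {n: round(c, 3) for n, c in zip(names, coef) if c != 0.0}
print(recovered)
```

A nonlinear learning rule in the data would instead leave quadratic or cubic terms in the recovered equation, which is how a model like the Quadratic Q-Weighted model can emerge from such a search.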
Keywords: dynamical systems; machine learning; nonlinear modeling; reinforcement learning.
Conflict of interest statement
Competing interests statement: The authors declare no competing interest.