Reliability of gamified reinforcement learning in densely sampled longitudinal assessments

Monja P Neuser et al.

PLOS Digit Health. 2023 Sep 6;2(9):e0000330. doi: 10.1371/journal.pdig.0000330. eCollection 2023 Sep.

Abstract

Reinforcement learning is a core facet of motivation, and alterations in reward learning have been associated with various mental disorders. To build better models of individual learning, repeated measurement of value-based decision-making is crucial. However, the focus on lab-based assessment of reward learning has limited the number of measurements, and the test-retest reliability of many decision-related parameters is therefore unknown. In this paper, we present Influenca, an open-source cross-platform application that provides a novel reward learning task complemented by ecological momentary assessment (EMA) of current mental and physiological states for repeated assessment over weeks. In this task, players have to identify the most effective medication by integrating reward values with win probabilities that change according to random Gaussian walks. Participants can complete up to 31 runs with 150 trials each. To encourage replay, in-game screens provide feedback on progress. Using an initial validation sample of 384 players (9729 runs), we found that reinforcement learning parameters such as the learning rate and reward sensitivity show poor to fair intra-class correlations (ICC: 0.22–0.53), indicating substantial within- and between-subject variance. Notably, items assessing psychological state showed ICCs comparable to those of the reinforcement learning parameters. To conclude, our innovative and openly customizable app framework provides a gamified task that optimizes repeated assessments of reward learning to better quantify intra- and inter-individual differences in value-based decision-making over time.
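For readers who want to prototype a comparable task, below is a minimal sketch of win probabilities drifting according to a bounded Gaussian random walk; the step size, bounds, and starting value are illustrative assumptions, not the published task settings.

```python
import numpy as np

def gaussian_walk_probabilities(n_trials=150, start=0.5, step_sd=0.05,
                                lower=0.05, upper=0.95, seed=None):
    """Bounded Gaussian random walk of win probabilities (illustrative settings)."""
    rng = np.random.default_rng(seed)
    p = np.empty(n_trials)
    p[0] = start
    for t in range(1, n_trials):
        # Add Gaussian noise and clip at the bounds to keep the probability valid.
        p[t] = np.clip(p[t - 1] + rng.normal(0.0, step_sd), lower, upper)
    return p

# One run of 150 trials, as in the Influenca task; only one drug is effective
# per trial, so the second option's probability is the complement.
p_drug_a = gaussian_walk_probabilities(seed=1)
p_drug_b = 1.0 - p_drug_a
```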


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Illustration of the reinforcement learning task design.
A. Representative in-game screen of Influenca. To earn points, participants must identify which medication is most effective in fighting pathogens. In each trial, only one drug is effective in curing people. In the trial shown, the orange drug would treat 71 people, whereas the turquoise drug would treat only 29. The circle depicts the color of the drug that was effective in the previous trial. If participants pick the correct medication, their score increases by the number of cured people (win). If they pick the incorrect medication, the number of falsely treated people is subtracted from the score (loss). B. Procedure of the Influenca runs. Each run starts with ecological momentary assessment questions about participants’ current mood and other states, followed by 150 trials of the reinforcement learning paradigm. To ensure sampling across different states, a minimum waiting time of 2 hours was enforced between runs.
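A minimal sketch of the scoring rule described in this caption follows. The reward magnitudes are illustrative, and the assumption that the chosen drug's treatment count is the amount subtracted on an incorrect choice is ours, not stated in the caption.

```python
import numpy as np

def play_trial(choice, p_drug_a_effective, reward_a, reward_b, rng):
    """One Influenca-style trial: return the signed change in score.

    choice: 0 for drug A, 1 for drug B.
    p_drug_a_effective: latent probability (from the Gaussian walk) that
        drug A is the effective one on this trial; drug B is effective otherwise.
    reward_a, reward_b: number of people each drug would treat (illustrative values).
    Assumption: the chosen drug's count is added on a win and subtracted on a loss.
    """
    effective = 0 if rng.random() < p_drug_a_effective else 1
    stake = reward_a if choice == 0 else reward_b
    return stake if choice == effective else -stake

rng = np.random.default_rng(0)
# Example mirroring the caption: orange drug treats 71 people, turquoise 29.
delta = play_trial(choice=0, p_drug_a_effective=0.7, reward_a=71, reward_b=29, rng=rng)
```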
Fig 2
Fig 2. Illustration of estimated parameters and their relation to differences in value-based decision-making and learning (representative participants).
a. The reward sensitivity β scales how action weights (i.e., a combination of the estimated win probability and the potential reward value) are translated into choices. Higher reward sensitivities translate to more deterministic choices (i.e., exploitation), whereas lower reward sensitivities lead to more random choices (i.e., exploration). b. The learning rate α captures how quickly estimated win probabilities are updated when new information becomes available. High learning rates (upper panel) lead to fast updates and quick forgetting of long-term outcomes. The black line depicts the latent win probability, while the points depict the estimated win probability based on the reinforcement learning model. c. The parameter λ scales how strongly each option's estimated win probability is weighted relative to the offered rewards. Low values (< .5, upper panel) reduce the importance of the learned win probabilities, leading to choices based primarily on the potential reward at stake. In contrast, high values (> .5, lower panel) increase the importance of the learned win probabilities. Color in panels b and c indicates the observed choices in these exemplary runs.
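The roles of α, β, and λ described in this caption map onto a delta-rule update combined with a softmax choice rule. The sketch below is a hedged reconstruction under that assumption, not the authors' published model code; in particular, the exact rule for mixing win probabilities with reward magnitudes via λ is an illustrative choice.

```python
import numpy as np

def softmax_choice_probs(weights, beta):
    """Softmax over action weights; higher beta yields more deterministic choices."""
    w = beta * np.asarray(weights, dtype=float)
    w -= w.max()  # numerical stability
    e = np.exp(w)
    return e / e.sum()

def action_weights(p_est, rewards, lam):
    """Mix estimated win probabilities with offered rewards (illustrative rule):
    weight_i = lam * p_i + (1 - lam) * normalized_reward_i."""
    r = np.asarray(rewards, dtype=float)
    return lam * np.asarray(p_est, dtype=float) + (1.0 - lam) * r / r.sum()

def delta_rule_update(p_est, chosen, won, alpha_win, alpha_loss):
    """Update the chosen option's win-probability estimate with separate learning
    rates for wins and losses; the other option's estimate is the complement,
    since only one drug is effective per trial."""
    p_est = np.array(p_est, dtype=float)
    alpha = alpha_win if won else alpha_loss
    p_est[chosen] += alpha * ((1.0 if won else 0.0) - p_est[chosen])
    p_est[1 - chosen] = 1.0 - p_est[chosen]
    return p_est

# Example: compute choice probabilities, then update after a win on option 0.
p_est = np.array([0.5, 0.5])
probs = softmax_choice_probs(action_weights(p_est, rewards=[71, 29], lam=0.8), beta=5.0)
p_est = delta_rule_update(p_est, chosen=0, won=True, alpha_win=0.7, alpha_loss=0.2)
```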
Fig 3
Fig 3. Reward outcome per run for different combinations of learning parameters based on simulated and behavioral data.
a. A simulation of N = 50,000 players shows high rewards for different combinations of the learning rate for losses and reward sensitivity. We show an exemplary grid for λ > .9, the optimal λ (S3 Fig), and αwin between .6 and .8, the average in our sample. Other combinations are shown in S3 Fig. b. Empirical data from participants show high average rewards for moderate learning rates for losses between 0.2 and 0.275 (p < .05). c. Average reward is only weakly dependent on the learning rate for wins (r = -.04). d. Average reward increases with reward sensitivity (r = .32). e. Average reward is highest for λ = 1, reflecting choices based on learned win probabilities without considering reward values.
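A compact, self-contained sketch of how such a parameter-grid simulation could be run is shown below. The number of simulated runs, walk settings, and reward magnitudes are illustrative assumptions, and the agent is the hedged reconstruction sketched under Fig 2, not the authors' simulation code.

```python
import numpy as np

def simulate_average_reward(alpha_win, alpha_loss, beta, lam,
                            n_runs=200, n_trials=150, seed=0):
    """Average reward per run for one parameter combination (illustrative task settings)."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(n_runs):
        p_true = 0.5                      # latent probability that option A is effective
        p_est = np.array([0.5, 0.5])      # agent's estimated win probabilities
        total = 0.0
        for _ in range(n_trials):
            rewards = rng.integers(20, 81, size=2).astype(float)   # illustrative stakes
            w = lam * p_est + (1 - lam) * rewards / rewards.sum()  # action weights
            probs = np.exp(beta * (w - w.max()))
            probs /= probs.sum()
            choice = rng.choice(2, p=probs)
            effective = 0 if rng.random() < p_true else 1
            won = choice == effective
            total += rewards[choice] if won else -rewards[choice]
            alpha = alpha_win if won else alpha_loss
            p_est[choice] += alpha * ((1.0 if won else 0.0) - p_est[choice])
            p_est[1 - choice] = 1.0 - p_est[choice]
            p_true = float(np.clip(p_true + rng.normal(0, 0.05), 0.05, 0.95))
        totals.append(total)
    return float(np.mean(totals))

# Coarse grid over the loss learning rate and reward sensitivity (cf. panel a).
for a_loss in (0.1, 0.25, 0.5):
    for beta in (2.0, 5.0, 10.0):
        avg = simulate_average_reward(alpha_win=0.7, alpha_loss=a_loss, beta=beta, lam=0.9)
        print(f"alpha_loss={a_loss:.2f}, beta={beta:>4}: average reward {avg:8.1f}")
```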
Fig 4
Fig 4. Reinforcement learning parameters and model-independent performance indices improve over runs.
a. The learning rate for wins, αwin, does not change over runs (b = -0.006, p = .19). b. The learning rate for losses, αloss, decreases over runs (b = -0.06, p < .001). c. Reward sensitivity β increases over runs (b = 0.72, p < .001). d. The weighting of win probabilities relative to reward magnitudes, λ, increases over runs (b = 0.03, p < .001). e. Response times decrease over runs (b = -0.23, p < .001). f. The average reward increases over runs (b = 1.09, p < .001). The dots show mean values with 95% bootstrapped confidence intervals.
Fig 5
Fig 5. Reliability of parameter estimates from the learning model improves after the first runs.
Dots depict the rank correlation of parameter estimates in one run with the mean across all other runs (leave-run-out), separated per parameter. Red dashed lines show the classification of correlation magnitudes according to Taylor (1990). Results are independent of the exclusion criteria for behavior reflecting strategies other than reward learning, as identified by extremely high log-likelihoods (S5 Fig).
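A sketch of the leave-run-out reliability computation described here, assuming a long-format table with one row per subject and run; the column names and the choice of Spearman correlation as the rank correlation are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import spearmanr

def leave_run_out_correlations(df, parameter="alpha_win"):
    """Rank correlation of each run's estimate with the mean of all other runs.

    df is assumed to have columns 'subject', 'run', and the parameter column;
    these names are illustrative, not the study's actual variable names.
    """
    results = {}
    for run_id, run_data in df.groupby("run"):
        # Mean of the parameter across all *other* runs, per subject.
        others = df[df["run"] != run_id].groupby("subject")[parameter].mean()
        merged = (run_data.set_index("subject")[parameter].to_frame("run_value")
                  .join(others.rename("other_mean"), how="inner").dropna())
        if len(merged) > 2:
            rho, _ = spearmanr(merged["run_value"], merged["other_mean"])
            results[run_id] = rho
    return pd.Series(results, name=f"leave-run-out rho ({parameter})")
```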
Fig 6
Fig 6. State items show poor to fair intra-class correlation coefficients (0.29–0.53).
A–F. Mean and variance of selected EMA items per individual, ranked by each participant's mean value across runs. Dots represent the values per run.
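As a sketch of how an intra-class correlation over runs could be computed for a single item or parameter, below is a generic one-way random-effects ICC(1) under illustrative column names; the paper may use a different ICC variant.

```python
import numpy as np
import pandas as pd

def icc_one_way(df, subject_col="subject", value_col="value"):
    """One-way random-effects intra-class correlation, ICC(1).

    ICC(1) = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within),
    with k the average number of runs per subject. Generic sketch; the
    study's exact ICC formulation may differ.
    """
    groups = df.groupby(subject_col)[value_col]
    n = groups.ngroups                       # number of subjects
    k = groups.size().mean()                 # average runs per subject
    grand_mean = df[value_col].mean()
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (len(df) - n)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy long-format example: 3 subjects x 4 runs each.
rng = np.random.default_rng(0)
df = pd.DataFrame({"subject": np.repeat([1, 2, 3], 4),
                   "value": rng.normal(np.repeat([0.0, 1.0, 2.0], 4), 0.5)})
print(icc_one_way(df))
```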

