[Preprint]. 2024 Sep 14:2023.09.05.556267.
doi: 10.1101/2023.09.05.556267.

Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics

Michael Bukwich et al. bioRxiv.

Abstract

Patch foraging presents a ubiquitous decision-making process in which animals decide when to abandon a resource patch of diminishing value to pursue an alternative. We developed a virtual foraging task in which mouse behavior varied systematically with patch value. Mouse behavior could be explained by a model integrating time and rewards antagonistically, scaled by a latent patience state. The model accounted for deviations from predictions of optimal foraging theory. Neural recordings throughout frontal areas revealed encoding of decision variables from the integrator model, most robustly in frontal cortex. Regression modeling followed by unsupervised clustering identified a subset of ramping neurons. These neurons' firing rates ramped up gradually (up to tens of seconds), were inhibited by rewards, and were better described as a continuous ramp than a discrete stepping process. Together, these results identify integration via frontal cortex ramping dynamics as a candidate mechanism for solving patch foraging problems.

Conflict of interest statement

Competing Interests The authors declare no competing interests.

Figures

Figure 1. Mice foraging times calibrate to reward statistics.
A) Virtual linear track for patch foraging task. B) Combinations of three reward sizes and frequencies yield nine patch types. C) Probability of reward delivery after each one-second interval per frequency condition. D) Example trials from two mice. Mice stop running in response to the proximity cue to enter patches and receive stochastically delivered water rewards. Photographs of the three task states are shown. Monitor brightness was increased for the photos. E) Reward deliveries and patch leave times from an example session grouped by patch reward size. Trials are sorted in ascending order of PRT from top to bottom per reward size. Dots are shaded based on underlying reward frequency condition. F) PRT per patch type from an example mouse. Column groupings separate patches by reward size. Within columns, patches are split by reward frequency. G) A generalized linear mixed-effects model (GLME) with log link function predicts mice’s PRTs across conditions. To demonstrate the log-linearity of the fit, frequency conditions are spaced proportionally to their values (0.125:0.25:0.5). Dots indicate empirical mouse PRT with error bars showing standard error of the mean. Lines represent GLME fits. Colored by mouse ID (in GLME equation, ‘i’ = mouse ID).
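The log-link fit in panel G can be illustrated with a minimal sketch. It assumes that reward size and reward frequency enter the linear predictor additively and that each mouse contributes an offset; the function name `predict_prt`, the parameter names, and every coefficient value are hypothetical and are not the fitted values from the paper.

```python
import math

def predict_prt(intercept, beta_size, beta_freq, reward_size_ul, reward_freq,
                mouse_offset=0.0):
    """Predicted patch residence time (PRT) under a log link.

    With a log link, log(E[PRT]) is linear in the predictors, so the
    prediction is the exponential of the linear predictor. `mouse_offset`
    stands in for a per-mouse random effect (an assumption of this sketch).
    """
    linear_predictor = (intercept + mouse_offset
                        + beta_size * reward_size_ul
                        + beta_freq * reward_freq)
    return math.exp(linear_predictor)

# Hypothetical coefficients, three frequency conditions, 2 uL rewards.
for freq in (0.125, 0.25, 0.5):
    print(freq, round(predict_prt(1.0, 0.3, 2.0, 2.0, freq), 3))
```

Under a log link, equal steps in a predictor multiply the predicted PRT by a constant factor, which is why the fits appear as straight lines when PRT is plotted on a log axis against proportionally spaced frequency values.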
Figure 2. PRT variance is explained by scaling a common function.
A) Left: Mean PRT per μL across sessions, example mouse. Right: Standard deviation of mean PRT across sessions per subject, colored by mouse ID. B) Autocorrelation function of PRT over patches, mean across subjects. Error bars indicate standard deviation across subjects. C) Applying a Gaussian filter of PRT over successive trials provides an estimate of latent state. Left: Example session latent state inference (mouse 80). Gray trace tracks PRT across patches with colored dots indicating reward size for each patch. Black trace indicates value of the latent state estimation. Middle: Estimated latent across trials over all 10 sessions from mouse 80. Right: Mean normalized latent estimates across sessions per mouse. D) A generalized linear model (GLM) scaled by latent patience estimates accounts for PRTs across mice. Dots indicate empirical mouse PRT. Lines represent GLM fits. E) R² statistics for predicting PRT using the GLME (from Fig. 1G), mean latent per session, per trial latent, and the latent-scaled GLM. Models ordered by ascending levels of R², stars indicate significance of pairwise Wilcoxon signed-rank tests (p = .0044, .0027, .002, Bonferroni-adjusted).
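The Gaussian filtering step in panel C can be sketched as a Gaussian-weighted running average over successive patches. The filter width `sigma`, the option to leave out the current patch, and the function name `gaussian_latent` are assumptions made for illustration; the caption does not state the authors' filter settings.

```python
import math

def gaussian_latent(prts, sigma=3.0, exclude_self=False):
    """Gaussian-weighted average of PRT across neighboring patches.

    prts: PRT (or log PRT) per patch, in the order the patches were visited.
    sigma: filter width in units of patches (hypothetical value).
    exclude_self: if True, the current patch gets zero weight, so its own
        PRT does not contribute to its latent estimate.
    """
    n = len(prts)
    latent = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(n)]
        if exclude_self:
            weights[i] = 0.0
        total = sum(weights)
        if total == 0.0:  # single-patch session with exclude_self=True
            latent.append(float("nan"))
            continue
        latent.append(sum(w * x for w, x in zip(weights, prts)) / total)
    return latent

# A session that drifts from short to long residence times.
print([round(v, 2) for v in gaussian_latent([2, 3, 2, 8, 9, 10], sigma=1.0)])
```

Smoothing over neighbors makes the estimate track slow drifts in PRT across a session while averaging out patch-to-patch variability driven by reward size and frequency.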
Figure 3. A competitive integration process explains foraging behavior.
A) Instantaneous expected reward rate at time of patch leave across patch types, predicted by optimal MVT (Left), sample mouse (Middle, Bonferroni-adjusted p < 0.0001 for size, p > .99 for frequency, 2-way ANOVA), and population (Right, p < 0.0001 for size and frequency, linear mixed-effects model). Error bars for sample mouse indicate standard error of the mean. Error bars for population average indicate standard deviation across mice. B) Schematic demonstrating MVT predictions for PRT per reward size for two different thresholds. Traces are colored by reward size. Dashed lines show two sample thresholds, a and b. Black dots indicate points of threshold crossings for each reward size. Gray box notes examples for Prediction 2 (Top) and Prediction 3 (Bottom). C) Mice show greater differences in mean PRT between 4μL and 2μL patches, compared to 2μL and 1μL patches. D) Sigmoid transformation of decision variable (DV) into probability of patch leave per one-second interval, scaled by different inverse temperature values (light grey = 0.5, dark grey = 1.0, black = 2.0). Red dashed line indicates the maximum P(Leave) output, Pmax. E) Example schematics for DV (Top) and corresponding P(Leave) output (Bottom) for three different integrator models over patches with rewards delivered at t=[0,1,4,5] seconds. DVs ramp upwards over time. Model 1 does not respond to reward deliveries. Model 2 resets to its baseline value following any reward delivery. Model 3 integrates rewards with a constant negative value. Red dashed line indicates the maximum P(Leave) output, Pmax. F) Schematic demonstrating how DV and P(Leave) scale relative to latent patience estimation across patches, for a sample patch in which rewards were delivered at 0 and 4 seconds. Black traces indicate patches with more patient estimations and ramp up more slowly. Red traces indicate impulsive latents and yield sharper ramps. G) Relative BIC values for the model fits across subjects.
Model 3 yields a superior fit to Models 1 and 2 (p < 0.0001, p = 0.0009, Wilcoxon signed-rank test). H) Example schematics demonstrating how Models 2 (Left) and 3 (Right) make differing predictions for DV (Top) and P(Leave) (Bottom) on patches with rewards at t=[0,2] sec (‘R0R’ patches, black) compared with patches with rewards at t=[0,1,2] sec (‘RRR’ patches, blue). I) Per-subject mean simulated PRT for ‘R0R’ versus ‘RRR’ patches from Model 2 fits (Left), Model 3 fits (Middle), and empirical mice PRT (Right). Model 3 and mice PRTs were significantly higher for ‘RRR’ versus ‘R0R’ trials (p < 0.0001, Model 3; p = 0.0024, Mice; Wilcoxon signed-rank test). There was a small but significant effect of greater PRTs for ‘R0R’ versus ‘RRR’ trials for Model 2 (p = 0.0037). However, this resulted from a selection bias over latent patience values and was in the opposite direction of the empirical mice data. Points are colored per mouse. Axes are log-scaled. J) Mean PRT across patch types, per subject from Model 3 simulations versus empirical mouse PRT, log-scaled (R² = 0.985, over patch type means, Left), points colored per mouse. Instantaneous expected reward rate at patch leave across patch types from population simulations of Model 3 using fit parameters per mouse (Right). K) Schematic demonstrating how integrator models can account for deviations from MVT predictions. Example traces are shown when patience is lower (solid lines) versus higher (dashed lines) for each reward size (colored per μL). Patch leave times are determined by integrator value reaching threshold (black dashed line, dots indicate threshold crossings). Gray box notes example violations of Prediction 2 (Top) and Prediction 3 (Bottom). L) Schematic depicting a sample trial for calculating model-predicted PRT on single trials. Black dots indicate fraction of trials the model predicts subjects would still be on the patch at the start of that one-second time bin (Top).
Red dots indicate the probability of leaving, P(Leave), within that time bin. The product of the fraction of trials remaining on the patch and P(Leave) determines the corresponding leave density for that time bin (Bottom). The mean of the resulting leave-time distribution is then taken as that trial’s predicted PRT. M) R² statistics for single trial predictions from cross-validated Model 3 fits across mice (Left, median R² = 0.544). Box edges indicate 25th and 75th percentiles. Whiskers stretch out to most extreme points, as none were categorized as outliers. Example mouse single trial Model 3 predicted PRT versus empirical mouse PRT, points colored by reward size per patch (Right, R² = 0.801). Predicted PRT for each trial was calculated using model fit parameters from training folds and compared with PRT over trials from held-out test folds. N) Model 3 predicted PRT and empirical mouse PRT across patches from the example session in left panel of Fig. 2C. Colored dots indicate mouse PRT, colored by reward size. Black dots indicate Model 3 predicted PRT, calculated from fitting on training folds. Lines connecting dots highlight the difference in predicted versus empirical PRT per trial. Red trace indicates latent estimations of patience for each patch. Y-axis is log-scaled.
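The three integrator variants (panel E), the sigmoid read-out (panel D), and the single-trial PRT calculation (panel L) can be combined into one minimal sketch. The parameter values, the function names, the exact form `p_max / (1 + exp(-beta * DV))`, and the convention that a leave in bin t is scored as a PRT of t + 1 seconds are all assumptions for illustration, not the authors' implementation. Latent patience scaling (panel F) is omitted.

```python
import math

def decision_variable(model, reward_times, n_bins, slope=1.0,
                      reward_weight=2.0, baseline=0.0):
    """DV at the start of each one-second bin.

    model 1: ramps with time and ignores rewards.
    model 2: resets to baseline on every reward.
    model 3: subtracts a constant amount on every reward.
    """
    rewards = set(reward_times)
    dv, trace = baseline, []
    for t in range(n_bins):
        if t in rewards:
            if model == 2:
                dv = baseline
            elif model == 3:
                dv -= reward_weight
        trace.append(dv)
        dv += slope  # time pushes the DV toward leaving
    return trace

def p_leave(dv_trace, beta=1.0, p_max=0.5):
    """Sigmoid of the DV, scaled by inverse temperature, capped at p_max."""
    return [p_max / (1.0 + math.exp(-beta * dv)) for dv in dv_trace]

def expected_prt(p_leave_per_bin):
    """Mean leave time from per-bin leave probabilities.

    Leave density in bin t = P(still on patch at start of t) * P(leave in t).
    The density is renormalized over the simulated horizon.
    """
    surviving, density = 1.0, []
    for p in p_leave_per_bin:
        density.append(surviving * p)
        surviving *= 1.0 - p
    total = sum(density)
    return sum((t + 1) * d for t, d in enumerate(density)) / total

# 'R0R' (rewards at 0 and 2 s) versus 'RRR' (rewards at 0, 1 and 2 s).
for label, times in (("R0R", [0, 2]), ("RRR", [0, 1, 2])):
    prt = expected_prt(p_leave(decision_variable(3, times, 40)))
    print(label, round(prt, 2))
```

In this sketch Model 3 predicts a longer PRT for 'RRR' than for 'R0R' patches, because the extra reward lowers the DV for the rest of the patch rather than only resetting it.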
Figure 4. Ramping suppressed by reward is a prominent feature of neural activity and enhanced in frontal cortex.
A) Example histology slice showing probe tracks targeting M2 and OFC. Red: DiI, Purple: DiD. Probe tracks were tilted relative to brain slices, so only part of each probe track is visible in each slice. B) Distribution of recorded brain areas from 9 mice. Areas designated as “Frontal Cortex” are grouped together. C) An example neuron showing ramping activity which was suppressed by reward delivery. Left: PSTH aligned to patch stop, split by reward size (cyan: 1μL, purple: 2μL, magenta: 4μL; trials with reward at t=1 omitted). Middle: PSTH aligned to patch leave, split by reward size. Right: PSTH aligned to patch stop, split by whether reward was delivered at 1 second (red) or not (black); rewards of different size combined. D) Hand-picked principal components (PCs) of neural activity showing reward integrator-like activity. For simplicity, only large reward size (4μL) trials are shown. Magenta/black traces are trials with reward/no reward delivery at t=1. Lines indicate means and shaded area indicates standard error of the mean. E) PCs with significant ramping slopes, positive or negative, were identified using a shuffle analysis (see Methods and Fig. S6D). The total variance explained by these PCs was computed (black points, each point represents a session). For a shuffle control (gray points), neural activity traces were randomly rotated relative to task events, and the same analysis was performed, once per session. Because PCs with any non-zero ramping slope were included, not necessarily ramping patterns matching those in panel D, this analysis gives a ceiling on ramping variance. **** P < 0.0001, data versus shuffle, signed-rank test (N = 33 sessions). F) Histogram of Pearson’s correlation between smoothed firing rate and the Model 3 decision variable (DV; Fig. 3) for each neuron in our data set. A shuffle distribution was generated separately for each neuron and a z-test was used to identify neurons with significant correlations (P < 0.001).
G) R² values for the regression of Model 3 DV on individual neurons’ firing rates (as in E), by brain region. In the left plot, each point represents a recording session. In the right plot, each point represents a mouse. Consistently across mice, frontal cortex areas had higher R² values than subcortical areas. ** P < 0.01, paired t-test (N = 9 mice). H) Snippets of several contiguous patches from three example recording sessions, showing the Model 3 DV in black and the cross-validated neural prediction in red. Neural predictions were generated using linear regression on training trials applied to held-out test trials. I) Comparison of Model 3 DV coding between Frontal Cortex and Subcortical Areas. For each within-session comparison, neurons were down-sampled so that each region had the same number of neurons. Only sessions with at least 20 neurons in each of the two brain regions were kept (23/33 sessions). Each pair of data points represents a recording session, and colors represent mice. ** P < 0.01, paired t-test (N = 23 sessions). Frontal Cortex Areas: OFC: Orbitofrontal cortex, ACC: Anterior cingulate cortex, PL: Prelimbic cortex, IL: Infralimbic cortex, M2: Secondary motor cortex, M1: Primary motor cortex. Other Areas: DMS: Dorsomedial striatum, DP: Dorsal peduncular area, LS: Lateral septum, OLF: Olfactory areas, STR: Striatum, TTd: Taenia tecta dorsal part, VS: Ventral striatum.
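A shuffle test of the kind described in panels E and F can be sketched as follows. Using circular rotation of the firing-rate trace to build the per-neuron null distribution in panel F is an assumption (the caption states rotation only for the control in panel E), and the function names, shuffle count, and synthetic data are hypothetical.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rotation_shuffle_z(rate, dv, n_shuffles=500, seed=0):
    """Z-score of the observed correlation against a rotation null.

    Each shuffle circularly shifts the firing-rate trace by a random
    offset, which keeps its autocorrelation but breaks its alignment
    with the decision variable.
    """
    rng = random.Random(seed)
    n = len(rate)
    observed = pearson(rate, dv)
    null = []
    for _ in range(n_shuffles):
        k = rng.randrange(1, n)  # non-zero rotation
        null.append(pearson(rate[k:] + rate[:k], dv))
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    return observed, z

# Synthetic example: a neuron whose rate follows the DV plus noise.
rng = random.Random(1)
dv = [rng.random() for _ in range(200)]
rate = [2.0 * d + rng.gauss(0.0, 0.2) for d in dv]
r, z = rotation_shuffle_z(rate, dv)
print(round(r, 3), z > 3.0)
```

Rotating rather than permuting time bins is a common choice when the signal is smooth in time, because a full permutation would destroy the autocorrelation and make the null distribution too narrow.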
Figure 5. Unsupervised clustering reveals six clusters of neurons, the most prominent of which shows ramping activity and reward responses with opposite signs.
A) Schematic of the analysis approach: A Poisson GLM to estimate task variable coefficients, which are used to cluster neurons using a Gaussian Mixture Model (GMM). B) Z-scored neural activity for all task-related neurons from all brain regions on “40” trials (4μL reward at 0 seconds, no reward at 1 second; left panel) or “44” trials (4μL reward at 0 and 1 second; right panel; white dashed line indicates reward at 1 second). Neurons were sorted based on the time of peak activity on odd 40 trials; the left panel shows the same sort on even 40 trials, and the right panel shows the same sort on all 44 trials. An initial transient reward response (as in Fig. S4A) gradually transitioned into upward ramping activity which was suppressed by reward delivery (as in Fig. 4C). GMM cluster identity for each neuron is indicated on the right. C) GMM clustering on the top 3 PCs of task variable coefficients was used to identify clusters of neural activity patterns. Top left: Bayesian information criterion (BIC) was used to select the number of clusters (minimum BIC: 6 clusters). Top right: Percentage of neurons assigned to each cluster. Clusters were ordered so that patterns with similar shapes but opposite signs were adjacent (see panel E). Bottom: Task-related neurons projected into the PC space used for clustering, colored by assigned cluster. D) Average z-scored GLM task variable coefficients for each cluster. Reward kernel coefficients were multiplied by corresponding basis functions and summed to generate the predicted reward response. E) Average PSTHs of z-scored neural activity following patch stop for each cluster, split by reward size and whether reward was delivered at 1 second (R0 indicates a reward at 0 second, not 1 second, RR indicates rewards at both 0 and 1 second). F) Average PSTHs of z-scored neural activity aligned to patch leave for each cluster, split by reward size.
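The BIC-based choice of cluster number in panel C can be sketched as below. The sketch assumes full-covariance mixture components and takes already-fitted log-likelihoods as input; the function names and the log-likelihood values are hypothetical, and the fitting of the mixture models themselves is not shown.

```python
import math

def gmm_n_params(n_clusters, n_dims):
    """Free parameters of a full-covariance Gaussian mixture."""
    weights = n_clusters - 1                          # mixing weights sum to 1
    means = n_clusters * n_dims
    covariances = n_clusters * n_dims * (n_dims + 1) // 2
    return weights + means + covariances

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def select_n_clusters(log_likelihoods, n_dims, n_samples):
    """Pick the cluster count with the minimum BIC.

    log_likelihoods: dict mapping cluster count -> total log-likelihood
    of the fitted mixture (hypothetical inputs in the example below).
    """
    scores = {k: bic(ll, gmm_n_params(k, n_dims), n_samples)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores

# Made-up log-likelihoods for mixtures fit in a 3-dimensional PC space.
best_k, scores = select_n_clusters({1: -1000.0, 2: -900.0, 3: -899.0},
                                   n_dims=3, n_samples=500)
print(best_k)
```

The penalty term grows with the number of parameters, so adding a cluster is accepted only when it raises the log-likelihood by more than the added complexity costs.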
Figure 6. The ramping population (Cluster 1) exhibits ramp-to-threshold activity with reward sensitivity modulated by latent state.
A) Average PSTHs of z-scored neural activity for each cluster, split by patch residence time. Lines indicate means, shaded regions indicate standard error of the mean (N = 568 Cluster 1 neurons). B) Top: Mean reward kernel GLM coefficient for each reward size, by mouse. ****, P < 0.0001, fixed effect of reward size, linear mixed effects model with a random effect per mouse (N = 7 mice). Bottom: Histogram of regression coefficients (β) of mean reward kernel versus reward size, for all Cluster 1 neurons. The example neuron shown in Fig. 4C is indicated. In the mouse-level analyses in panels B-D, two mice with very few task-related neurons were excluded (mice 24 and 39, with 30 and 17 task-related neurons respectively), leaving 7 mice. C) Same as B, but for the slope of ramping activity, extracted from trials with no reward at t=1 (see Fig. S6D). n.s., not significant, fixed effect of reward size, linear mixed effects model with a random effect per mouse (N = 7 mice). D) Total reward (TotalRew) GLM coefficients for Cluster 1 neurons, from GLMs fit to high and low latent trials separately. TotalRew tracks the total number of rewards received on a given patch (see Fig. S5B for example traces). P-values are from paired t-tests on the average coefficient in high and low latent trials by mouse (N = 7 mice; not corrected for multiple comparisons).
Figure 7. Cluster 1 population activity is ramping, not stepping, on single trials.
A) Example trial showing ramping activity of simultaneously recorded Cluster 1 neurons while the mouse remained still on the patch. Top: Raster plot of 25 Cluster 1 neurons. Middle: Raster plot of mouse licks. Bottom: Mouse speed (blue) and average firing rate of Cluster 1 neurons (red). Magenta dashed lines: Reward delivery (4μL). B) Schematic of ramp and step models. Models were fit to Cluster 1 neurons. C) Single trial ramping and stepping model fits for an example session (80_20200317). D) Model comparison across sessions. Only sessions with at least 10 Cluster 1 neurons were included (16/33 sessions). For 16/16 sessions, the ramp model outperformed the step model on held-out data. E) Ramping reward coefficients and slopes across reward size, by session. Note that reward weights become more negative with reward size (i.e., bigger dip), resulting in a larger decrease in the tendency to leave.
