Neuron. 2025 Oct 15;113(20):3458-3475.e12.
doi: 10.1016/j.neuron.2025.07.008. Epub 2025 Aug 7.

Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics

Michael Bukwich et al. Neuron.

Abstract

Patch foraging is a ubiquitous decision-making process in which animals decide when to abandon a resource patch of diminishing value to pursue an alternative. We developed a virtual foraging task in which mouse behavior varied systematically with patch value. Behavior could be explained by models integrating time and rewards antagonistically, scaled by a slowly varying latent patience state. Describing a mechanism rather than a normative prescription, these models quantitatively captured deviations from optimal foraging theory. Neuropixels recordings throughout frontal areas revealed distributed ramping signals, concentrated in the frontal cortex, from which multiple integrator models' decision variables could be decoded equally well. These signals reflected key aspects of decision models: they ramped gradually, responded oppositely to time and rewards, were sensitive to patch richness, and retained memory of reward history. Together, these results identify integration via frontal cortex ramping dynamics as a candidate mechanism for solving patch-foraging problems.

Keywords: Marginal Value Theorem; Neuropixels; decision making; foraging; frontal cortex; latent state; neural integration; patch foraging; ramping activity; virtual reality.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1: Mice foraging times qualitatively match MVT.
A) Schematic of virtual patch-foraging task. B) Combinations of reward sizes and frequencies yield nine patch types (Fig. S1A). C) Probability of reward after one-second intervals per frequency condition. D) Example trials from two mice. Mice stop in response to proximity cues to enter patches and receive stochastic water rewards. Brightness was increased for task state photos. E) Reward deliveries (circles) and patch-leave times (triangles) from an example session grouped by patch reward size. Trials are sorted in ascending order of patch residence time (PRT) from top to bottom. Circles’ coloring indicates underlying frequency condition. F) PRT per patch type from an example mouse. Column groupings separate patches by reward size. Within columns, patches are split by reward frequency (p<0.0001 for size and frequency, GLM). Box: median ± IQR; whiskers = 5-95%; points = outliers. G) A generalized linear mixed-effects model (GLME) with log link function predicts PRTs across conditions. Points indicate mouse PRTs with error bars showing SEM. Lines represent GLME fits. Colored by mouse ID ‘i’. H) Coefficients from linear regression of normalized PRT on current trial reward size/frequency and average reward size/frequency of the preceding 5 trials. Points show individual mice, and black error bars show mean ± SEM over mice (n=22). Coefficients on current reward size/frequency were positive, whereas coefficients on previous reward size were negative (current reward size coefficient = 0.18 ± 0.02, t-test versus zero p=4.2×10⁻¹⁰; current reward frequency coefficient = 0.084 ± 0.0084, p=2.2×10⁻⁹; previous reward size coefficient = −0.029 ± 0.0087, p=0.0033; n=22 mice). The coefficient on previous reward frequency (a noisier indicator of patch quality) did not differ significantly from zero (previous reward frequency coefficient = 0.0032 ± 0.008, p=0.69; n=22 mice). ns, not significant, ** p<0.01, **** p<0.0001, t-test versus zero.
Figure 2: Scaling a common function explains PRT variance.
A) Left: Mean PRT per μL across sessions, example mouse. Right: Standard deviation of mean PRT across sessions per subject, colored by mouse ID. B) PRT autocorrelation over patches, mean across subjects. Error bars indicate standard deviation across subjects. C) Estimating latent “patience” state from surrounding PRTs. Left: Example session. Gray trace tracks PRT with colored points indicating reward size for each patch. Black trace indicates estimated latent state (Gaussian filtered PRT, omitting current trial). Middle: Estimated latent state for all sessions from an example mouse. Right: Mean normalized latent estimates across sessions per mouse. Black line shows mean across mice. D) Coefficient of variation (CV) of latent state within versus across sessions per mouse. For within session measures, CV of latents across patches was calculated per session, and mean CV is shown. For across session measures, mean latent was calculated per session, and CV over these means is shown. For the across mice measure, mean latent was calculated per mouse, and CV over these means is shown (red line). E) Generalized linear model (GLM) scaled by latent patience estimates. F) R2 statistics for predicting PRT using: GLME (from Fig. 1G), mean latent per session, per trial latent, and latent-scaled GLM. Ordered by ascending mean R2, stars indicate significance of pairwise Wilcoxon signed-rank tests (p=0.0044, 0.0027, 0.002, Bonferroni-adjusted).
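The latent "patience" estimate in panel C (Gaussian-filtered PRT, omitting the current trial) can be illustrated with a minimal sketch. The function name `latent_patience` and the kernel width `sigma` are hypothetical choices for illustration; the caption does not give the filter width or any normalization used by the authors.

```python
import math

def latent_patience(prts, sigma=3.0):
    """Leave-one-out Gaussian-weighted average of neighbouring PRTs.

    prts  : patch residence times in trial order (seconds)
    sigma : kernel width in trials (hypothetical value, not from the paper)
    """
    n = len(prts)
    estimates = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(n):
            if j == i:
                continue  # omit the current trial from its own estimate
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            weighted_sum += w * prts[j]
            weight_total += w
        # A single-trial session has no neighbours, so no estimate exists.
        estimates.append(weighted_sum / weight_total if weight_total > 0 else float("nan"))
    return estimates
```

Because trial i is excluded from its own estimate, an unusually long stay on one patch raises the estimate for its neighbours but not for itself, which keeps the latent from trivially predicting the PRT it is later used to scale.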
Figure 3: Competitive integration processes explain foraging behavior.
A) Instantaneous expected reward rate at time of patch leave across patch types, predicted by idealized MVT (Left), example mouse (Middle, Bonferroni-adjusted p<0.0001 for size, p>0.99 for frequency, 2-way ANOVA), and population (Right, p<0.0001 for size and frequency, linear mixed-effects model). Error bars for example mouse indicate SEM. Error bars for population average indicate standard deviation across mice. B) Schematic demonstrating MVT predictions for PRT per reward size for two different thresholds. Traces are colored by reward size. Dashed lines show two sample thresholds, a and b. Black dots indicate threshold crossing points. Gray box notes examples for Prediction 2 (Top) and Prediction 3 (Bottom). C) Mean PRT differences between (4μL-2μL) versus (2μL-1μL) patches. Dashed line indicates unity. D) Sigmoid transformation of decision variable (DV) into leave probability per one-second interval (P(Leave)/s), scaled by different inverse temperature values (Ψ, shades of gray). Red dashed line indicates Pmax, the maximum P(Leave)/s output. E) Example traces of DV (Top) and corresponding P(Leave)/s output (Bottom) for integrator models over an example patch with rewards at t=[0,1,4,5] seconds. Red dashed line indicates Pmax. F) Schematic demonstrating DV and P(Leave)/s scaling by latent state, for an example patch with rewards at t=[0, 4] seconds. Black/red traces indicate patches with higher/lower latent state estimates and ramp up less/more quickly. G) Relative BIC values for model fits across subjects. H) Schematics demonstrating differing Models 2 (Left) and 3 (Right) predictions on patches with rewards at t=[0,2] sec (‘R0R’ patches, black) versus t=[0,1,2] sec (‘RRR’ patches, blue). I) Per-subject mean simulated PRT for ‘R0R’ versus ‘RRR’ patches from Model 2 (Left), Model 3 (Middle), and empirical mice PRT (Right). Model 3 and mice PRTs were higher for ‘RRR’ versus ‘R0R’ trials (p<0.0001, Model 3; p=0.0024, Mice; Wilcoxon signed-rank test). 
For Model 2, PRTs were slightly but significantly greater on ‘R0R’ than on ‘RRR’ trials (p=0.0037). This was due to selection bias over latent state (‘R0R’ trials tend to have higher latent state than ‘RRR’ trials, lengthening PRT) and was in the opposite direction of the empirical mouse data. Points are colored per mouse. Dashed line indicates unity. J) Left: Mean PRT across patch types, per subject, from Model 3 simulations versus empirical mouse PRT, log-scaled, colored per mouse (R2 = 0.985, MSE = 0.413s). Right: Mean instantaneous expected reward rate at time of patch leave from Model 3 simulations (as in ‘A’). K) Schematic demonstrating integrator models can account for deviations from MVT predictions. Example traces are shown when patience is lower (solid lines) versus higher (dashed lines) for each reward size (color). Patch-leave times are determined by integrator value reaching threshold (black dashed line, points indicate threshold crossings). Gray box notes example violations of Prediction 2 (Top) and Prediction 3 (Bottom). L) Mean Model 3 simulated PRT differences between (4μL-2μL) versus (2μL-1μL) patches. Dashed line indicates unity. M) Schematic depicting how model-predicted PRT is calculated on a single trial (Methods). N) Left: R2 statistics for single-trial predictions from cross-validated Model 3 fits across mice (median R2=0.544). Box: median ± IQR; whiskers = 5-95%. Right: Single-trial Model 3-predicted PRT (cross-validated) versus true PRT for an example mouse (R2 = 0.801). Colors indicate reward size per patch. O) Model 3-predicted PRT and empirical mouse PRT across patches from example session in Fig. 2C. Colored points indicate mouse PRT per reward size. Black points indicate Model 3 cross-validated predicted PRT. Lines connecting points highlight the difference in predicted versus empirical PRT. Gray trace indicates latent state estimate for each patch.
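The ideas in panels D-F (a decision variable that time pushes up and rewards push down, mapped through a sigmoid to a leave probability per one-second interval) can be sketched as follows. This is a generic illustration and not any of the paper's Models 1-3: the update rule, the way patience divides the ramp rate, the sigmoid form `p_max / (1 + exp(-psi * dv))`, and all parameter values are assumptions introduced here.

```python
import math

def leave_probability(dv, psi=1.0, p_max=0.5):
    """Sigmoid map from DV to P(Leave) per one-second step, capped at p_max.

    psi is an inverse temperature; p_max is the ceiling on the output.
    Both default values are hypothetical.
    """
    return p_max / (1.0 + math.exp(-psi * dv))

def integrate_patch(reward_times, duration, ramp=1.0, reward_kick=2.0,
                    patience=1.0, dv0=-3.0):
    """Return the DV at each one-second step of a patch visit.

    Time raises the DV by ramp / patience per second (assumed scaling);
    each reward lowers it by reward_kick (assumed fixed decrement).
    """
    rewards = set(reward_times)
    dv = dv0
    trace = []
    for t in range(duration):
        if t in rewards:
            dv -= reward_kick      # reward opposes leaving
        trace.append(dv)
        dv += ramp / patience      # elapsed time promotes leaving
    return trace

# Example patch with rewards at t = 0, 1, 4, 5 seconds, as in panel E.
dv_trace = integrate_patch([0, 1, 4, 5], duration=6)
p_leave = [leave_probability(dv) for dv in dv_trace]
```

In this sketch a higher patience value slows the ramp, so the leave probability rises later, matching the qualitative description in panel F.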
Figure 4: Reward-suppressed ramps are prevalent in frontal cortex.
A) Example histology slice. B) Distribution of recorded brain areas. C) Example neuron with ramping activity suppressed by reward delivery. Left: PSTH aligned to patch stop, split by reward size (trials with reward at t=1 omitted). Middle: PSTH aligned to patch leave, split by reward size. Right: PSTH aligned to patch stop, split by whether reward was delivered at 1 second (red) or not (black) (rewards of different size combined). D) Hand-picked principal components (PCs) of neural activity showing integrator-like activity. For simplicity, only 4 μL trials are shown. Magenta/black traces are trials with/without reward at t=1. Lines and shaded area = mean ± SEM over trials. E) Total variance explained by PCs with significant ramping slopes (black) versus a shuffle control (gray). **** p<0.0001, data versus shuffle, sign rank test (n=33 sessions). F) Histogram of correlation between single neuron firing rates and Model 3 DV (Fig. 3). Red/black indicate neurons with/without significant correlation versus shuffle control (z-test p<0.001). G) R2 between individual neurons’ firing rates and the Model 3 DV, by brain region. Points represent recording sessions (left) or mice (right). Frontal cortex areas had higher mean R2 values than subcortical areas (p<0.01, paired t-test, n=9 mice). Frontal Cortex Areas: OFC: Orbitofrontal cortex, ACC: Anterior cingulate cortex, PL: Prelimbic cortex, IL: Infralimbic cortex, M2: Secondary motor cortex, M1: Primary motor cortex. Subcortical Areas: DMS: Dorsomedial striatum, DP: Dorsal peduncular area, LS: Lateral septum, OLF: Olfactory areas, STR: Striatum, TTd: Taenia tecta dorsal part, VS: Ventral striatum.
Figure 5: DV decoder output shares features of Models 2 and 3.
A) Schematic of decision variable (DV) decoding. B) Decoder output (red) versus true Model 3 DV (black) for several contiguous patches from example recording sessions. C) CV R2 for Model DVs (Models 1-3, Fig. 3), by Session (n=28, left), or Mouse (n=9, right). 5/33 sessions with CV R2<-0.1 were excluded from further analysis. n.s. Not Significant (p>0.05, one-way ANOVA). D) Within-session correlations between DVs for pairs of models, averaged across sessions per mouse. E) Same as (D), but for neural predictions (decoder outputs). F) Same as (D), but for neural regression coefficients. G) Comparison of R2 between decoders using only units from frontal cortex (Ctx) or subcortical areas (Sub). Each line represents a recording session, and colors represent mice. Black lines and error bars = mean ± SEM over sessions. ** p<0.01, *** p<0.001, paired t-test (n=23 sessions). H) PSTHs of DVs (left) and neural predictions (right; decoder output trained on those DVs) on trials with no reward at t=1, which were used to estimate ramping slope. Lines and shaded areas = means ± SEMs over sessions. Colors indicate reward size. In panels H-P, results are shown for behavioral models 1-3, arranged from top to bottom. I) Estimated ramping slope for DVs (left) and neural predictions (right). Colored lines show individual mice (slopes averaged over sessions). Black lines and error bars show means ± SEM over mice. ** p<0.01, *** p<0.001, reward size coefficient, LME. J) Same as (H), but aligned to reward deliveries, excluding rewards at t=0. K) Reward responses of DVs (left) and neural predictions (right). Significance level reflects the reward size coefficient in a linear model. For neural predictions, average response across reward sizes is also shown (“Avg”); stars indicate significance of a t-test versus zero (n=9 mice). N.S. Not Significant (p>0.05), * p<0.05, ** p<0.01, *** p<0.001. 
L) Slope from linear regression of reward response on pre-reward level of DVs (left) and neural predictions from decoders (right). For all three decoders, slope was approximately −0.5 for all reward sizes, indicating partial resetting. M) PSTHs of DVs on trial types isolating the effect of reward history: trials with rewards at t=0 and 2 seconds (‘R0R’) versus trials with rewards at t=0,1, and 2 seconds (‘RRR’). Lines and shaded areas indicate means ± SEMs over sessions. Trials are split by reward size and R0R vs RRR reward sequences, with lighter shades indicating R0R trials and darker shades indicating RRR trials. N) Same as (M) but showing neural predictions on R0R versus RRR trials. O) Comparison of the DV level just after the reward at t=2 seconds between R0R (x-axis) and RRR (y-axis) trials for each reward size. P-values for TrialType (R0R vs. RRR) in an LME are shown for each behavioral model. By construction, only Model 3 has a significant TrialType coefficient, indicating sensitivity to reward history. P) Same as (O), but for neural predictions. All decoders had significant TrialType coefficients, indicating a reward history effect on decoder output most consistent with Model 3.
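The "partial resetting" reading of the slope in panel L can be made concrete with a small sketch. It assumes the reward response is the post-reward level minus the pre-reward level, which the caption does not state explicitly; the helper names are hypothetical. Under that assumption, a slope of −1 corresponds to a full reset to a fixed level, a slope of 0 to a fixed-size decrement regardless of level, and a slope near −0.5 to something in between.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def reward_response_slope(pre, post):
    """Slope of reward response (post - pre, an assumed definition) on pre-reward level."""
    responses = [b - a for a, b in zip(pre, post)]
    return ols_slope(pre, responses)

pre_levels = [1.0, 2.0, 3.0, 4.0]                               # illustrative values only
full_reset = reward_response_slope(pre_levels, [0.0] * 4)        # post level fixed at 0
fixed_step = reward_response_slope(pre_levels, [p - 1.0 for p in pre_levels])
half_reset = reward_response_slope(pre_levels, [0.5 * p for p in pre_levels])
```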
Figure 6: Functional clustering reveals module pairs with reciprocal integration dynamics.
A) Schematic of analysis approach: Task variable coefficients estimated via Poisson GLM are used to cluster neurons using a Gaussian Mixture Model (GMM). B) GMM clustering to identify clusters of neural activity patterns. Top left: BIC was used to select the number of clusters (minimum BIC: 6 clusters). Top right: Percentage of neurons assigned to clusters. Clusters were ordered so patterns with similar shapes but opposite signs were adjacent (see panel E). Bottom: Task-related neurons projected into the PC space used for clustering. C) Neural activity for task-related neurons on “40” trials (4μL reward at 0 seconds, no reward at 1 second; left panel) or “44” trials (4μL reward at 0 and 1 second; right panel; white dashed line indicates reward at 1 second). GMM cluster identity for each neuron is indicated on the right. D) Average GLM-predicted reward responses and z-scored accumulator coefficients. Z-scored reward kernel coefficients were multiplied by corresponding basis functions and summed to generate the predicted reward response. In panels D-H, lines indicate means, and shaded regions indicate SEM. E) Average PSTHs of z-scored neural activity following patch stop for each cluster, split by reward size and whether or not reward was delivered at 1 second. F) Average PSTHs of z-scored neural activity aligned to patch leave for each cluster. G) Average PSTHs of z-scored neural activity for each cluster, aligned to patch stop and split by patch residence time. H) Same as (G) but aligned to patch leave. I) Average coefficient (β) per neuron in linear DV decoding from Models 1-3 (top to bottom; decoding as in Fig. 5), by GMM cluster. Coefficients were averaged over neurons within session first, then averaged over sessions. Lines indicate means and error bars indicate SEM over sessions (n=28 sessions). 
Coefficients differed by GMM cluster for all three models (LME with fixed effect of cluster and random effect per session; Model 1, p=0.0044; Model 2, p=0.0033; Model 3, p=0.021; n=28 sessions). ** p<0.01, * p<0.05. J) Same as (I), but for absolute value of decoder coefficients (|β|), a measure of overall contribution to the decoder. |β| did not differ between GMM clusters for any of the three models (Model 1, p=0.15; Model 2, p=0.94; Model 3, p=0.86; n=28 sessions). N.S. Not Significant.
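Selecting the number of clusters by minimum BIC (panel B), like the relative BIC comparison in Fig. 3G, rests on the standard formula BIC = k·ln(n) − 2·ln(L). A minimal sketch follows; the candidate names, log-likelihoods, and parameter counts are made-up placeholders rather than values from the paper, and `select_by_bic` is a hypothetical helper.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def select_by_bic(candidates, n_obs):
    """Pick the candidate with the lowest BIC.

    candidates : dict mapping name -> (log_likelihood, n_params)
    Returns (best_name, relative_bic), where relative_bic is each
    candidate's BIC minus the minimum, so the best candidate scores 0.
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    relative = {name: score - scores[best] for name, score in scores.items()}
    return best, relative

# Placeholder fits (not from the paper): richer models fit better but pay
# a larger complexity penalty.
fits = {"small": (-1000.0, 10), "medium": (-950.0, 12), "large": (-948.0, 14)}
best_fit, relative_bic = select_by_bic(fits, n_obs=1000)
```

The penalty term grows with both parameter count and sample size, which is why adding clusters stops paying off once the gain in log-likelihood becomes small.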
Figure 7: Functional clusters exhibit ramping activity.
A) Example trial showing simultaneously recorded Cluster 1 neuron activity. Top: Raster plot of Cluster 1 neurons (n=25). Middle: Raster plot of mouse licks. Bottom: Mouse speed (blue) and average firing rate of Cluster 1 neurons (red). Magenta dashed lines: Reward delivery (4 μL). B) Schematic of ramp and step models. C) Single-trial ramping and stepping model fits for Cluster 1 neurons from an example session (80_20200317). D) Model comparison per GMM Cluster. In panels D and E, data points indicate sessions, colors indicate mice, and error bars indicate mean ± SEM across sessions. Stars indicate significance level of a t-test versus zero per GMM Cluster. *** p<0.001, ** p<0.01, * p<0.05, n.s. Not Significant. LL: Log likelihood. E) Reward coefficients and ramping slopes across reward size per GMM cluster.
