Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):13903-13908.
doi: 10.1073/pnas.1821028116. Epub 2019 Jun 24.

Structured, uncertainty-driven exploration in real-world consumer choice

Eric Schulz et al. Proc Natl Acad Sci U S A. 2019.

Abstract

Making good decisions requires people to appropriately explore their available options and generalize what they have learned. While computational models can explain exploratory behavior in constrained laboratory tasks, it is unclear to what extent these models generalize to real-world choice problems. We investigate the factors guiding exploratory behavior in a dataset consisting of 195,333 customers placing 1,613,967 orders from a large online food delivery service. We find important hallmarks of adaptive exploration and generalization, which we analyze using computational models. In particular, customers seem to engage in uncertainty-directed exploration and use feature-based generalization to guide their exploration. Our results provide evidence that people use sophisticated strategies to explore complex, real-world environments.

Keywords: decision making; exploration; generalization; reinforcement learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Learning and exploration over time. (A) Average order rating by number of past orders. (B) Probability of sampling a new restaurant as a function of the number of past orders. The dashed black line indicates the simulated behavior of agents that explore the available restaurants at random. (C) Distribution of order ratings for newly sampled and known restaurants. (D) Average probability of reordering from a restaurant as a function of reward prediction error. Means are displayed as black squares and error bars show the 95% confidence interval of the mean.
Fig. 2.
Factors influencing exploration. (A) Effect of relative price. The relative price indicates how much cheaper or more expensive a restaurant was compared with an average restaurant in the same city. (B) Effect of standardized (z-transformed) estimated delivery time. (C) Effect of average rating. (D) Effect of a restaurant’s number of past ratings (certainty). Means are displayed as black squares and error bars show the 95% confidence interval of the mean.
Fig. 3.
Signatures of uncertainty-directed exploration. (A) Entropy of the next four choices as a function of reward prediction error (RPE). (B) Probability of reordering from a restaurant as a function of RPE, shown for restaurants with high and low relative variance. (C) Probability of choosing a novel restaurant as a function of its difference from an average restaurant within the same cuisine type, shown for restaurants with high and low relative variance. (D) Probability of choosing a novel restaurant as a function of its relative price, shown for restaurants with high and low relative variance.
Fig. 4.
Clusters and changes of exploration. (A) Clusters of exploration between different cuisine types within customers’ consecutive explorations. Green rectangles mark clusters of exploration. (B) Moves between clusters after better-than-expected (positive RPE) and worse-than-expected (negative RPE) outcomes compared with a restaurant-specific mean baseline. Centers of radar plots indicate a change of −5%, and outermost lines indicate a change of +5%. A change of 1% roughly translates to 500 orders.
Fig. 5.
Signatures of generalization. (A) Probability of switches between cuisine types and rated similarities between the same types. (B) Average rating per city and proportion of exploratory choices. The turquoise line marks the least-squares regression line. (C) Predictability of a restaurant’s quality and average rating of explored restaurants. The turquoise line marks the least-squares regression line. (D) Results of model comparison for new customers’ behavior. The models considered were the Bayesian mean tracker (BMT), a Gaussian process with a mean-greedy sampling strategy (GP-M), and a Gaussian process with an upper confidence bound sampling strategy (GP-UCB).
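To make the difference between the mean-greedy and upper confidence bound sampling strategies named in Fig. 5D concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent Gaussian beliefs per restaurant, updated in the style of a Bayesian mean tracker, and it omits the feature-based generalization that a Gaussian process would add. The function names, the noise variance `noise_var`, the exploration weight `beta`, and the example numbers are all hypothetical.

```python
import math


def bmt_update(mean, var, reward, noise_var=1.0):
    """One Kalman-style update of a Gaussian belief about a single option.

    noise_var is an assumed observation-noise variance, not a fitted value.
    """
    gain = var / (var + noise_var)   # step size; shrinks as uncertainty shrinks
    rpe = reward - mean              # reward prediction error
    return mean + gain * rpe, var * (1.0 - gain)


def mean_greedy_choice(means):
    """Pick the option with the highest posterior mean (no exploration bonus)."""
    return max(range(len(means)), key=lambda i: means[i])


def ucb_choice(means, variances, beta=1.0):
    """Pick the option with the highest mean plus beta posterior standard deviations."""
    scores = [m + beta * math.sqrt(v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical beliefs: a well-known restaurant and a rarely rated one.
means = [3.0, 2.8]
variances = [0.01, 1.0]

greedy_pick = mean_greedy_choice(means)   # index 0: highest expected rating
ucb_pick = ucb_choice(means, variances)   # index 1: uncertainty bonus tips the choice

# Updating a prior belief (mean 0, variance 1) after observing a reward of 2.
new_mean, new_var = bmt_update(0.0, 1.0, 2.0)
```

In this toy setting the mean-greedy rule stays with the familiar option, while the upper confidence bound rule prefers the option whose quality is less certain. This is the qualitative pattern the abstract calls uncertainty-directed exploration.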
