Signal detection models as contextual bandits

Thomas N Sherratt¹, Erica O'Neill¹

PMID: 37351497
PMCID: PMC10282591
DOI: 10.1098/rsos.230157

Thomas N Sherratt et al. R Soc Open Sci. 2023 Jun 21;10(6):230157. doi: 10.1098/rsos.230157. eCollection 2023 Jun.

Affiliation

¹ Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6.

Abstract

Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.

Keywords: Softmax; Thompson sampling; contextual bandit; decision theory; multi-armed bandit; signal detection theory.
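
To make the Softmax rule mentioned in the abstract concrete, the following is a minimal Python sketch of a two-action (accept/reject) Softmax choice. It is an illustration under stated assumptions, not the authors' implementation: it assumes that accepting a desirable signaller pays `b`, accepting an undesirable one costs `c`, rejecting pays 0, and that the receiver already holds an estimate `p_desirable` of the probability that the signaller is desirable given its cue. The function names are hypothetical.

```python
import math


def expected_payoff_accept(p_desirable: float, b: float, c: float) -> float:
    """Expected payoff of accepting (assumed payoffs: +b if desirable, -c otherwise)."""
    return p_desirable * b - (1.0 - p_desirable) * c


def softmax_accept_probability(p_desirable: float, b: float, c: float, tau: float) -> float:
    """Probability of accepting under a two-action Softmax with temperature tau.

    With rejection assumed to pay 0, the Softmax over {accept, reject}
    reduces to a logistic function of the expected payoff of accepting.
    """
    z = expected_payoff_accept(p_desirable, b, c) / tau
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Illustrative values only: b = 2, c = 1, and two temperatures.
    for tau in (0.1, 0.9):
        for p in (0.2, 1.0 / 3.0, 0.5, 0.8):
            prob = softmax_accept_probability(p, b=2.0, c=1.0, tau=tau)
            print(f"tau={tau:.1f}  P(desirable)={p:.2f}  P(accept)={prob:.3f}")
```

Under these assumptions the receiver is indifferent (accepts with probability 0.5) when `p_desirable = c / (b + c)`, and a smaller `tau` makes choices closer to deterministic exploitation, while a larger `tau` produces more exploration.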

Conflict of interest statement

The authors have no competing interests to declare.

Figures

**Figure 1.**
The average probabilities (calculated over 100 separate simulations) of acceptance of a desirable signaller of type A (red) and undesirable signaller of type B (blue) by a receiver that employs (first row) TS and (second row) Softmax (with τ = 0.1) as an exploration–exploitation heuristic over 200 encounters. Here type A and type B have normally distributed appearances with means *μ_A* and *μ_B* respectively, and standard deviation σ. Model parameters: b = 2, c = 1, *μ_A* = −1, *μ_B* = 1, σ = 1, ρ = 0.6. The continuous red and blue lines represent the proportions of each signaller type A and B that should be accepted on encounter, given the SDT-predicted optimal threshold. Over time the receivers learn the relationship between a signaller's appearance and its log odds of being desirable and ultimately accept approximately the SDT-predicted proportions of signallers. The histograms show the distributions of the thresholds below which it is profitable to accept a signaller after 200 encounters (although neither of the heuristics uses this threshold directly, TS will ultimately behave as if it does). See electronic supplementary material, figure S3, for the full posterior estimates of the logistic distribution after a given number of encounters.
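
The SDT-predicted threshold and acceptance proportions referred to in this caption can be sketched for the equal-variance normal-normal case as follows. This is a hedged illustration, not code from the paper: it assumes that ρ is the probability that an encountered signaller is of the desirable type A, that b is the benefit of accepting type A, that c is the cost of accepting type B, and that rejection pays 0. Under those assumptions, accepting is profitable when ρ·b·f_A(x) > (1 − ρ)·c·f_B(x), which for *μ_A* < *μ_B* gives a single cue value below which signallers should be accepted. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sdt_threshold(b: float, c: float, mu_a: float, mu_b: float,
                  sigma: float, rho: float) -> float:
    """Cue value below which accepting has positive expected payoff.

    Assumes equal-variance normal cue distributions with mu_a < mu_b,
    and that rho is the prior probability of the desirable type A.
    """
    midpoint = (mu_a + mu_b) / 2.0
    log_ratio = math.log((rho * b) / ((1.0 - rho) * c))
    return midpoint + sigma ** 2 * log_ratio / (mu_b - mu_a)


# Parameter values taken from the figure caption.
b, c, mu_a, mu_b, sigma, rho = 2.0, 1.0, -1.0, 1.0, 1.0, 0.6

threshold = sdt_threshold(b, c, mu_a, mu_b, sigma, rho)
# Proportion of each type whose cue falls below the threshold.
p_accept_a = normal_cdf((threshold - mu_a) / sigma)
p_accept_b = normal_cdf((threshold - mu_b) / sigma)

print(f"threshold = {threshold:.3f}")
print(f"P(accept | A) = {p_accept_a:.3f}")
print(f"P(accept | B) = {p_accept_b:.3f}")
```

With the caption's parameters and the assumptions above, the threshold is ln(3)/2, about 0.55, so roughly 94% of type A and 33% of type B signallers would be accepted. The same derivation shows that the log odds of a signaller being desirable are linear in the cue, which is why a logistic relationship is the natural quantity for the receiver to learn.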

**Figure 2.**
The mean cumulative payoff (with 95% CI) gained by volunteers (IND) in the experiments (four treatments comprising two levels of discriminability and two base rates) after 50 encounters with signallers. For comparative purposes, we show the mean cumulative payoff (with 95% CI) after 50 encounters with signallers for a range of exploration–exploitation heuristics, calculated over 100 separate simulations under the same mean conditions. TS, Thompson sampling; SOFT, Softmax (τ = 0.1); SDT, signal detection theory threshold; RDM, accept with 50% probability; MID, Midpoint; UCB, upper confidence bound (γ = 0.05); GR, Greedy; EGR, ε-Greedy (ε = 0.05).

**Figure 3.**
The predicted mean probability of accepting a signaller under the Softmax heuristic (binned in 0.1 intervals; 0–0.099, 0.1–0.199, …, 0.9–1) against the observed proportion of signallers accepted by volunteers for this prediction interval. The fitted lines are linear regressions. The Softmax predictions were obtained by fitting the Softmax model to the history of signallers accepted in each of the four treatments (assuming ε = 0). The posterior means of the temperature parameter τ under these conditions were as follows: low base rate, low discriminability, τ = 0.9 (s.e. 0.22); high base rate, low discriminability, τ = 0.38 (s.e. 0.04); low base rate, high discriminability, τ = 0.48 (s.e. 0.04); high base rate, high discriminability, τ = 0.37 (s.e. 0.03).

Cited by

Who innovates? Abundance of novel and familiar food changes which animals are most persistent.
Kikuchi DW. Proc Biol Sci. 2024 Jan 31;291(2015):20231936. doi: 10.1098/rspb.2023.1936. Epub 2024 Jan 17. PMID: 38228174.
