R Soc Open Sci. 2023 Jun 21;10(6):230157.
doi: 10.1098/rsos.230157. eCollection 2023 Jun.

Signal detection models as contextual bandits

Thomas N Sherratt et al. R Soc Open Sci. 2023.

Abstract

Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.
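
The Softmax rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a payoff structure that the abstract does not spell out: accepting a desirable signaller pays b, accepting an undesirable one costs c, and rejecting pays 0. Under that assumption the two-action Softmax reduces to a logistic function of the expected payoff of accepting, scaled by the temperature τ. The function name and argument names are hypothetical.

```python
import math


def softmax_accept_probability(p_desirable, b=2.0, c=1.0, tau=0.1):
    """Probability of accepting a signaller under a two-action Softmax rule.

    p_desirable: the receiver's current estimate that the signaller is desirable.
    b, c: ASSUMED benefit and cost of accepting a desirable / undesirable signaller.
    tau: temperature; small values make the choice nearly deterministic.
    """
    # Expected payoff of accepting; rejecting is ASSUMED to pay 0.
    expected_accept = p_desirable * b - (1.0 - p_desirable) * c
    # exp(E/tau) / (exp(E/tau) + exp(0/tau)) is the logistic function of E/tau.
    z = expected_accept / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Numerically stable branch for large negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    for p in (0.1, 1.0 / 3.0, 0.9):
        print(p, softmax_accept_probability(p))
```

With b = 2 and c = 1 the expected payoff of accepting is zero when the estimated probability is 1/3, so the rule accepts with probability 0.5 at that point and becomes more decisive on either side as τ shrinks.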

Keywords: Softmax; Thompson sampling; contextual bandit; decision theory; multi-armed bandit; signal detection theory.

Conflict of interest statement

The authors have no competing interests to declare.

Figures

Figure 1.
The average probabilities (calculated over 100 separate simulations) of acceptance of a desirable signaller of type A (red) and undesirable signaller of type B (blue) by a receiver that employs (first row) TS and (second row) Softmax (with τ = 0.1) as an exploration–exploitation heuristic over 200 encounters. Here type A and type B have normally distributed appearances with means μA and μB respectively, and standard deviation σ. Model parameters: b = 2, c = 1, μA = −1, μB = 1, σ = 1, ρ = 0.6. The continuous red and blue lines represent the proportions of each signaller type A and B that should be accepted on encounter, given the SDT-predicted optimal threshold. Over time the receivers learn the relationship between a signaller's appearance and its log odds of being desirable and ultimately accept approximately the SDT-predicted proportions of signallers. The histograms show the distributions of the thresholds below which it is profitable to accept a signaller after 200 encounters (although neither of the heuristics use this threshold directly, TS will ultimately behave as if it does). See electronic supplementary material, figure S3, for the full posterior estimates of the logistic distribution after a given number of encounters.
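
As a worked illustration of the SDT-predicted threshold referred to in the Figure 1 caption, the sketch below computes it for the equal-variance normal-normal model. It rests on assumptions not stated in the caption: ρ is taken to be the proportion of encountered signallers that are type A, b the benefit of accepting type A, c the cost of accepting type B, and rejecting pays 0. The function names are hypothetical.

```python
import math


def sdt_threshold(mu_a, mu_b, sigma, rho, b, c):
    """Cue value below which accepting has positive expected payoff.

    ASSUMES mu_a < mu_b, a common standard deviation sigma, rho = P(type A),
    payoff +b for accepting A, -c for accepting B, and 0 for rejecting.
    Accept when rho * f_A(x) * b > (1 - rho) * f_B(x) * c, which is linear
    in x on the log scale and solves to the expression below.
    """
    log_ratio = math.log((rho * b) / ((1.0 - rho) * c))
    return (mu_a + mu_b) / 2.0 + sigma ** 2 * log_ratio / (mu_b - mu_a)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    # Parameter values quoted in the Figure 1 caption.
    t = sdt_threshold(mu_a=-1.0, mu_b=1.0, sigma=1.0, rho=0.6, b=2.0, c=1.0)
    print("threshold:", t)  # equals 0.5 * ln(3), about 0.549
    print("P(accept | A):", normal_cdf(t, -1.0, 1.0))
    print("P(accept | B):", normal_cdf(t, 1.0, 1.0))
```

Under these assumptions the log odds that a signaller is desirable are linear in the cue, which is consistent with the caption's description of receivers learning the relationship between appearance and log odds.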
Figure 2.
The mean cumulative payoff (with 95% CI) gained by volunteers (IND) in the experiments (four treatments comprising two levels of discriminability and two base rates) after 50 encounters with signallers. For comparative purposes, we show the mean cumulative payoff (with 95% CI) after 50 encounters with signallers for a range of exploration–exploitation heuristics, calculated over 100 separate simulations under the same mean conditions. TS, Thompson sampling; SOFT, Softmax (τ = 0.1); SDT, signal detection theory threshold; RDM, accept with 50% probability; MID, Midpoint; UCB, upper confidence bound (γ = 0.05); GR, Greedy; EGR, ε-Greedy (ε = 0.05).
Figure 3.
The predicted mean probability of accepting a signaller under the Softmax heuristic (binned in 0.1 intervals; 0–0.099, 0.1–0.199, …, 0.9–1) against the observed proportion of signallers accepted by volunteers for this prediction interval. The fitted lines are linear regressions. The Softmax predictions were obtained by fitting the Softmax model to the history of signallers accepted in each of the four treatments (assuming ε = 0). The posterior means of the temperature parameter τ under these conditions were as follows: low base rate, low discriminability, τ = 0.9 (s.e. 0.22); high base rate, low discriminability, τ = 0.38 (s.e. 0.04); low base rate, high discriminability, τ = 0.48 (s.e. 0.04); high base rate, high discriminability, τ = 0.37 (s.e. 0.03).
