Signal detection models as contextual bandits
- PMID: 37351497
- PMCID: PMC10282591
- DOI: 10.1098/rsos.230157
Abstract
Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. We then determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.
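The contrast between a fixed SDT threshold and a Softmax choice rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the payoff structure (a benefit for accepting a desirable signaller, a cost for accepting an undesirable one, zero for rejecting), the equal-variance assumption, the temperature parameter, and all function and parameter names are assumptions made for illustration only.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def sdt_threshold(mu_d, mu_u, sigma, p_desirable, benefit, cost):
    """Full-information acceptance threshold for an equal-variance
    normal-normal model (assumes mu_d > mu_u): accept when cue > threshold.
    Hypothetical payoffs: +benefit / -cost for accepting, 0 for rejecting."""
    midpoint = (mu_d + mu_u) / 2
    log_ratio = math.log(((1 - p_desirable) * cost) / (p_desirable * benefit))
    return midpoint + sigma ** 2 / (mu_d - mu_u) * log_ratio


def expected_accept_payoff(x, mu_d, mu_u, sigma, p_desirable, benefit, cost):
    """Expected payoff of accepting a signaller with cue x under the same
    assumed model; a learner would have to estimate this from experience."""
    like_d = p_desirable * normal_pdf(x, mu_d, sigma)
    like_u = (1 - p_desirable) * normal_pdf(x, mu_u, sigma)
    posterior = like_d / (like_d + like_u)
    return posterior * benefit - (1 - posterior) * cost


def softmax_accept_probability(expected_accept, expected_reject=0.0,
                               temperature=1.0):
    """Two-option Softmax: a logistic function of the payoff difference.
    temperature is an assumed exploration parameter (higher = more random)."""
    diff = (expected_accept - expected_reject) / temperature
    return 1.0 / (1.0 + math.exp(-diff))
```

Under these assumptions, a cue exactly at the SDT threshold has an expected payoff of zero for accepting, so the Softmax rule accepts with probability 0.5 there and shifts smoothly toward acceptance or rejection on either side, instead of switching abruptly as a fixed threshold would.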
Keywords: Softmax; Thompson sampling; contextual bandit; decision theory; multi-armed bandit; signal detection theory.
© 2023 The Authors.
Conflict of interest statement
The authors have no competing interests to declare.