Stat Sci. 2015;30(2):199-215. doi: 10.1214/14-STS504.

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges


Sofía S Villar et al. Stat Sci. 2015.

Abstract

Multi-armed bandit problems (MABPs) are a special type of optimal control problem well suited to modeling resource allocation under uncertainty in a wide variety of contexts. Since the first publication of the optimal solution of the classic MABP by a dynamic index rule, the bandit literature has quickly diversified and emerged as an active research topic. Across this literature, the use of bandit models to optimally design clinical trials became a typical motivating application, yet little of the resulting theory has ever been used in the actual design and analysis of clinical trials. To this end, we review two MABP decision-theoretic approaches to the optimal allocation of treatments in a clinical trial: the infinite-horizon Bayesian Bernoulli MABP and the finite-horizon variant. These models possess distinct theoretical properties and lead to separate allocation rules in a clinical trial design context. We evaluate their performance compared to other allocation rules, including fixed randomization. Our results indicate that bandit approaches offer significant advantages in terms of assigning more patients to better treatments, but also severe limitations in terms of the resulting statistical power. We propose a novel bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier to its use in practice.
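To make the finite-horizon variant concrete, the following is a minimal sketch (not code from the paper) of the Bayes-optimal allocation rule for a two-armed Bernoulli bandit with uniform Beta(1, 1) priors, computed by dynamic programming over the posterior success/failure counts. The function names and the small horizon are illustrative assumptions; the real trial designs in the paper use larger horizons and K arms, where this exact recursion becomes expensive (which is what FIG. 1 illustrates).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, t, T):
    """Expected total successes from period t to horizon T, acting optimally.

    (s_i, f_i) are observed successes/failures on arm i; with Beta(1, 1)
    priors the posterior mean of arm i is (s_i + 1) / (s_i + f_i + 2).
    """
    if t == T:
        return 0.0
    p1 = (s1 + 1) / (s1 + f1 + 2)
    p2 = (s2 + 1) / (s2 + f2 + 2)
    # Bellman recursion: pull an arm, observe success or failure, continue.
    v1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, t + 1, T))
          + (1 - p1) * value(s1, f1 + 1, s2, f2, t + 1, T))
    v2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, t + 1, T))
          + (1 - p2) * value(s1, f1, s2, f2 + 1, t + 1, T))
    return max(v1, v2)

def best_arm(s1, f1, s2, f2, t, T):
    """Arm (1 or 2) allocated to the next patient in the given state."""
    p1 = (s1 + 1) / (s1 + f1 + 2)
    p2 = (s2 + 1) / (s2 + f2 + 2)
    v1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, t + 1, T))
          + (1 - p1) * value(s1, f1 + 1, s2, f2, t + 1, T))
    v2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, t + 1, T))
          + (1 - p2) * value(s1, f1, s2, f2 + 1, t + 1, T))
    return 1 if v1 >= v2 else 2
```

The state space grows polynomially in T, so this brute-force recursion is only feasible for short horizons; the Gittins and Whittle index rules discussed in the paper replace it with per-arm index computations that scale to realistic trial sizes.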

Keywords: Gittins index; Multi-armed bandit; Whittle index; patient allocation; response adaptive procedures.


Figures

FIG. 1
The number of individual computations required to approximate the optimal rule in a particular instance of the Bayesian Bernoulli MABP, as a function of T, with K = 3 and d = 0.9, for the brute-force, dynamic programming (DP) and Gittins index approaches.
FIG. 2
The (approximate) Gittins index values for an information vector of s0 + st successes and f0 + ft failures, with discount factor d = 0.99 and the horizon truncated at T = 750.
FIG. 3
The (approximate) Whittle index values for an information vector of s0 + st successes and f0 + ft failures, plotted for T − t ∈ {1, 40, 80} with d = 1 and T = 180.
FIG. 4
Top: the bias in the control treatment estimate as a function of the number of allocated patients under H1. Bottom: the bias in the experimental treatment estimate under H1.

