2022 Sep 12;17(9):e0274272.
doi: 10.1371/journal.pone.0274272. eCollection 2022.

Some performance considerations when using multi-armed bandit algorithms in the presence of missing data


Xijin Chen et al. PLoS One. 2022.

Abstract

When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, missing data also affect their implementation, where the simplest approach is to continue to sample according to the original bandit algorithm while ignoring missing outcomes. We investigate the impact on performance of this approach for several bandit algorithms through an extensive simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes; however, our results apply to other applications of bandit algorithms where missing data are expected to occur. We assess the resulting operating characteristics, including the expected reward, and consider different probabilities of missingness in both arms. The key finding of our work is that, under the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to how these strategies balance the exploration-exploitation trade-off. Algorithms geared towards exploration continue to assign samples to the arm with more missing responses: being perceived as the arm with less observed information, it is deemed more appealing by the algorithm than it would otherwise be. In contrast, algorithms geared towards exploitation rapidly assign a high value to samples from the arm with a currently high mean, irrespective of the number of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
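The mechanism described in the abstract can be sketched in a few lines. The simulator below is a hypothetical illustration, not the authors' simulation code: it runs Thompson sampling on two Bernoulli arms, makes each outcome missing at random with an arm-specific probability, and either ignores the missing outcome or imputes the arm's current posterior mean. The parameter defaults echo values mentioned in the figures (e.g. n = 200), but the priors, tuning, and imputation details of the algorithms studied in the paper may differ.

```python
import random


def thompson_two_armed(p=(0.7, 0.9), miss=(0.3, 0.0), n=200,
                       impute=False, seed=0):
    """Two-armed Thompson sampling with Beta(1,1) priors and binary rewards.

    Each outcome goes missing with arm-specific probability miss[k]
    (missing at random). With impute=False, missing outcomes are simply
    ignored; with impute=True, a missing outcome is replaced by the arm's
    current posterior mean (a simple mean-imputation scheme).
    Returns the number of allocations to each arm.
    """
    rng = random.Random(seed)
    s = [0.0, 0.0]   # accumulated successes (pseudo-counts when imputing)
    f = [0.0, 0.0]   # accumulated failures
    pulls = [0, 0]
    for _ in range(n):
        # Draw from each arm's Beta posterior and allocate to the larger draw.
        draws = [rng.betavariate(1 + s[k], 1 + f[k]) for k in (0, 1)]
        k = 0 if draws[0] >= draws[1] else 1
        pulls[k] += 1
        reward = 1 if rng.random() < p[k] else 0
        if rng.random() < miss[k]:
            # Outcome is missing: ignore it, or impute the posterior mean.
            if impute:
                m = (1 + s[k]) / (2 + s[k] + f[k])
                s[k] += m
                f[k] += 1 - m
        else:
            s[k] += reward
            f[k] += 1 - reward
    return pulls
```

With unequal missingness (here only arm 0 loses outcomes), comparing `thompson_two_armed(impute=False)` against `thompson_two_armed(impute=True)` over many seeds gives a rough feel for how ignoring missing data versus mean imputation shifts the allocation proportion, the quantity the paper's E[p*] summarizes.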


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Allocation procedure of bandit algorithms.
Performance of different bandit algorithms over a single simulation (with p0 = 0.7, p1 = 0.7, and n = 200) under the null. The two colors reflect different arms that could be regarded as the same under the null.
Fig 2. Simulation results of E[p*] for TTS, CB and UCB under the null.
Expectations are taken over 10^4 replications for CB and UCB and 10^3 replications for TTS under different combinations of missingness probabilities, with p0 = p1 = 0.9 and n = 200.
Fig 3. Simulation results under the null.
Simulation results of E[p*] under the null for different missing data combinations. Grey lines correspond to the case of equal missingness probability in both arms; blue lines correspond to missingness in the control arm; red lines correspond to missingness in the experimental arm.
Fig 4. Simulation results under the alternative.
Simulation results of E[p*] under the alternative for different missing data combinations. Grey lines correspond to the case of equal missingness probability in both arms; blue lines correspond to missingness in the control arm; red lines correspond to missingness in the experimental arm.
Fig 5. Imputation results under the null.
Imputation results of E[p*] under the null for different missing data combinations with initial value p̂_{k,0} = 0.5. Grey lines correspond to the case of equal missingness probability in both arms; blue lines correspond to missingness in the control arm; red lines correspond to missingness in the experimental arm. Solid lines correspond to the results without mean imputation, while the dashed lines correspond to the results with mean imputation.
Fig 6. Imputation results under the alternative.
Imputation results of E[p*] under the alternative for different missing data combinations with initial value p̂_{k,0} = 0.5. Grey lines correspond to the case of equal missingness probability in both arms; blue lines correspond to missingness in the control arm; red lines correspond to missingness in the experimental arm. Solid lines correspond to the results without mean imputation, while the dashed lines correspond to the results with mean imputation.
