J Mach Learn Res. 2022;23(250). https://www.jmlr.org/papers/v23/21-0354.html

Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials


Daiqi Gao et al. J Mach Learn Res. 2022.

Abstract

Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the evidence that accumulates as patients enter the trial sequentially. It is also ethically important that future patients have a high probability of being treated optimally based on the knowledge accrued so far. In this work, we propose a new design, called sequentially rule-adaptive trials, for learning optimal ITRs within the contextual bandit framework, in contrast to the response-adaptive design of traditional adaptive trials. In our design, each entering patient is allocated with high probability to the current best treatment for that patient, estimated from past data by a machine learning algorithm (outcome weighted learning in our implementation). We explore the tradeoff between the training and test values of the estimated ITR in single-stage problems by proving theoretically that for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem differs from traditional decision problems in that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale theory with empirical process techniques to handle cases that cannot be addressed by previous methods for i.i.d. data. Numerical examples show that, without much loss in test value, our proposed algorithm improves the training value significantly compared with existing methods. Finally, we illustrate the performance of the proposed method with a real data study.
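The allocation scheme described in the abstract can be illustrated with a toy simulation. The snippet below is a hypothetical sketch, not the authors' exact SRAT algorithm: it replaces outcome weighted learning with a simple stratified sample-mean rule over a binary covariate, and follows the current estimated rule with probability 1 − ε, randomizing otherwise for exploration.

```python
import random

random.seed(0)

def rule_adaptive_trial(n=400, eps=0.05):
    """Toy sequentially rule-adaptive allocation (hypothetical sketch).

    Each arriving patient has a binary covariate x. The current estimated
    rule picks, for each covariate value, the arm with the best average
    observed outcome so far, and is followed with probability 1 - eps.
    """
    # Running outcome sums and counts, indexed by (covariate, arm).
    total = {(x, a): 0.0 for x in (0, 1) for a in (0, 1)}
    count = {(x, a): 0 for x in (0, 1) for a in (0, 1)}
    outcomes = []
    for _ in range(n):
        x = random.randint(0, 1)  # patient covariate
        # Current estimated rule: arm with the highest sample mean for x.
        means = [total[x, a] / count[x, a] if count[x, a] else 0.0
                 for a in (0, 1)]
        best = 0 if means[0] >= means[1] else 1
        # Follow the estimated rule with probability 1 - eps; explore otherwise.
        a = best if random.random() > eps else random.randint(0, 1)
        # Toy outcome model: the arm matching the covariate is optimal.
        y = 1.0 if a == x else 0.0
        total[x, a] += y
        count[x, a] += 1
        outcomes.append(y)
    return sum(outcomes) / n  # mean outcome achieved during the trial
```

Because most patients are assigned by the continually refit rule, the mean outcome during the trial (the "training value") stays close to the optimal value, which is the ethical motivation highlighted above.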

Keywords: Contextual bandit; empirical process; martingale; outcome weighted learning; sequential decision making.


Figures

Figure 1: The randomization probability P(A_i = 𝒟̂_{i-1}(X_i) | H_{i-1}, X_i) of SRAT-E, SRAT-B, and LinUCB when ϵ_i = 0.05 and γ_i = 0.4.
Figure 2: Scenario 1. The regret (logarithmic scale) and the false decision ratio on the training or test set against sample size n.
Figure 3: The weighted sum of training and test regrets in scenario 1 when n = 800.
Figure 4: Scenario 1 with ϵ_0 = 0.5. The regret (logarithmic scale) and the false decision ratio on the training or test set against the parameter θ.
Figure 5: Sample size considerations for SRAT-E in scenario 1 with ϵ_0 = 0.5. Correct decision ratios on the test set against those on the training set. Each line represents a sample size n, and each point on a line represents a value of θ. Points farther to the right correspond to smaller θ and thus to a higher correct decision ratio on the training set and a lower ratio on the test set.
Figure 6: Mean cross-validated HRSD (Hamilton Rating Scale for Depression) scores against the sample size n.
Figure 7: Scenario 2. The regret (logarithmic scale) and the false decision ratio on the training or test set against sample size n.
