Empirical power comparison of statistical tests in contemporary phase III randomized controlled trials with time-to-event outcomes in oncology
- PMID: 32933339
- DOI: 10.1177/1740774520940256
Abstract
Background: More than 95% of recent cancer randomized controlled trials used the log-rank test to detect a treatment difference, making it the predominant tool for comparing two survival functions. As with other tests, the log-rank test has both advantages and disadvantages. One advantage is that it offers the highest power against proportional hazards differences, which may be a major reason why alternative methods have rarely been employed in practice. The performance of statistical tests has traditionally been investigated both theoretically and numerically for several patterns of difference between two survival functions. However, to the best of our knowledge, there has been no attempt to compare the performance of various statistical tests using empirical data from past oncology randomized controlled trials. It is therefore unknown whether the log-rank test offers a meaningful power advantage over alternative testing methods in contemporary cancer randomized controlled trials. Focusing on recently reported phase III cancer randomized controlled trials, we assessed whether the log-rank test gave meaningfully greater power than five alternative testing methods: the generalized Wilcoxon test, a test based on the maximum of test statistics from multiple weighted log-rank tests, the difference in t-year event rate, and the difference in restricted mean survival time with fixed and adaptive τ.
Methods: Using manuscripts from cancer randomized controlled trials recently published in high-tier clinical journals, we reconstructed patient-level data for overall survival (69 trials) and progression-free survival (54 trials). For each trial endpoint, we estimated the empirical power of each test. Empirical power was measured as the proportion of trials for which a test would have identified a significant result (p value < .05).
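The definition of empirical power given above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `empirical_power`, the test labels, and every p value below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def empirical_power(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Proportion of trials whose p value falls strictly below alpha."""
    if len(p_values) == 0:
        raise ValueError("at least one trial is required")
    significant = sum(1 for p in p_values if p < alpha)
    return significant / len(p_values)


# Hypothetical p values, one per trial, for two hypothetical test labels.
# These numbers are made up for illustration only.
hypothetical_p = {
    "log-rank": [0.003, 0.041, 0.120, 0.650, 0.049],
    "rmst_fixed_tau": [0.010, 0.020, 0.030, 0.700, 0.044],
}

# Empirical power of each test across the same set of trials.
power_by_test = {
    name: empirical_power(p_values) for name, p_values in hypothetical_p.items()
}
```

Because every test is applied to the same set of trials, the resulting proportions can be compared directly across tests.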
Results: For overall survival, the t-year event rate test offered the lowest empirical power (30.4%) and the restricted mean survival time test with fixed τ offered the highest (43.5%). The empirical power of the other tests was almost identical (36.2%-37.7%). For progression-free survival, the tests we investigated offered numerically similar empirical power (55.6%-61.1%). No single test consistently outperformed any other test.
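As a rough arithmetic check, the reported percentages can be converted back into whole-number trial counts. The sketch below assumes the denominators are the 69 overall survival trials and 54 progression-free survival trials stated in the Methods; the implied counts are derived here and are not reported in the abstract.

```python
def implied_count(percent: float, n_trials: int) -> int:
    """Nearest whole number of trials matching a reported percentage."""
    return round(percent / 100 * n_trials)


def is_consistent(percent: float, n_trials: int) -> bool:
    """True if the implied count reproduces the percentage to one decimal."""
    count = implied_count(percent, n_trials)
    return abs(100 * count / n_trials - percent) < 0.05


# Percentages as reported in the Results; denominators are an assumption.
reported = {
    "OS lowest": (30.4, 69),
    "OS highest": (43.5, 69),
    "OS other, low end": (36.2, 69),
    "OS other, high end": (37.7, 69),
    "PFS low end": (55.6, 54),
    "PFS high end": (61.1, 54),
}

implied = {label: implied_count(pct, n) for label, (pct, n) in reported.items()}
all_consistent = all(is_consistent(pct, n) for pct, n in reported.values())
```

Under this assumption, the gap between the lowest and highest overall survival figures corresponds to a difference of nine trials out of 69.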
Conclusion: Assessing empirical power with data from past cancer randomized controlled trials provided new insights into the performance of statistical tests. Although the log-rank test has been used in almost all trials, our study suggests that it is not the only option from an empirical power perspective. Near-universal use of the log-rank test is not supported by a meaningful difference in empirical power. Clinical trial investigators could consider alternative methods, beyond the log-rank test, for their primary analysis when designing a cancer randomized controlled trial. Factors other than power (e.g. interpretability of the estimated treatment effect) should receive greater consideration when selecting statistical tests for cancer randomized controlled trials.
Keywords: Hazard ratio; log-rank test; restricted mean survival time; survival data analysis; weighted log-rank test.