J Am Med Inform Assoc. 2022 Sep 12;29(10):1737-1743.
doi: 10.1093/jamia/ocac106.

In with the old, in with the new: machine learning for time to event biomedical research

Ioana Danciu et al. J Am Med Inform Assoc.

Erratum in

Abstract

The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, by some out-of-the-box machine learning approaches. In this article, we present a machine learning method appropriate for time-to-event modeling of long-term disease progression in prostate cancer. Using XGBoost adapted to long-term disease progression, we developed a predictive model for 118 788 patients with localized prostate cancer at diagnosis from the Department of Veterans Affairs (VA). Our model accounted for patient censoring. Harrell's c-index for our model, using only features available at the time of diagnosis, was 0.757 (95% confidence interval [0.756, 0.757]). Our results show that machine learning methods like XGBoost can be adapted to use accelerated failure time (AFT) with censoring to model long-term risk of disease progression. The long median survival in this population makes accounting for censoring necessary. Overall, we show that an existing machine learning approach can be used for AFT outcome modeling in prostate cancer and, more generally, for other chronic diseases with long observation times.
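
To make the evaluation metric concrete, the following is a minimal sketch of Harrell's c-index for right-censored data, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied times are all hypothetical. A pair of patients is treated as comparable only when the patient with the shorter observed time actually had the event; pairs where the shorter time is censored carry no ordering information and are ignored. For a model that predicts survival time (as an AFT model does), the negated predicted time can serve as the risk score.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative).

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- higher score means the model expects an earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranked the earlier event as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the second patient is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risk = [0.9, 0.2, 0.1, 0.6]
print(harrell_c_index(times, events, risk))  # 3 of 4 comparable pairs agree: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported 0.757.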

Keywords: machine learning; predictive modeling; survival analysis; xgboost.

Figures

Figure 1. Cohort selection.
Figure 2. SHAP predictor importance (average impact on model output magnitude).
Figure 3. Instance-level predictor importance for two hypothetical patients. (A) Hypothetical patient 1, with more aggressive disease at diagnosis: timeline. (B) Hypothetical patient 1, with more aggressive disease at diagnosis: outcome prediction. (C) Hypothetical patient 2, with less aggressive disease at diagnosis: timeline. (D) Hypothetical patient 2, with less aggressive disease at diagnosis: outcome prediction.
