Invest Radiol. 2023 Dec 1;58(12):874-881.
doi: 10.1097/RLI.0000000000001009. Epub 2023 Jul 28.

A Comprehensive Machine Learning Benchmark Study for Radiomics-Based Survival Analysis of CT Imaging Data in Patients With Hepatic Metastases of CRC

Anna Theresa Stüber et al. Invest Radiol.

Abstract

Objectives: Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features.

Materials and methods: In this study, a large-scale ML benchmark pipeline consisting of preprocessing, feature selection, dimensionality reduction, hyperparameter optimization, and training of different models was developed for radiomics-based survival analysis. Portal-venous computed tomography imaging data from a previous prospective randomized trial evaluating radioembolization of liver metastases of colorectal cancer were made quantitatively accessible through a radiomics approach. A total of 1218 radiomics features of hepatic metastases and the whole liver were calculated, and 19 clinical parameters (age, sex, laboratory values, and treatment) were available for each patient. Three ML algorithms were evaluated for 5 combinations of clinical data, tumor radiomics, and whole-liver features: a regression model with elastic net regularization (glmnet), a random survival forest (RSF), and a gradient tree-boosting technique (xgboost). Hyperparameter optimization and model evaluation both targeted the integrated Brier score as the performance metric and were carried out via nested cross-validation. To address dependency structures in the benchmark setup, a mixed-model approach was developed to compare ML and data configurations and to identify the best-performing model.
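The nested cross-validation described above can be illustrated with a minimal index-splitting sketch. It is not the authors' implementation: the function names are hypothetical, the fold counts (10 outer folds for performance estimation, 5 inner folds for tuning) are taken from the Figure 2 caption, and the sample size of 491 is taken from the Results. The sketch only shows how the splits nest, so that the outer test fold is never seen during tuning.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k disjoint, nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    `inner_splits` is a list of (inner_train, inner_val) pairs drawn only
    from `outer_train`, which keeps the outer test fold out of the tuning loop.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # All samples that are not in the current outer test fold.
        outer_train = [
            idx for j, fold in enumerate(outer_folds) if j != i for idx in fold
        ]
        # Inner folds are built from the outer training data only.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [
                idx for j, fold in enumerate(inner_folds) if j != m for idx in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# 491 patients, as reported in the Results.
splits = list(nested_cv_splits(n_samples=491))
```

In a full pipeline, preprocessing and hyperparameter selection would be fitted on the inner splits, the chosen configuration refitted on `outer_train`, and the score computed once on `outer_test`.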

Results: Within our radiomics-based benchmark experiment, 60 ML pipeline variations were evaluated on clinical data and radiomics features from 491 patients. Descriptive analysis of the benchmark results showed a preference for RSF-based pipelines, especially for the combination of clinical data with radiomics features. This observation was supported by the quantitative analysis via a linear mixed model approach, computed to differentiate the effect of data sets and pipeline configurations on the resulting performance. This analysis showed that the RSF pipelines consistently performed similarly to or better than glmnet and xgboost. Further, for the RSF, no pipeline composition performed significantly better than the others with respect to the type of preprocessing or hyperparameter optimization.
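One way to arrive at 60 pipeline variations is sketched below. The factorization is an assumption inferred from the Figure 2 caption (3 algorithms, 5 data combinations, principal component analysis yes/no, tuning yes/no); the abstract itself reports only the total. The data set labels are hypothetical placeholders, since the 5 combinations are not named here.

```python
from itertools import product

# The three learners named in the abstract.
algorithms = ["glmnet", "RSF", "xgboost"]

# Hypothetical labels: the abstract states 5 data combinations without naming them.
data_sets = [f"data_set_{i}" for i in range(1, 6)]

# Assumed binary switches, following the Figure 2 caption.
pca_options = [False, True]
tuning_options = [False, True]

pipelines = [
    {"algorithm": a, "data": d, "pca": p, "tuned": t}
    for a, d, p, t in product(algorithms, data_sets, pca_options, tuning_options)
]

print(len(pipelines))  # prints 60
```

Under this assumed layout, each algorithm contributes 20 of the 60 variations.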

Conclusions: Our study introduces a benchmark framework for radiomics-based survival analysis, aimed at identifying the optimal settings with respect to different radiomics data sources and various ML pipeline variations, including preprocessing techniques and learning algorithms. A suitable analysis tool for the benchmark results is provided via a mixed model approach, which showed, for our study on patients with intrahepatic liver metastases, that radiomics features captured the patients' clinical situation in a manner comparable to the information provided solely by clinical parameters. However, we did not observe a relevant additional prognostic value from these radiomics features.

Conflict of interest statement

Conflicts of interest and sources of funding: none declared.

Figures

FIGURE 1
Study flowchart. For 491 of the 530 patients, computed tomography (CT) imaging data were available for the study at hand.
FIGURE 2
A, Setup of the benchmark pipeline for the analysis of prognostic information in different (radiomics) feature sets regarding overall survival in patients with hepatic metastases due to colorectal cancer. The basis of the machine learning pipelines is given by the 3 algorithms—a random survival forest (RSF), a regularized generalized linear model (glmnet), and a gradient tree-boosting technique (xgboost)—combined with varying configurations of preprocessing (feature selection, principal component analysis yes/no) and tuning yes/no, evaluated with respect to the performance metric integrated Brier score (IBS). B, Detailed look into the tuning (k = 5)/performance (k = 10) loop for k-fold cross-validation (CV). Preprocessing steps in the pipeline construction and the corresponding learner development are based on the training folds, whereas model evaluation is based on the test fold.
FIGURE 3
A, Exemplary portal-venous CT image of a patient with colorectal cancer metastases (green) in the liver (yellow). B, Pearson correlation map of the first 100 radiomics features obtained from the (original/no-filter) liver CTs.
FIGURE 4
Benchmark results presented via the IBS (pointwise per CV fold; distribution by boxplots) per algorithm in grids of tuning and preprocessing (principal component analysis yes/no) configuration. Different colors mark the performance on each (radiomics) data set. Low IBS values indicate good performance and can especially be seen for the tuned RSF, with the best performance score marked via the dashed orange line. xgboost showed poor performance for its untuned version and is therefore presented on an adapted IBS scale.
FIGURE 5
Interaction-style plot with ◇ = estimated marginal means (EMMs) and ▍ = simultaneous confidence intervals (CIs) of the linear mixed model (LMM). This LMM models the IBS of the benchmark results using data set and pipeline configuration as predictors, together with their interaction, and the training/test CV fold as a random effect.
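The model structure named in the Figure 5 caption can be written out as a sketch. The notation is an assumption and does not come from the source: $d$ indexes the data set, $p$ the pipeline configuration, and $f$ the CV fold; the normality assumptions are the conventional ones for a random-intercept LMM and may differ from the authors' exact specification.

```latex
\mathrm{IBS}_{dpf} = \mu + \alpha_d + \beta_p + (\alpha\beta)_{dp} + b_f + \varepsilon_{dpf},
\qquad b_f \sim \mathcal{N}(0, \sigma_b^2),
\quad \varepsilon_{dpf} \sim \mathcal{N}(0, \sigma^2)
```

Here $\alpha_d$ and $\beta_p$ are the fixed effects of data set and pipeline configuration, $(\alpha\beta)_{dp}$ is their interaction, and $b_f$ is the random intercept for the CV fold, which accounts for results from the same fold being correlated.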
