Adv Radiat Oncol. 2024 Nov 13;10(2):101675. doi: 10.1016/j.adro.2024.101675. eCollection 2025 Feb.

Performance Comparison of 10 State-of-the-Art Machine Learning Algorithms for Outcome Prediction Modeling of Radiation-Induced Toxicity

Ramon M Salazar et al. Adv Radiat Oncol.

Abstract

Purpose: To evaluate the efficacy of prominent machine learning algorithms in predicting normal tissue complication probability using clinical data from 2 distinct disease sites, and to create a software tool that automatically determines the optimal algorithm for modeling any given labeled data set.

Methods and materials: We obtained 3 sets of radiation toxicity data (478 patients) from our clinic: gastrointestinal toxicity, radiation pneumonitis, and radiation esophagitis. These data comprised clinicopathological and dosimetric information for patients diagnosed with non-small cell lung cancer and anal squamous cell carcinoma. Each data set was modeled with 11 commonly employed machine learning algorithms (elastic net, least absolute shrinkage and selection operator [LASSO], random forest, random forest regression, support vector machine, extreme gradient boosting, light gradient boosting machine, k-nearest neighbors, neural network, Bayesian-LASSO, and Bayesian neural network) by randomly dividing the data set into a training set and a test set. The training set was used to build and tune the model, and the test set served to assess it by calculating performance metrics. This process was repeated 100 times for each algorithm on each data set. Figures were generated to visually compare the performance of the algorithms. A graphical user interface was developed to automate the whole process.
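For readers who want to reproduce the comparison loop described above, the sketch below illustrates the idea in Python with scikit-learn: repeated random train/test splits, with the AUC and AUPRC collected on each test set. The models, split fraction, and synthetic data are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data with the study's cohort size; the real inputs would be
# clinicopathological and dosimetric features.
X, y = make_classification(n_samples=478, n_features=20, random_state=0)

models = {
    # L1-penalized logistic regression as a LASSO-style classifier
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear"),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {name: {"auc": [], "auprc": []} for name in models}
for seed in range(100):  # 100 Monte Carlo train/test splits per algorithm
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    for name, model in models.items():
        p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        scores[name]["auc"].append(roc_auc_score(y_te, p))
        scores[name]["auprc"].append(average_precision_score(y_te, p))

for name, s in scores.items():
    print(f"{name}: AUC {np.mean(s['auc']):.3f} ± {np.std(s['auc']):.3f}, "
          f"AUPRC {np.mean(s['auprc']):.3f} ± {np.std(s['auprc']):.3f}")
```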

Results: LASSO achieved the highest area under the precision-recall curve (AUPRC) for radiation esophagitis (0.807 ± 0.067), random forest for gastrointestinal toxicity (0.726 ± 0.096), and the neural network for radiation pneumonitis (0.878 ± 0.060). The corresponding areas under the receiver operating characteristic curve (AUC) were 0.754 ± 0.069, 0.889 ± 0.043, and 0.905 ± 0.045, respectively. The graphical user interface was used to compare all algorithms for each data set automatically. When the AUPRC was averaged across all toxicities, Bayesian-LASSO was the best model.

Conclusions: Our results show that no single algorithm performs best across all data sets. It is therefore important to compare multiple algorithms when training an outcome prediction model on a new data set. The graphical user interface created for this study automatically compares the performance of these 11 algorithms on any data set.


Conflict of interest statement

Ramon M. Salazar, Alexandra O. Leone, Saurabh S. Nair, and Joshua S. Niedzielski report support through a grant from Varian Medical Systems. Joshua S. Niedzielski also reports a research grant from the Fund for Innovations in Cancer Informatics. Brian De reports grant funding from RSNA (RR2111) and honoraria from Sermo, Inc. Prajnan Das reports honoraria from ASTRO, ASCO, Beyer, Imedex, Physicians Education Resource, and Conveners. Laurence E. Court reports grants from Varian Medical Systems, NCI, CPRIT, Wellcome Trust, and the Fund for Cancer Informatics.

Figures

Figure 1
Diagram of the model building and assessment process. Preprocessing, depending on the model being trained, may involve dummy coding, deleting zero-variance features, or rescaling. The randomization and splitting of the initial data set correspond to a Monte Carlo cross-validation approach for the outer loop. The inner loop performs repeated k-fold cross-validation or out-of-bag error minimization for model construction. Abbreviations: AUC = area under the receiver operating characteristic curve; AUPRC = area under the precision-recall curve.
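A minimal sketch of the nested structure in Figure 1, assuming a scikit-learn workflow: the outer Monte Carlo split reserves a test set for assessment, while an inner repeated k-fold search tunes hyperparameters on the training portion only. The estimator, grid, and fold counts are placeholders, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)
from sklearn.svm import SVC

X, y = make_classification(n_samples=478, random_state=0)

# Outer loop: one Monte Carlo split (repeated many times in practice)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Inner loop: repeated k-fold cross-validation for hyperparameter tuning
inner_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
search = GridSearchCV(SVC(probability=True),
                      param_grid={"C": [0.1, 1, 10]},
                      scoring="roc_auc", cv=inner_cv)
search.fit(X_tr, y_tr)  # tuning sees only the training portion

# Assessment on the held-out outer test set
p = search.predict_proba(X_te)[:, 1]
print("outer test AUC:", roc_auc_score(y_te, p))
```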
Figure 2
A heat map of the ranks for each algorithm across 300 separate iterations. Obtaining a rank of 1 means the algorithm had the highest area under the receiver operating characteristic curve (AUC, top) or area under the precision-recall curve (AUPRC, bottom) for a particular iteration. The line on each box marks the median value of the ranks. The yellow diamond locates the mean of the ranks. Abbreviations: BayesNN = Bayes neural network; KNNeighbors = k-nearest neighbors; LASSO = least absolute shrinkage and selection operator; LightGBM = light gradient boosting machine; NeurNet = neural network; RF = random forest; SVM = support vector machine; XGBTr = extreme gradient boosting.
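The ranking summarized in Figure 2 can be computed as follows; the score matrix here is random stand-in data, and the algorithm list is abbreviated for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
algos = ["LASSO", "RF", "SVM", "XGBTr"]  # abbreviated list for illustration
# 300 iterations x algorithms of stand-in AUC/AUPRC values
scores = pd.DataFrame(rng.uniform(0.5, 0.9, (300, len(algos))), columns=algos)

# rank 1 = highest score within an iteration (a row of the matrix)
ranks = scores.rank(axis=1, ascending=False)
print(ranks.median())  # the line on each box in Figure 2
print(ranks.mean())    # the yellow diamond
```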
Figure 3
A plotted table showing pairwise comparisons among the algorithms. The numbers inside the cells give the frequency, as a percentage of 300 comparisons, with which the models on the vertical axis achieved a higher area under the receiver operating characteristic curve (AUC, left) or area under the precision-recall curve (AUPRC, right) than the models on the horizontal axis. The fill color corresponds to the P value for the separation of the AUC (left) or AUPRC (right) distributions of the pairs. The gray cells are redundant. Abbreviations: BayesNN = Bayes neural network; KNNeighbors = k-nearest neighbors; LASSO = least absolute shrinkage and selection operator; LightGBM = light gradient boosting machine; NeurNet = neural network; RF = random forest; SVM = support vector machine; XGBTr = extreme gradient boosting.
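A hedged sketch of how such a pairwise table can be built: count, over the paired iterations, how often one algorithm's score exceeds another's, and test the separation of the two score distributions. The Wilcoxon signed-rank test is an assumed choice of paired test; the caption does not name the test used.

```python
import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# Stand-in per-iteration scores for two algorithms over 300 iterations
scores = {"LASSO": rng.uniform(0.6, 0.9, 300),
          "RF": rng.uniform(0.6, 0.9, 300)}

for a, b in combinations(scores, 2):
    wins = 100 * np.mean(scores[a] > scores[b])  # % of iterations a beats b
    stat, p = wilcoxon(scores[a], scores[b])     # paired test (assumed)
    print(f"{a} > {b} in {wins:.1f}% of iterations (P = {p:.3g})")
```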
Figure 4
A plotted table showing the average area under the receiver operating characteristic curve (AUC, top left), area under the precision-recall curve (AUPRC, top right), calibration slope (bottom left), and calibration intercept (bottom right) of the models for each data set, with the SD of each distribution (corresponding to the 100 iterations per data set). The optimal value for the AUC, AUPRC, and calibration slope is 1; the optimal value for the calibration intercept is 0. The yellow diamonds highlight the models that were superior for each individual cohort. These results give a comprehensive overview of model calibration in relation to predictive performance, emphasizing suitability for clinical application. Abbreviations: BayesNN = Bayes neural network; GIT = gastrointestinal toxicity; KNNeighbors = k-nearest neighbors; LASSO = least absolute shrinkage and selection operator; LightGBM = light gradient boosting machine; NeurNet = neural network; RE = radiation esophagitis; RF = random forest; RP = radiation pneumonitis; SVM = support vector machine; XGBTr = extreme gradient boosting.
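Calibration slope and intercept are commonly obtained by regressing the observed outcome on the logit of the predicted probability (slope) and then refitting with that logit as a fixed offset (intercept); a slope of 1 and an intercept of 0 indicate perfect calibration. The sketch below follows this standard construction, which is assumed rather than quoted from the paper.

```python
import numpy as np
import statsmodels.api as sm

def calibration_slope_intercept(y_true, p_pred, eps=1e-6):
    p = np.clip(p_pred, eps, 1 - eps)
    logit = np.log(p / (1 - p))
    # Slope: coefficient of a logistic regression on the logit of predictions
    slope = sm.Logit(y_true, sm.add_constant(logit)).fit(disp=0).params[1]
    # Intercept: refit with the logit entered as a fixed offset
    intercept = sm.Logit(y_true, np.ones_like(logit),
                         offset=logit).fit(disp=0).params[0]
    return slope, intercept

# Well-calibrated synthetic predictions: expect slope ~ 1, intercept ~ 0
rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.95, 500)
y = rng.binomial(1, p)
print(calibration_slope_intercept(y, p))
```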
Figure 5
The 5 best features for the best 3 models for each toxicity. The top, middle, and bottom rows contain information on gastrointestinal toxicity (GIT), radiation pneumonitis (RP), and radiation esophagitis (RE), respectively. Abbreviations: LASSO = least absolute shrinkage and selection operator; Neural Net = neural network.
Figure 6
Feature value plotted against phi (the approximate change in the predicted probability of toxicity attributable to the given feature value) for the most important features of the 3 toxicities. Black dots represent patients who did not develop toxicity; red dots represent those who did. Note that “Anal V30” refers to the relative volume of the bowel bag receiving at least 30 Gy.
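If phi is a per-feature attribution of the kind the caption describes, one crude way to approximate it is to measure how the predicted probability changes when a single feature is replaced by a background value. The sketch below is a deliberately simple stand-in for illustration, not the attribution method used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def approx_phi(model, X, feature_idx):
    """Change in predicted toxicity probability when one feature is
    replaced by its mean (a crude, illustrative attribution)."""
    background = X.copy()
    background[:, feature_idx] = X[:, feature_idx].mean()
    return (model.predict_proba(X)[:, 1]
            - model.predict_proba(background)[:, 1])

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
phi = approx_phi(model, X, feature_idx=0)  # one value per patient
```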
