[Preprint]. 2024 May 13:2024.05.13.24307226.
doi: 10.1101/2024.05.13.24307226.

Artificial Intelligence Uncertainty Quantification in Radiotherapy Applications - A Scoping Review

Kareem A Wahid et al. medRxiv.

Abstract

Background/purpose: The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions.

Methods: We followed the PRISMA-ScR scoping review reporting guidelines. We used the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics.

Results: We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). Code or datasets were not shared in 55% of studies.
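As a minimal illustration of the most common UQ method found in the review, the sketch below implements Monte Carlo dropout for a toy NumPy regression network (all weights, layer sizes, and sample counts are hypothetical, not taken from any reviewed study): keeping dropout active at inference and repeating the forward pass yields a distribution of predictions whose spread acts as an uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights for a tiny 1-hidden-layer regression network.
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1))

def stochastic_forward(x, drop_p=0.5):
    """One forward pass with dropout kept ON at inference time."""
    h = np.maximum(0.0, x @ W1)            # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p    # random dropout mask
    h = h * mask / (1.0 - drop_p)          # inverted-dropout scaling
    return h @ W2

def mc_dropout_predict(x, n_samples=100):
    """Mean and spread over repeated stochastic passes (epistemic proxy)."""
    preds = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

x = np.array([[0.5]])
mean, std = mc_dropout_predict(x)  # std > 0: predictions vary across passes
```

In practice the same idea is applied to a trained network (e.g. a segmentation model) by leaving its dropout layers in training mode at test time; the per-voxel standard deviation then serves as an uncertainty map.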

Conclusion: Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, there was a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
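To make the suggestion of conformal prediction concrete, here is a hedged sketch of split conformal prediction on synthetic regression data (the data, model errors, and coverage target are invented for illustration): a held-out calibration set turns any point predictor into prediction intervals with a distribution-free marginal coverage guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration set: ground truth and a model's point predictions.
y_cal = rng.normal(size=500)
pred_cal = y_cal + rng.normal(scale=0.3, size=500)

alpha = 0.1                                # target 90% marginal coverage
scores = np.abs(y_cal - pred_cal)          # nonconformity scores
n = len(scores)
# Finite-sample-corrected quantile used by split conformal prediction.
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def conformal_interval(pred):
    """Prediction interval with a distribution-free coverage guarantee."""
    return pred - q, pred + q
```

The appeal for RT applications is that the guarantee requires only exchangeability between calibration and test cases, not a correctly specified model, which complements the Bayesian-style methods (Monte Carlo dropout, ensembling) that dominated the reviewed literature.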

Conflict of interest statement

Conflicts of Interest: KAW serves as an Editorial Board Member for Physics and Imaging in Radiation Oncology. CDF has received travel, speaker honoraria and/or registration fee waiver unrelated to this project from: The American Association for Physicists in Medicine; the University of Alabama-Birmingham; The American Society for Clinical Oncology; The Royal Australian and New Zealand College of Radiologists; The American Society for Radiation Oncology; The Radiological Society of North America; and The European Society for Radiation Oncology.

Figures

Figure A1.
Illustrative examples of aleatoric and epistemic uncertainty concepts. (A) Left: A computed tomography image of an oropharyngeal cancer patient, overlaid with a probability map of interobserver agreement, illustrates aleatoric uncertainty in segmentation. Example data derived from expert contours from the Contouring Collaborative in Radiation Oncology (doi: 10.1038/s41597-023-02062-w). Right: A hypothetical tumor contouring model trained using oropharyngeal cancer cases would yield high epistemic uncertainty when presented with a parotid tumor case as a byproduct of insufficient training data. The combination of aleatoric and epistemic uncertainties contributes to the total predictive uncertainty. (B) A scatterplot of hypothetical variables x and y demonstrates high aleatoric uncertainty in regions with noisy data points and high epistemic uncertainty in regions with sparse data points.
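The aleatoric/epistemic split illustrated above can also be sketched numerically. Assuming a hypothetical deep ensemble whose members each predict a mean and a variance for the same input (all values below are invented), the law of total variance separates the two components:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble of 5 probabilistic models, each predicting a mean
# and a variance for the same input (all numbers are made up).
member_means = rng.normal(loc=1.0, scale=0.2, size=5)  # member disagreement
member_vars = np.full(5, 0.09)                         # predicted data noise

aleatoric = member_vars.mean()   # expected data noise (irreducible)
epistemic = member_means.var()   # spread of member means (reducible with data)
total = aleatoric + epistemic    # law-of-total-variance decomposition
```

More training data shrinks the epistemic term (the members converge), while the aleatoric term reflects noise inherent to the data, such as interobserver contouring variability.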
Figure A2.
Study overview. This scoping review aims to comprehensively evaluate the literature on artificial intelligence models designed to quantify model uncertainty, specifically within the context of radiotherapy applications such as image acquisition, contouring, dose prediction, and outcome prediction, among others.
Figure A3.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram illustrating systematic screening of identified studies. Ultimately, 56 studies out of the initially identified 8980 were included for the final analysis.
Figure 1.
General study characteristics. (A) Stacked barplot showing total number of publications per country by publication type. (B) Heatmap of the number of studies by continent where green indicates a low number of publications and blue indicates a high number of publications; continents where no studies were extracted from are represented in white. (C) Stacked barplot showing code and data availability over time. Each item in the barplots corresponds to one study.
Figure 2.
Radiotherapy characteristics. (A) Stacked barplot showing cancer disease site per radiotherapy application domain. The “Other” category for cancer type included cervical, liver, esophageal, pancreatic, cardiac, breast, and pelvic cancers. The “Other” category for radiotherapy application included nodal classification, tumor growth modeling, and image correction. (B) Stacked barplot showing additional data per imaging modality represented. The “Other” category for additional data included registration transforms, respiratory trace, k-space, fiducial, clinical data, target+clinical data, dose+clinical data, and dose+clinical data+target+probability map. Each item in the barplots corresponds to one study.
Figure 3.
Artificial intelligence characteristics. (A) Scatter plot showing number of training, validation, and testing patients used in studies. Only studies that explicitly reported patient-level sample sizes are included. The three studies with the highest sample sizes in each category are annotated. (B) Bar plot showing types of testing strategies used in studies. Each item in the barplot corresponds to one study.
Figure 4.
Uncertainty quantification characteristics. (A) Tree map of uncertainty quantification applications represented in the studies. (B) Tree map of uncertainty quantification methods represented in the studies. (C) Tree map of uncertainty quantification metrics represented in the studies. Each item in the tree maps corresponds to a reported item (there could be multiple per study).

    1. Abdar M, Khosravi A, Islam SMS, Rajendra Acharya U, Vasilakos AV. The need for quantification of uncertainty in artificial intelligence for clinical data analysis: increasing the level of trust in the decision-making process. IEEE Systems, Man, and Cybernetics Magazine 2022;8:28–40.