Pract Radiat Oncol. 2024 Mar-Apr;14(2):e150-e158.
doi: 10.1016/j.prro.2023.10.011. Epub 2023 Nov 5.

Framework for Radiation Oncology Department-wide Evaluation and Implementation of Commercial Artificial Intelligence Autocontouring

Dominic Maes et al. Pract Radiat Oncol. 2024 Mar-Apr.

Abstract

Purpose: Artificial intelligence (AI)-based autocontouring in radiation oncology has potential benefits such as standardization and time savings. However, commercial AI solutions require careful evaluation before clinical integration. We developed a multidimensional evaluation method to test pretrained AI-based automated contouring solutions across a network of clinics.

Methods and materials: Curated data included 121 patient planning computed tomography (CT) scans from 4 clinics, with a total of 859 clinically approved contours used for treatment. Regions of interest (ROIs) were generated with 3 commercial AI-based automated contouring software solutions (AI1, AI2, AI3) spanning the following disease sites: brain, head and neck (H&N), thorax, abdomen, and pelvis. Quantitative agreement between AI-generated and clinical contours was measured by Dice similarity coefficient (DSC) and Hausdorff distance (HD). For qualitative assessment, multiple experts scored blinded AI contours on a Likert scale. A workflow and usability survey was also conducted.
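
To illustrate the two quantitative metrics named above, the following is a minimal sketch of DSC and HD computed on contours represented as sets of voxel coordinates. It is not the authors' implementation: the abstract does not say which HD variant was used (for example, maximum versus 95th percentile, or surface-based versus volume-based), so the symmetric maximum HD over all voxels is assumed here. The function names `dice` and `hausdorff` and the toy contours are hypothetical.

```python
from math import dist


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty contours are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric (maximum) Hausdorff distance between two non-empty point sets.

    Brute force, O(|A| * |B|); fine for a toy example, too slow for full CT volumes.
    """
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty contour")

    def directed(p, q):
        # Largest distance from any point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


# Hypothetical 2D contours: two 2x2 squares offset by one voxel.
clinical = {(0, 0), (0, 1), (1, 0), (1, 1)}
ai = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice(clinical, ai))       # 0.5
print(hausdorff(clinical, ai))  # 1.0
```

DSC rewards volumetric overlap, while HD reports the worst-case boundary disagreement, so the two can diverge for the same pair of contours. Distances here are in voxel units; converting to millimetres would require the scan's voxel spacing.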

Results: AI1, AI2, and AI3 contours had high quantitative agreement in 27.8%, 32.8%, and 34.1% of cases (DSC >0.9), performing well in pelvis (median DSC = 0.86/0.88/0.91) and thorax (median DSC = 0.91/0.89/0.91). All 3 solutions had low quantitative agreement in 7.4%, 8.8%, and 6.1% of cases (DSC <0.5), performing worse in brain (median DSC = 0.65/0.78/0.75) and H&N (median DSC = 0.76/0.80/0.81). Qualitatively, AI1 and AI2 contours were acceptable (rated 1-2) with at most minor edits in 70.7% and 74.6% of ROIs (2906 ratings), higher for abdomen (AI1: 79.2%) and thorax (AI2: 90.2%), and lower for H&N (29.0/35.6%). An end-user survey showed strong user preference for full automation and mixed preferences for accuracy versus total number of structures generated.
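
The agreement categories reported above (high agreement at DSC >0.9, low agreement at DSC <0.5, and median DSC per site) can be tabulated along the following lines. This is a sketch only: the function name `summarize_dsc` and the input values are hypothetical and are not data from the study.

```python
from statistics import median


def summarize_dsc(scores, high=0.9, low=0.5):
    """Return the fraction of cases with DSC > high, the fraction with DSC < low,
    and the median DSC. Thresholds default to those quoted in the abstract."""
    if not scores:
        raise ValueError("need at least one DSC value")
    n = len(scores)
    return {
        "high_agreement": sum(s > high for s in scores) / n,
        "low_agreement": sum(s < low for s in scores) / n,
        "median": median(scores),
    }


# Hypothetical per-ROI DSC values for one solution at one disease site.
example = [0.95, 0.92, 0.75, 0.45, 0.25]
print(summarize_dsc(example))
# {'high_agreement': 0.4, 'low_agreement': 0.4, 'median': 0.75}
```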

Conclusions: Our evaluation method provided a comprehensive analysis of both quantitative and qualitative measures of commercially available pretrained AI autocontouring algorithms. The evaluation framework served as a roadmap for clinical integration that aligned with user workflow preference.

Conflict of interest statement

Disclosures: None.