J Appl Clin Med Phys. 2025 Apr;26(4):e70010.
doi: 10.1002/acm2.70010. Epub 2025 Feb 13.

Evaluation and failure analysis of four commercial deep learning-based autosegmentation software for abdominal organs at risk

Mingdong Fan et al. J Appl Clin Med Phys. 2025 Apr.

Abstract

Purpose: Deep learning-based segmentation of organs at risk (OARs) is becoming mainstream in clinical practice because it outperforms atlas- and model-based autocontouring methods. Although several commercial deep learning-based autosegmentation solutions are now available, their implementation is still at an early stage: acceptance criteria remain underdeveloped because little is known about the systems' segmentation tendencies and failure modes. As the starting point of the iterative process of clinical implementation, this study focuses on the outlier analysis of four commercial autocontouring tools for the abdominal OARs.

Materials and methods: Four autosegmentation tools (Limbus AI, MIM Contour ProtégéAI, Radformation AutoContour, and Siemens syngo.via) were used to segment 111 patient cases. Geometric segmentation accuracy was quantitatively compared with clinical contours using the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95). The outliers from the quantitative evaluation of each software were analyzed for the liver, stomach, and kidneys, and the possible causes of the outliers were grouped into six categories: (1) difference in contouring style or guideline, (2) image acquisition and quality, (3) abnormal anatomy of the OAR, (4) abnormal anatomy of abutting organs/tissues, (5) external/internal devices, and (6) other causes.
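The two geometric metrics named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 6-neighbor surface extraction, and the choice to take the larger of the two directed 95th-percentile distances are assumptions made for illustration, and HD95 conventions differ between tools. The brute-force distance matrix is only practical for small masks.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient of two binary masks (1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


def surface_points(mask, spacing):
    """Physical coordinates of mask voxels that have a face-neighbor outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        # A voxel stays "interior" only if both neighbors along this axis are in the mask.
        interior &= np.roll(padded, 1, axis) & np.roll(padded, -1, axis)
    surface = padded & ~interior
    coords = np.argwhere(surface) - 1  # undo the one-voxel padding offset
    return coords * np.asarray(spacing, dtype=float)


def hd95(a, b, spacing):
    """95th-percentile Hausdorff distance (assumed convention: max of both directions)."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        return float("nan")
    # Pairwise Euclidean distances between the two surface point sets.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)
    b_to_a = d.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


if __name__ == "__main__":
    # Toy 2D example: a 4x4 square and the same square shifted by one pixel.
    clinical = np.zeros((10, 10), dtype=bool)
    auto = np.zeros((10, 10), dtype=bool)
    clinical[2:6, 2:6] = True
    auto[2:6, 3:7] = True
    print("DSC:", dice(clinical, auto))
    print("HD95 (mm):", hd95(clinical, auto, spacing=(1.0, 1.0)))
```

In this toy case the two squares overlap in 12 of their 16 pixels each, so the DSC is 0.75, and every surface point lies within one pixel of the other surface, so the HD95 equals one pixel spacing.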

Results: For liver segmentation, the most prominent cause of discrepancies for Limbus, occurring in four of its six outliers, was the presence of a biliary stent or an internal/external biliary drain, as well as the resulting pneumobilia. Siemens included abutting organs with CT numbers similar to those of the liver in 5/8 outliers. Twelve of Radformation's 13 liver segmentation outliers included the heart and/or stomach, while MIM not only included the stomach in the presence of barium in 5/11 outliers but also produced fragmented contours in 5/11 other cases. Only Limbus and Radformation provided stomach segmentation, and imaging with barium contrast directly caused incomplete stomach delineation in 10/12 Limbus outliers and 21/25 Radformation outliers. As for the kidneys, Radformation and Siemens consistently followed the RTOG contouring guidelines, whereas the institutional contours excluded the renal pelvis in some cases, resulting in 19/25 Radformation outliers and 18/23 Siemens outliers. By contrast, Limbus contours appeared to follow different contouring guidelines that exclude the renal pelvis. Fragmented kidney contours were found in 10/15 Limbus outliers and 25/26 MIM outliers. Those in MIM were directly linked to the use of IV contrast in imaging, but there was not enough evidence to identify the origin of the fragmented Limbus contours.

Conclusion: The causes of the segmentation outliers of the four commercial deep learning-based autocontouring solutions were summarized for each OAR. This work can help the vendors improve their autosegmentation software and also inform the users of potential modes of failure when using the tools.

Keywords: artificial intelligence; autosegmentation; organs at risk; outlier analysis.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
DSC and HD95 evaluations of deep learning autosegmentations for the liver, stomach, and kidneys. The HD95 display limit for extreme outliers in liver was set at 85 mm.
FIGURE 2
Liver autosegmentation outliers: (a) Limbus had a rugged segmentation and included part of the heart, MIM had a fragmented segmentation while Radformation contour included both stomach and heart, (b) the liver had an abnormal shape, where it extended to the left rib, and the Limbus contour was short on the left. There was an internal/external biliary drain and the resulting gas bubbles were inside the liver. They were included in the clinical contour but excluded from the Limbus and Siemens contours, (c) the Limbus contour included the pneumobilia region created by a biliary stent inferiorly while the Siemens contour excluded one individual bubble, and the clinical contour excluded the entire pneumobilia region, (d) the liver abutted ascites on the right and gallbladder inferiorly, and all autosegmentations included the entire gallbladder and some buildup fluid superiorly, (e) the MIM contour included some GTV inferiorly, (f) the MIM contour was fragmented in a region with no apparent boundaries, (g) the MIM contour included the stomach with barium contrast, (h) Siemens contour included the abutting IVC, and (i) the clinical contour missed some liver near the boundary with duodenum inferiorly while the Siemens contour included the duodenum.
FIGURE 3
Stomach autosegmentation outliers: (a) Limbus and Radformation failed to segment the part of the stomach with barium contrast, (b) there was a lack of visible boundaries between the stomach and its surrounding tissues in the absence of barium contrast, and Limbus had fragmented segmentation, missed part of the stomach, and included part of large bowel, (c) the stomach had an abnormal location: it was located at the same level as the heart and entirely on the right side of the body. The Limbus contour missed the right and posterior part of the stomach while the Radformation contour had a very tiny volume inferior to this axial slice, (d) the stomach was filled with a mixture of food and barium, and it abutted an empty large bowel anteriorly. Both Limbus and Radformation included the large bowel superiorly and missed the entire inferior stomach where it flexed to the right. Limbus also produced fragmented segmentation, (e) the stomach abutted the CTV inferiorly. Both Limbus and Radformation under‐segmented the stomach and included part of the CTV, and (f) the Radformation contour extended to the duodenum stent and had incomplete delineation on the left while Limbus did not segment the stomach at the level of the stent.
FIGURE 4
Kidney autosegmentation outliers: (a) Limbus's delineation guideline does not include the renal pelvis while the clinical contour did for the patient, (b) Radformation and Siemens delineation guidelines include the renal pelvis while the clinical contour did not for the patient. The adrenal gland superior to the left kidney was also included by the autocontours, (c) fragmented Limbus and MIM segmentations in the presence of IV contrast, (d) the cysts abutting the left and right kidneys were not a part of the clinical contours while all autocontours included partial or the whole cyst. The images were not acquired with IV contrast, (e) the small cyst was included in the clinical contour, whereas it was excluded from the Limbus contour, and (f) the segmentation of the right kidney was negatively affected for all vendors by the enlarged liver that extended far inferiorly and abutted the right kidney with very similar HU values. The images were acquired in the pyelographic phase of IV contrast, which could not provide enhancement of renal parenchyma.
