Comparative Study
J Appl Clin Med Phys. 2025 Jun;26(6):e70089.
doi: 10.1002/acm2.70089. Epub 2025 Apr 5.

Open-source deep-learning models for segmentation of normal structures for prostatic and gynecological high-dose-rate brachytherapy: Comparison of architectures


Andrew J Krupien et al. J Appl Clin Med Phys. 2025 Jun.

Abstract

Background: Deep-learning-based auto-contouring algorithms are increasingly common in treatment planning. However, few commercially or publicly available models have been trained on large or diverse datasets that include high-dose-rate (HDR) brachytherapy treatment scans, so available models tend to perform poorly on images containing HDR implants.

Purpose: To implement and evaluate automatic organ-at-risk (OAR) segmentation models for computed tomography (CT)-guided high-dose-rate brachytherapy treatment planning of prostatic and gynecological cancers.

Methods and materials: A total of 1316 CT scans and corresponding segmentation files from 1105 prostatic or gynecological HDR patients treated at our institution from 2017 to 2024 were used for model training. The data came from six CT scanners, including a mobile CT unit previously reported to be susceptible to image streaking artifacts. Two UNet-derived architectures, UNet++ and nnU-Net, were investigated for training bladder and rectum models. The models were tested on 100 CT scans and clinically used segmentation files from 62 prostatic or gynecological HDR brachytherapy patients, collected in 2024 and disjoint from the training set. Performance was evaluated using the Dice similarity coefficient (DSC) between model-predicted and clinically used contours on slices that contain the clinical target volume (CTV). Additionally, three experienced planners conducted a blinded evaluation of ten randomly selected test cases.
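The CTV-restricted DSC described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each structure is a boolean NumPy mask shaped (slices, rows, cols) with axis 0 as the axial slice index, treats a slice as "containing the CTV" if it holds any CTV voxel, and uses hypothetical function names `dice` and `dice_on_ctv_slices`.

```python
import numpy as np


def dice(pred, ref) -> float:
    """Dice similarity coefficient, 2 * |A intersect B| / (|A| + |B|), for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = int(pred.sum()) + int(ref.sum())
    if denom == 0:
        return 1.0  # assumption: two empty masks are treated as perfect agreement
    return 2.0 * int(np.logical_and(pred, ref).sum()) / denom


def dice_on_ctv_slices(pred, ref, ctv) -> float:
    """3D DSC computed only over axial slices that contain at least one CTV voxel."""
    ctv_slices = np.asarray(ctv, dtype=bool).any(axis=(1, 2))  # one flag per slice
    return dice(np.asarray(pred)[ctv_slices], np.asarray(ref)[ctv_slices])


# Toy example (illustrative data, not from the study).
ref = np.zeros((4, 3, 3), dtype=bool)
ref[:, 1, 1] = True    # organ present on all four slices
pred = ref.copy()
pred[0] = False        # prediction misses slice 0, which lies outside the CTV
ctv = np.zeros_like(ref)
ctv[1:3, 0, 0] = True  # CTV spans slices 1 and 2 only

print(dice(pred, ref))                     # 6/7, about 0.857, over the whole volume
print(dice_on_ctv_slices(pred, ref, ctv))  # 1.0 on CTV-containing slices
```

Restricting to CTV-containing slices removes disagreement far from the target (here, the missed slice 0), which is where contour accuracy matters least for HDR dose evaluation.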

Results: The median (interquartile range) 3D DSC on CTV-containing slices was 0.95 (0.04) for the UNet++ bladder model and 0.87 (0.09) for the UNet++ rectum model, compared with 0.96 (0.03) and 0.88 (0.10), respectively, for nnU-Net. The rank-sum test revealed no statistically significant difference between the architectures (p = 0.15 for bladder and p = 0.27 for rectum). In the blinded evaluation, the model-generated contours scored higher than the clinically used contours.
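The abstract does not state which variant of the rank-sum test was used. As a hedged sketch, the Wilcoxon rank-sum statistic with the large-sample normal approximation (average ranks for ties, no tie or continuity correction) can be computed in plain Python; this should match what `scipy.stats.ranksums` reports. The function names and example values are hypothetical and are not study data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). No tie or continuity correction is applied.
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    ranks = average_ranks(x + y)
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return z, p


# Hypothetical per-case DSC values for two models (not study data).
z, p = rank_sum_test([0.95, 0.93, 0.97, 0.91], [0.96, 0.94, 0.98, 0.92])
print(f"z = {z:.2f}, p = {p:.2f}")
```

A p-value above 0.05, as reported for both organs, means the test found no evidence that one architecture's DSC distribution is shifted relative to the other's.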

Conclusion: Both UNet-derived architectures perform similarly for the bladder and rectum and are accurate enough to reduce contouring time in a review-and-edit workflow during HDR brachytherapy planning. The UNet++ models were chosen for implementation at our institution because of their lower computing-hardware requirements and are in routine clinical use.

Keywords: Auto‐contouring; brachytherapy; dice‐similarity‐coefficient; segmentation.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
UNet++ diagram. Black curves indicate skip connections, orange arrows max-pooling operations, green arrows transposed convolutions, and blue boxes two convolutions followed by a dropout layer.
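To make the nested skip pathways in the diagram concrete, the sketch below enumerates which nodes feed each node of a UNet++ graph. It follows the published UNet++ formulation, in which node X^{i,j} (down-sampling level i, position j along the skip pathway) receives every earlier node at its own level plus the upsampled node X^{i+1,j-1}. It illustrates connectivity only and is not the authors' network: the depth is an assumed parameter, `unetpp_inputs` is a hypothetical name, and each node stands for one blue box (two convolutions plus dropout).

```python
def unetpp_inputs(depth: int) -> dict:
    """Map each UNet++ node (i, j) to a list of (edge_type, source_node) pairs.

    i = down-sampling level (0 = full resolution); j = position along the
    nested skip pathway. Nodes exist for i + j <= depth.
    """
    graph = {}
    for i in range(depth + 1):
        for j in range(depth + 1 - i):
            if j == 0:
                # Encoder backbone: max-pooled output of the level above.
                graph[(i, j)] = [("pool", (i - 1, 0))] if i > 0 else []
            else:
                # Skip connections from every earlier node at this level,
                # plus the transposed-convolution upsampling of X^{i+1, j-1}.
                skips = [("skip", (i, k)) for k in range(j)]
                graph[(i, j)] = skips + [("up", (i + 1, j - 1))]
    return graph


graph = unetpp_inputs(4)
for node in [(0, 0), (1, 0), (0, 1), (0, 2)]:
    print(node, graph[node])
```

The dense skip list is what distinguishes UNet++ from a plain UNet, where each decoder node receives only the single encoder node at its level.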
FIGURE 2
DSC evaluation of the autocontours against the clinically used contours. Plot A shows the DSC over the full 3D volume, Plot B the 3D DSC restricted to CTV slices, and Plot C the 2D slice-by-slice DSC restricted to CTV slices. Box-and-whisker plots show the median and interquartile range (IQR); whiskers extend from the box to the farthest data point within 1.5 × IQR of the box.
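The difference between Plot B and Plot C can be sketched as follows: instead of pooling voxels from all CTV-containing slices into one 3D score, each CTV-containing axial slice gets its own 2D DSC. This is a minimal sketch assuming boolean NumPy masks shaped (slices, rows, cols); `slicewise_dice` is a hypothetical name, and skipping slices where neither contour is present is an assumption of this sketch, since the caption does not say how such slices were handled.

```python
import numpy as np


def slicewise_dice(pred, ref, ctv):
    """Per-slice 2D DSC on CTV-containing axial slices (axis 0 = slice index).

    Returns {slice_index: dsc}. Slices where both masks are empty are skipped
    (an assumption; the DSC is undefined there).
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    ctv_slices = np.flatnonzero(np.asarray(ctv, dtype=bool).any(axis=(1, 2)))
    scores = {}
    for z in ctv_slices:
        p, r = pred[z], ref[z]
        denom = int(p.sum()) + int(r.sum())
        if denom == 0:
            continue
        scores[int(z)] = 2.0 * int(np.logical_and(p, r).sum()) / denom
    return scores


# Toy example (illustrative data, not from the study).
ref = np.zeros((3, 3, 3), dtype=bool)
ref[:, 1, 1] = True
pred = ref.copy()
pred[2, 1, 1] = False
pred[2, 0, 0] = True   # on slice 2 the prediction is displaced
ctv = np.zeros_like(ref)
ctv[1:, 2, 2] = True   # CTV on slices 1 and 2

print(slicewise_dice(pred, ref, ctv))  # {1: 1.0, 2: 0.0}
```

In this toy case the pooled 3D DSC over the same two slices is 0.5, while the slice-wise values are 1.0 and 0.0, so the two summaries can tell different stories about the same contours.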
FIGURE 3
95th-percentile Hausdorff distance (HD95) evaluation of the autocontours against the clinically used contours. Box-and-whisker plots show the median and interquartile range (IQR); whiskers extend from the box to the farthest data point within 1.5 × IQR of the box.
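The 95th-percentile Hausdorff distance can be sketched on surface point sets. This is a minimal, brute-force illustration assuming the contour surfaces have already been extracted as (N, 3) arrays of coordinates in millimeters; `hd95` is a hypothetical name. Tools define HD95 slightly differently (some take the 95th percentile of both directed distance sets pooled together). This sketch takes the larger of the two directed 95th percentiles, and the caption does not state which convention was used.

```python
import numpy as np


def hd95(points_a, points_b) -> float:
    """95th-percentile symmetric Hausdorff distance between two surface point sets.

    points_a, points_b: (N, 3) and (M, 3) arrays of boundary coordinates in mm.
    Returns the larger of the two directed 95th-percentile distances.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Dense pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # each point of A to its nearest point of B
    b_to_a = d.min(axis=0)  # each point of B to its nearest point of A
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


# Toy example: two 2-point "surfaces" 3 mm apart along x (illustrative only).
a = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
b = a + np.array([3.0, 0.0, 0.0])
print(hd95(a, b))  # 3.0
```

Using the 95th percentile instead of the maximum keeps a few stray boundary points from dominating the score. The dense (N, M) distance matrix is only practical for small surfaces; a nearest-neighbor search structure such as a KD-tree would replace it at clinical resolution.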
FIGURE 4
Results of the blinded evaluation of the auto- and clinically used contours. Box-and-whisker plots show the median and interquartile range (IQR); whiskers extend from the box to the farthest data point within 1.5 × IQR of the box. On this scale, a rating of ten required no modifications, seven required minor modifications, four required major modifications, and one was unusable.
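Figures 2 to 4 all use the same whisker rule. As a hedged sketch, the box-and-whisker summary can be computed in plain Python; the quartile convention (`method="inclusive"`, linear interpolation between order statistics) is an assumption, since the captions do not say which one the plotting software used, and `box_stats` is a hypothetical name.

```python
import statistics


def box_stats(data, k=1.5):
    """Median, quartiles, and k * IQR whiskers for a box-and-whisker plot."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the farthest data points that still lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical ratings on the 1-10 scale (not study data).
print(box_stats([7, 10, 10, 7, 4, 10, 1]))
```

Points beyond the fences are drawn individually as outliers rather than extending the whiskers, which is why a single failed case does not stretch the plotted range.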
FIGURE 5
Tri-plane view of the case with the minimum 3D rectum DSC for UNet++ (0.32 for UNet++ and 0.43 for 2D nnU-Net). Clinically used contours are in yellow, UNet++ autocontours in blue, 2D nnU-Net autocontours in orange, and the clinically used CTV in red. This case also illustrates the variability in model predictions with respect to the vertical position of the rectum.
FIGURE 6
Tri-plane view of the case with the minimum 3D rectum DSC for 2D nnU-Net (0.23 for 2D nnU-Net and 0.36 for UNet++). Clinically used contours are in yellow, UNet++ autocontours in blue, 2D nnU-Net autocontours in orange, and the clinically used CTV in red. Model performance suffered in the presence of major photon-starvation artifacts: the height of the rectum was ambiguous, and the 2D nnU-Net rectum contour was disconnected.
FIGURE 7
Tri-plane view of the case with the maximum 3D bladder DSC (0.98 for both networks). Clinically used contours are in yellow, UNet++ autocontours in blue, 2D nnU-Net autocontours in orange, and the clinically used CTV in red. The models performed well despite the minor streaking artifacts frequently present in Airo scans.
FIGURE 8
Tri-plane view of the case with the minimum 3D bladder DSC (0.45 for UNet++ and 0.41 for 2D nnU-Net). Clinically used contours are in yellow, UNet++ autocontours in blue, 2D nnU-Net autocontours in orange, and the clinically used CTV in red. The models performed worse in the presence of large insertions that deformed the anatomy.


