J Appl Clin Med Phys. 2022 Jul;23(7):e13630.
doi: 10.1002/acm2.13630. Epub 2022 May 9.

Attention-aware 3D U-Net convolutional neural network for knowledge-based planning 3D dose distribution prediction of head-and-neck cancer

Alexander F I Osman et al. J Appl Clin Med Phys. 2022 Jul.

Abstract

Purpose: Deep learning-based knowledge-based planning (KBP) methods have been introduced for radiotherapy dose distribution prediction to reduce the planning time and maintain consistent high-quality plans. This paper presents a novel KBP model using an attention-gating mechanism and a three-dimensional (3D) U-Net for intensity-modulated radiation therapy (IMRT) 3D dose distribution prediction in head-and-neck cancer.

Methods: A total of 340 head-and-neck cancer plans, from the OpenKBP-2020 AAPM Grand Challenge data set, were used in this study. All patients were treated with the IMRT technique with a dose prescription of 70 Gy. The data set was randomly divided 64%/16%/20% into training/validation/testing cohorts. An attention-gated 3D U-Net architecture was developed to predict the full 3D dose distribution. The model was trained using the mean-squared-error loss function and the Adam optimization algorithm, with a learning rate of 0.001, 120 epochs, and a batch size of 4. A baseline U-Net model was trained under the same settings for comparison. Model performance was evaluated on the testing data set by comparing the generated dose distributions against the ground-truth dose distributions using dose statistics and clinical dosimetric indices. Performance was also compared to the baseline model and to the reported results of other deep learning-based dose prediction models.
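The cohort split and training hyperparameters above can be sketched in code. The helper name, random seed, and rounding behavior below are illustrative assumptions, since the paper does not publish its split routine:

```python
import random

# Hyperparameters as reported in the Methods section; the dictionary
# keys and the split helper below are illustrative, not from the paper's code.
CONFIG = {
    "loss": "mean_squared_error",
    "optimizer": "adam",
    "learning_rate": 0.001,
    "epochs": 120,
    "batch_size": 4,
}

def split_plans(n_plans, fractions=(0.64, 0.16, 0.20), seed=42):
    """Randomly split plan indices into training/validation/testing cohorts."""
    indices = list(range(n_plans))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_plans)
    n_val = round(fractions[1] * n_plans)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_plans(340)
```

With 340 plans this yields 218/54/68 training/validation/testing plans, consistent with the 68-plan test set reported in the Results.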

Results: The proposed attention-gated 3D U-Net model accurately predicted 3D dose distributions that closely replicated the ground-truth dose distributions of the 68 plans in the test set. The average mean absolute dose error was 2.972 ± 1.220 Gy (vs. 2.920 ± 1.476 Gy for a baseline U-Net) in the brainstem, 4.243 ± 1.791 Gy (vs. 4.530 ± 2.295 Gy) in the left parotid, 4.622 ± 1.975 Gy (vs. 4.223 ± 1.816 Gy) in the right parotid, 3.346 ± 1.198 Gy (vs. 2.958 ± 0.888 Gy) in the spinal cord, 6.582 ± 3.748 Gy (vs. 5.114 ± 2.098 Gy) in the esophagus, 4.756 ± 1.560 Gy (vs. 4.992 ± 2.030 Gy) in the mandible, 4.501 ± 1.784 Gy (vs. 4.925 ± 2.347 Gy) in the larynx, 2.494 ± 0.953 Gy (vs. 2.648 ± 1.247 Gy) in the PTV_70, and 2.432 ± 2.272 Gy (vs. 2.811 ± 2.896 Gy) in the body contour. The average difference in predicting the D99 value for the targets (PTV_70, PTV_63, and PTV_56) was 2.50 ± 1.77 Gy. For the organs at risk, the average difference in predicting the $D_{max}$ (brainstem, spinal cord, and mandible) and $D_{mean}$ (left parotid, right parotid, esophagus, and larynx) values was 1.43 ± 1.01 and 2.44 ± 1.73 Gy, respectively. The average homogeneity index was 7.99 ± 1.45 for the predicted plans versus 5.74 ± 2.95 for the ground-truth plans, whereas the average conformity index was 0.63 ± 0.17 for the predicted plans versus 0.89 ± 0.19 for the ground-truth plans. The proposed model needs less than 5 s to predict a full 3D dose distribution of 64 × 64 × 64 voxels for a new patient, which is sufficient for real-time applications.
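The evaluation metrics in this paragraph can be sketched as follows. The abstract does not restate the exact homogeneity and conformity index formulas, so the common $(D_2 - D_{98})/D_{50}$ homogeneity index and the Paddick conformity index are assumed here; all function names and toy values are illustrative:

```python
def mae(pred, gt):
    """Mean absolute dose error (Gy) over a structure's voxels."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def dose_at_volume(doses, percent):
    """Dx: minimum dose (Gy) received by the hottest x% of voxels."""
    ranked = sorted(doses, reverse=True)
    k = max(1, round(percent / 100 * len(ranked)))
    return ranked[k - 1]

def homogeneity_index(ptv_doses):
    """Assumed HI definition: (D2 - D98) / D50 * 100 (lower = more homogeneous)."""
    d2 = dose_at_volume(ptv_doses, 2)
    d98 = dose_at_volume(ptv_doses, 98)
    d50 = dose_at_volume(ptv_doses, 50)
    return (d2 - d98) / d50 * 100

def paddick_ci(ptv_mask, ref_isodose_mask):
    """Assumed Paddick CI: |PTV ∩ V_ref|^2 / (|PTV| * |V_ref|); 1.0 is ideal."""
    inter = sum(1 for p, r in zip(ptv_mask, ref_isodose_mask) if p and r)
    return inter ** 2 / (sum(ptv_mask) * sum(ref_isodose_mask))

# Toy demonstration on a synthetic 100-voxel PTV dose list (1..100 Gy)
toy_hi = homogeneity_index(list(range(1, 101)))
```

In practice these metrics would be computed per structure from the predicted and ground-truth 3D dose arrays restricted to each contour's voxel mask.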

Conclusions: The attention-gated 3D U-Net model demonstrated the capability to predict accurate 3D dose distributions for head-and-neck IMRT plans with consistent quality. The prediction performance of the proposed model was overall superior to that of a baseline standard U-Net model and competitive with the best state-of-the-art dose prediction methods reported in the literature. The proposed model could be used to obtain dose distributions for decision-making before planning, quality assurance of planning, and guiding automated planning for improved plan consistency, quality, and planning efficiency.

Keywords: 3D dose prediction; attention-gated U-Net; convolutional neural networks; deep learning; head-and-neck cancer; intensity-modulated radiation therapy; knowledge-based planning; radiation therapy; radiotherapy treatment planning.

Conflict of interest statement

The authors have no conflict of interest to disclose.

Figures

FIGURE 1
The attention-gated 3D U-Net architecture for KBP radiotherapy dose prediction. The patient anatomical information (CT and contoured structures) is used as input to the network to predict a full 3D dose distribution. Blue boxes correspond to sets of feature maps; the number of extracted feature maps is denoted on the top/bottom of each cube, and the size of the feature maps is given at the left/right side of each box. White boxes represent copied feature maps, and the arrows represent different operations. The attention-gating mechanism is also shown for the propagation signal z1, gating signal z2, and final gated output signal zg of the network. CT, computed tomography; KBP, knowledge-based planning
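The attention-gating mechanism of Figure 1 can be illustrated with a minimal scalar sketch. A real gate operates on whole 3D feature maps using learned 1×1×1 convolutions; the scalar weights below are arbitrary stand-ins, not trained values:

```python
import math

def attention_gate(z1, z2, w_x=0.5, w_g=0.5, psi=1.0, bias=0.0):
    """Additive attention gate (scalar sketch of Figure 1's signals):
    z1 is the propagation (skip-connection) signal, z2 is the gating
    signal from the coarser decoder level, and zg is the gated output.
    alpha = sigmoid(psi * relu(w_x*z1 + w_g*z2 + bias)); zg = alpha * z1.
    """
    pre = max(0.0, w_x * z1 + w_g * z2 + bias)      # ReLU on summed projections
    alpha = 1.0 / (1.0 + math.exp(-psi * pre))      # attention coefficient in (0, 1)
    zg = alpha * z1                                 # skip features scaled by attention
    return zg, alpha

zg, alpha = attention_gate(2.0, 1.0)
```

The attention coefficient alpha suppresses skip-connection features that the gating signal deems irrelevant, which is the mechanism's intended benefit over a plain U-Net skip connection.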
FIGURE 2
The learning curve of training loss versus validation loss over the number of epochs for the attention-gated U-Net model. The training curve indicates how well the model is learning, whereas the validation curve indicates how well the model generalizes
FIGURE 3
The CT image, KBP-predicted dose distribution, ground-truth dose distribution, and voxel-wise dose difference map (predicted − ground-truth) in the axial, sagittal, and coronal planes of a sample head-and-neck patient plan in the test set. CT, computed tomography; KBP, knowledge-based planning
FIGURE 4
The predicted dose distributions are presented side-by-side with the corresponding ground-truth dose distributions, along with difference maps, for eight patients in the test set
FIGURE 5
Boxplot of the MAE between the predicted and ground-truth dose distributions for the contoured target and OAR structures of all patients in the test set. The upper and lower boundaries of each box represent the 75th and 25th percentiles, respectively, and the red line depicts the median. Whiskers extend to 1.5 times the interquartile range and to the most extreme outlier. MAE, mean absolute error; OAR, organ at risk
FIGURE 6
The DVHs of the predicted plans (dashed line) overlaid on the DVHs of the ground‐truth plans (solid line) for two sample patients (a) and (b) in the test set. The DVHs of the target volume structures (PTV_70, PTV_63, and PTV_56) and OAR structures (brainstem, left parotid, right parotid, spinal cord, mandible, esophagus, and larynx) are plotted in different colors as shown in the legend. DVH, dose–volume histogram
FIGURE 7
Visualizing example learned features at multiscale resolution levels of the attention-gated U-Net model. Each column shows an example set of 16 extracted feature maps. The leftmost column shows features extracted after the convolutional operations of the first hierarchy level in the encoder: low-level feature maps that encode general patterns in the images. The middle column shows features extracted after the convolutional operations of the last hierarchy level in the encoder (the latent representation space): high-level feature maps that encode task-specific patterns. The last column illustrates the learned and reconstructed target images in the decoder
