A full-scale attention-augmented CNN-transformer model for segmentation of oropharyngeal mucosa organs-at-risk in radiotherapy
- PMID: 40932560
- DOI: 10.1007/s13246-025-01614-1
Abstract
Radiation-induced oropharyngeal mucositis (ROM) is a common and severe side effect of radiotherapy in nasopharyngeal cancer patients, leading to significant clinical complications such as malnutrition, infections, and treatment interruptions. Accurate delineation of the oropharyngeal mucosa (OPM) as an organ-at-risk (OAR) is crucial to minimizing radiation exposure and preventing ROM. This study aims to develop and validate an advanced automatic segmentation model, attention-augmented Swin U-Net transformer (AA-Swin UNETR), for accurate delineation of OPM to improve radiotherapy planning and reduce the incidence of ROM. We proposed a hybrid CNN-transformer model, AA-Swin UNETR, based on the Swin UNETR framework, which integrates hierarchical feature extraction with full-scale attention mechanisms. The model includes a Swin Transformer-based encoder and a CNN-based decoder with residual blocks, connected via a full-scale feature connection scheme. The full-scale attention mechanism enables the model to capture long-range dependencies and multi-level features effectively, enhancing the segmentation accuracy. The model was trained on a dataset of 202 CT scans from Nanfang Hospital, using expert manual delineations as the gold standard. We evaluated the performance of AA-Swin UNETR against state-of-the-art (SOTA) segmentation models, including Swin UNETR, nnUNet, and 3D UX-Net, using geometric and dosimetric evaluation parameters. The geometric metrics include Dice similarity coefficient (DSC), surface DSC (sDSC), volume similarity (VS), Hausdorff distance (HD), precision, and recall. The dosimetric metrics include changes in D0.1 cc and Dmean between results derived from manually delineated OPM and auto-segmentation models. The AA-Swin UNETR model achieved the highest mean DSC of 87.72 ± 1.98%, significantly outperforming Swin UNETR (83.53 ± 2.59%), nnUNet (85.48 ± 2.68%), and 3D UX-Net (80.04 ± 3.76%).
The model also showed superior mean sDSC (98.44 ± 1.08%), mean VS (97.86 ± 1.43%), mean precision (87.60 ± 3.06%) and mean recall (89.22 ± 2.70%), with a competitive mean HD of 9.03 ± 2.79 mm. For dosimetric evaluation, the proposed model produced the smallest mean [Formula: see text] (0.46 ± 4.92 cGy) and mean [Formula: see text] (6.26 ± 24.90 cGy) relative to manual delineation compared with the other auto-segmentation results (mean [Formula: see text] of Swin UNETR = -0.56 ± 7.28 cGy, nnUNet = 0.99 ± 4.73 cGy, 3D UX-Net = -0.65 ± 8.05 cGy; mean [Formula: see text] of Swin UNETR = 7.46 ± 43.37 cGy, nnUNet = 21.76 ± 37.86 cGy and 3D UX-Net = 44.61 ± 62.33 cGy). In this paper, we proposed a hybrid transformer-CNN deep-learning model, AA-Swin UNETR, for automatic segmentation of OPM as an OAR structure in radiotherapy planning. Evaluations with geometric and dosimetric parameters demonstrated that AA-Swin UNETR can generate delineations close to a manual reference, both in terms of geometry and dose-volume metrics. The proposed model outperformed existing SOTA models on both sets of evaluation metrics and demonstrated its capability to accurately segment the complex anatomical structures of the OPM, providing a reliable tool for enhancing radiotherapy planning.
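The overlap and dose-volume metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names (`overlap_metrics`, `dose_metrics`), the parameter names, and the volume similarity formula (1 − |FN − FP| / (2TP + FP + FN), a commonly used definition) are assumptions made for illustration, and the abstract does not state how the paper computes each quantity. Surface DSC and Hausdorff distance are omitted because they need surface extraction and a distance tolerance.

```python
import numpy as np


def overlap_metrics(pred, ref):
    """Voxel-overlap metrics between a predicted and a reference binary mask.

    Hypothetical helper: the VS formula is an assumed, commonly used
    definition and may differ from the one used in the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")

    tp = np.count_nonzero(pred & ref)    # voxels in both masks
    fp = np.count_nonzero(pred & ~ref)   # predicted but not in reference
    fn = np.count_nonzero(~pred & ref)   # in reference but missed
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("both masks must contain at least one voxel")

    denom = 2 * tp + fp + fn
    return {
        "dsc": 2 * tp / denom,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "vs": 1 - abs(fn - fp) / denom,
    }


def dose_metrics(dose, mask, voxel_volume_cc, hot_volume_cc=0.1):
    """Mean dose and near-maximum dose inside a structure.

    `d_hot` is the minimum dose received by the hottest `hot_volume_cc`
    of the structure (D0.1cc when hot_volume_cc=0.1). Assumes `dose` and
    `mask` share one voxel grid with a uniform voxel volume.
    """
    dose = np.asarray(dose, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if dose.shape != mask.shape:
        raise ValueError("dose and mask must have the same shape")

    doses = np.sort(dose[mask])[::-1]    # structure doses, hottest first
    if doses.size == 0:
        raise ValueError("mask is empty")

    # Number of voxels that make up the hottest sub-volume (at least one).
    n_hot = int(round(hot_volume_cc / voxel_volume_cc))
    n_hot = min(max(n_hot, 1), doses.size)
    return {"d_mean": float(doses.mean()), "d_hot": float(doses[n_hot - 1])}


def dose_difference(dose, auto_mask, manual_mask, voxel_volume_cc):
    """Auto-segmentation minus manual dose metrics on the same dose grid."""
    auto = dose_metrics(dose, auto_mask, voxel_volume_cc)
    manual = dose_metrics(dose, manual_mask, voxel_volume_cc)
    return {key: auto[key] - manual[key] for key in auto}
```

In this sketch, `dose_difference` returns signed differences, so a negative value means the auto-segmented structure reports a lower dose than the manual one; per-patient differences would then be averaged to obtain summary values of the kind reported above.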
Keywords: Deep learning; Dosimetric; Organ-at-risk; Oropharyngeal mucosa; Radiation-induced oropharyngeal mucositis; Radiotherapy; Segmentation.
© 2025. Australasian College of Physical Scientists and Engineers in Medicine.
Conflict of interest statement
Declarations. Ethics approval: The study was approved by the Institutional Review Board at Southern Medical University (NFEC-2018-013). Conflict of interest: Authors LH, SL, JL and ZY are employed by the company GuangZhou Perception Vision Medical Technologies Co Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.