Quant Imaging Med Surg. 2025 Jul 1;15(7):6301-6325. doi: 10.21037/qims-2025-354. Epub 2025 Jun 30.

Multi-level channel-spatial attention and light-weight scale-fusion network (MCSLF-Net): multi-level channel-spatial attention and light-weight scale-fusion transformer for 3D brain tumor segmentation

Mingzhe Zhou et al. Quant Imaging Med Surg.

Abstract

Background: Gliomas, the most aggressive primary tumors of the central nervous system, are characterized by high morphological heterogeneity and diffusely infiltrating boundaries. Such complexity poses significant challenges for accurate segmentation in clinical practice. Although deep learning methods have shown promising results, they often struggle to achieve a satisfactory trade-off among precise boundary delineation, robust multi-scale feature representation, and computational efficiency, particularly when processing high-resolution three-dimensional (3D) magnetic resonance imaging (MRI) data. The aim of this study was therefore to develop a novel 3D segmentation framework that specifically addresses these challenges, thereby improving clinical utility in brain tumor analysis. To this end, we propose a multi-level channel-spatial attention and light-weight scale-fusion network (MCSLF-Net), which integrates a multi-level channel-spatial attention mechanism (MCSAM) and a light-weight scale fusion unit (LSFU). By strategically enhancing subtle boundary features while maintaining a compact network design, our approach seeks to achieve high accuracy in delineating complex glioma morphologies, reduce the computational burden, and provide a more clinically feasible segmentation solution.

Methods: We propose MCSLF-Net, a network integrating two key components: (I) MCSAM: by strategically inserting a 3D channel-spatial attention module at critical semantic layers, the network progressively emphasizes subtle, infiltrative edges and small, easily overlooked contours. This avoids reliance on an additional edge detection branch while enabling fine-grained localization in ambiguous transitional regions. (II) Light-weight scale fusion unit (LSFU): leveraging depth-wise separable convolutions combined with multi-scale atrous (dilated) convolutions, LSFU enhances computational efficiency and adapts to varying feature requirements at different network depths. In doing so, it effectively captures small infiltrative lesions as well as extensive tumor areas. By coupling these two modules, MCSLF-Net balances global contextual information with local fine-grained features, simultaneously reducing the computational burden typically associated with 3D medical image segmentation.
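
To make the first component concrete, the following is a minimal sketch of a CBAM-style 3D channel-spatial attention block, the general mechanism that MCSAM builds on by placing such blocks at selected semantic layers. The reduction ratio, kernel size, and class name here are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class ChannelSpatialAttention3D(nn.Module):
    """CBAM-style channel attention followed by spatial attention, in 3D."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze the spatial dims, excite per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: a 7x7x7 conv over pooled channel descriptors.
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _, _ = x.shape
        avg = x.mean(dim=(2, 3, 4))                     # (B, C)
        mx = x.amax(dim=(2, 3, 4))                      # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1, 1)
        avg_map = x.mean(dim=1, keepdim=True)           # (B, 1, D, H, W)
        max_map = x.amax(dim=1, keepdim=True)           # (B, 1, D, H, W)
        sa = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sa

Placing such blocks only at a few semantically critical depths, as the paper describes, keeps the added attention cost small relative to the backbone.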

Results: Extensive experiments on the BraTS 2019, BraTS 2020, and BraTS 2021 datasets validated the effectiveness of our approach. On BraTS 2021, MCSLF-Net achieved a mean Dice similarity coefficient (DSC) of 0.8974 and a mean 95th percentile Hausdorff distance (HD95) of 2.52 mm. Notably, it excels in segmenting intricate transitional areas, including the enhancing tumor (ET) region and the tumor core (TC), thereby demonstrating superior boundary delineation and multi-scale feature fusion capabilities relative to existing methods.
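
For readers reproducing these numbers, below is a minimal sketch of the two reported metrics, assuming binary NumPy masks and isotropic 1 mm voxel spacing; the official BraTS evaluation measures HD95 between surface voxels with spacing-aware distances, so results can differ slightly from this approximation.

import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    # Dice similarity coefficient for two boolean masks.
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    # Approximate HD95: distances from each foreground voxel of one mask to
    # the nearest foreground voxel of the other, pooled, 95th percentile.
    # Assumes both masks are non-empty boolean arrays.
    d_to_gt = distance_transform_edt(~gt)    # distance to nearest GT voxel
    d_to_pred = distance_transform_edt(~pred)
    dists = np.concatenate([d_to_gt[pred], d_to_pred[gt]])
    return float(np.percentile(dists, 95))

On BraTS, these metrics are typically computed per region (WT, TC, ET) and averaged across cases.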

Conclusions: These findings underscore the clinical potential of deploying multi-level channel-spatial attention and light-weight multi-scale fusion strategies in high-precision 3D glioma segmentation. By striking an optimal balance among boundary accuracy, multi-scale feature capture, and computational efficiency, the proposed MCSLF-Net offers a practical framework for further advancements in automated brain tumor analysis and can be extended to a range of 3D medical image segmentation tasks.

Keywords: Brain tumor segmentation; light-weight scale fusion; multi-level attention mechanism; transformer-based network.

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-354/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Illustration of the axial views of brain MRI in FLAIR, T1, T1CE, and T2 modalities, along with the GT annotations. In the GT mask, the WT is the union of the orange, red, and blue regions; the TC is the union of the red and blue regions; and the ET corresponds to the orange region. ET, enhancing tumor; FLAIR, fluid-attenuated inversion recovery; GT, ground-truth; MRI, magnetic resonance imaging; T1, T1-weighted; T1CE, contrast-enhanced T1-weighted; T2, T2-weighted; TC, tumor core; WT, whole tumor.
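For reference, the standard BraTS label-to-region mapping behind such nested definitions can be written as the short sketch below, assuming the common label values 1 (necrotic/non-enhancing core), 2 (edema), and 4 (enhancing tumor); the color-to-region correspondence in the figure is the paper's own overlay convention, so confirm the label values against the dataset release used.

import numpy as np

def brats_regions(seg: np.ndarray) -> dict:
    # Convert a BraTS label volume into the three evaluated binary regions.
    return {
        "WT": np.isin(seg, (1, 2, 4)),  # whole tumor: all tumor labels
        "TC": np.isin(seg, (1, 4)),     # tumor core: necrosis + enhancing
        "ET": seg == 4,                 # enhancing tumor only
    }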
Figure 2
Overall architecture of MCSLF-Net. Dashed arrows indicate internal components in the network architecture. X denotes the input 3D volume and S denotes the final segmentation output. 3D, three-dimensional; CBAM, convolutional block attention module; DW Conv, depth-wise separable convolution; DW Dila Conv, depth-wise dilated convolution; LSFU, light-weight scale fusion unit; MCSLF-Net, multi-level channel-spatial attention and light-weight scale-fusion network; MLP, multi-layer perceptron; MSA, multi-head self-attention; TF, transformer.
Figure 3
The overall architecture of the LSFU module. It uses a cascaded multi-scale feature processing strategy that includes DW Conv, BN, and ReLU activation. The U-shaped structure within the module fuses features via skip connections (add), where L represents encoder-decoder depth (L=4, 5, 6, 7). BN, batch normalization; DW Conv, depth-wise separable convolution; LSFU, light-weight scale fusion unit; ReLU, rectified linear unit.
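In the spirit of this caption, the following is a minimal sketch of a light-weight multi-scale block built from depth-wise separable convolutions with cascaded dilated branches fused through additive skip connections; the dilation rates, fusion order, and class name are illustrative assumptions, not the published LSFU.

import torch
import torch.nn as nn

def dw_sep_conv3d(ch: int, dilation: int = 1) -> nn.Sequential:
    # Depth-wise 3x3x3 conv (one filter per channel) followed by a 1x1x1
    # point-wise conv, each with BN and ReLU, as in the caption's DW Conv unit.
    return nn.Sequential(
        nn.Conv3d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch),
        nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
        nn.Conv3d(ch, ch, 1),
        nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
    )

class LightScaleFusion(nn.Module):
    def __init__(self, ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([dw_sep_conv3d(ch, d) for d in dilations])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for branch in self.branches:
            # Cascade the scales; the additive skip keeps gradients and
            # fine detail flowing past each dilated branch.
            out = out + branch(out)
        return out

Depth-wise separable convolutions cut the parameter count roughly by a factor of the kernel volume, which is what makes stacking several dilation rates affordable in 3D.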
Figure 4
Visualization of segmentation outcomes for three representative cases from the BraTS 2020 dataset. Each row, from left to right, displays the original FLAIR, T1, T1CE, and T2 MRI modalities, followed by the segmentation results from 3D U-Net, VNet, SwinUNETR, TransBTS, and the method proposed in this paper (Ours), with the final column showing the GT labels. The orange region denotes the ET, the red region indicates the TC, and the blue region represents the WT. (A) A tumor exhibiting multiple subregions in the right hemisphere; (B) a large-volume lesion in the left hemisphere; (C) a small yet anatomically complex tumor located in the anterior portion of the right hemisphere. All cases are shown across the axial, sagittal, and coronal planes. Green bounding boxes mark representative boundary areas or small lesion details that are especially challenging to segment, illustrating where each model may misclassify or omit subtle structures. 3D, three-dimensional; ET, enhancing tumor; FLAIR, fluid-attenuated inversion recovery; GT, ground-truth; MRI, magnetic resonance imaging; T1, T1-weighted; T1CE, contrast-enhanced T1-weighted; T2, T2-weighted; TC, tumor core; WT, whole tumor.
Figure 5
Visualization of the ablation experiment results on the BraTS 2020 dataset. All slices are axial T1-weighted views. From left to right, the figure shows the original image (Original), the baseline model’s segmentation output (Base), segmentation after integrating the LSFU module (LSFU), segmentation with the MCSAM mechanism (MCSAM), the final model’s segmentation (Ours), and the GT. In the displayed color scheme, orange denotes the ET, red represents the TC, and blue depicts the WT. Green bounding boxes mark representative boundary areas or small lesion details that are especially challenging to segment, illustrating where each model may misclassify or omit subtle structures. Four distinct slices are presented, illustrating how each module incrementally improves segmentation performance. ET, enhancing tumor; GT, ground-truth; LSFU, light-weight scale fusion unit; MCSAM, multi-level channel-spatial attention mechanism; TC, tumor core; WT, whole tumor.
