Quant Imaging Med Surg. 2024 Jul 1;14(7):4579-4604. doi: 10.21037/qims-24-9. Epub 2024 Jun 27.

CMAF-Net: a cross-modal attention fusion-based deep neural network for incomplete multi-modal brain tumor segmentation

Kangkang Sun et al. Quant Imaging Med Surg. 2024.

Abstract

Background: The information provided by different magnetic resonance imaging (MRI) modalities is complementary, and combining multiple modalities for brain tumor image segmentation can improve segmentation accuracy, which is of great significance for disease diagnosis and treatment. In clinical practice, however, modality data are often missing to varying degrees, which can cause serious performance degradation, or even outright failure, of brain tumor segmentation methods that rely on full-modality sequences. To address this problem, this study aimed to design a new deep learning network for incomplete multimodal brain tumor segmentation.

Methods: We propose a novel cross-modal attention fusion-based deep neural network (CMAF-Net) for incomplete multimodal brain tumor segmentation, built on a three-dimensional (3D) U-Net encoder-decoder architecture together with a 3D Swin block and a cross-modal attention fusion (CMAF) block. A convolutional encoder first extracts modality-specific features, and an effective 3D Swin block models long-range dependencies to obtain richer information for brain tumor segmentation. A cross-attention-based CMAF module is then proposed that handles different missing-modality situations by fusing features across modalities to learn shared representations of the tumor regions. Finally, the fused latent representation is decoded to obtain the segmentation result. In addition, a channel attention module (CAM) and a spatial attention module (SAM) are incorporated into the network to further improve its robustness: the CAM helps the network focus on important feature channels, and the SAM learns the importance of different spatial regions.
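As a rough illustration of the cross-attention fusion described above, the following PyTorch-style sketch fuses tokens from one modality (query) with tokens from another available modality (key and value) through multi-head cross-attention followed by a feed-forward refinement. The module and parameter names (CrossModalFusionBlock, embed_dim, num_heads) are illustrative assumptions and do not reproduce the authors' implementation.

# Minimal sketch of cross-attention-based fusion between two modality-specific
# feature maps, assuming flattened 3D feature tokens. Names are illustrative only.
import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim * 4),
            nn.GELU(),
            nn.Linear(embed_dim * 4, embed_dim),
        )

    def forward(self, target_tokens, source_tokens):
        # target_tokens: (B, N, C) tokens of one modality (query)
        # source_tokens: (B, N, C) tokens of another available modality (key/value)
        q = self.norm_q(target_tokens)
        kv = self.norm_kv(source_tokens)
        fused, _ = self.attn(q, kv, kv)   # cross-attention: Q from target, K/V from source
        x = target_tokens + fused         # residual connection
        return x + self.ffn(x)            # feed-forward refinement

# Usage: fuse FLAIR-derived tokens with T2-derived tokens (B=1, N=512 tokens, C=128).
flair_tokens, t2_tokens = torch.randn(1, 512, 128), torch.randn(1, 512, 128)
fused = CrossModalFusionBlock()(flair_tokens, t2_tokens)   # -> (1, 512, 128)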

Results: Evaluation experiments on the widely used BraTS 2018 and BraTS 2020 datasets demonstrated the effectiveness of the proposed CMAF-Net, which achieved average Dice scores of 87.9%, 81.8%, and 64.3%, and Hausdorff distances of 4.21, 5.35, and 4.02 for the whole tumor, tumor core, and enhancing tumor on the BraTS 2020 dataset, respectively, outperforming several state-of-the-art segmentation methods in missing-modality situations.
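For reference, the Dice similarity coefficient reported above measures volumetric overlap as DSC = 2|P ∩ G| / (|P| + |G|); the sketch below shows a generic NumPy computation for binary masks. This is not the authors' evaluation pipeline, and the Hausdorff distance is usually computed with a dedicated medical-imaging toolkit (often as a 95th-percentile variant), so it is omitted here.

# Generic Dice score between two binary segmentation masks (NumPy arrays).
# This is the standard formulation, not the authors' evaluation code.
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

# Example: a perfect overlap gives DSC = 1.0.
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True
print(dice_score(mask, mask))  # 1.0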

Conclusions: The experimental results show that the proposed CMAF-Net can achieve accurate brain tumor segmentation when modalities are missing, and it has promising application potential.

Keywords: Brain tumor segmentation; cross-modal attention fusion (CMAF); magnetic resonance imaging (MRI); missing modalities; multimodal fusion.

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-9/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Example of data from the BraTS 2020 dataset. From left to right, the images show the 4 MRI modalities (T1, T1ce, T2, FLAIR) and the ground truth labels. In the ground truth image, the green region represents edema, the yellow region represents enhancing tumor, and the red region represents necrotic and non-enhancing tumor. T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; FLAIR, fluid-attenuated inversion recovery; GT, ground truth; MRI, magnetic resonance imaging.
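The nested evaluation regions used in the results (whole tumor, tumor core, and enhancing tumor) are composed from label maps like the one shown in this figure. The sketch below assumes the commonly used BraTS label encoding (1 = necrotic/non-enhancing tumor, 2 = edema, 4 = enhancing tumor); the exact encoding should be verified against the dataset release used.

# Derive whole tumor (WT), tumor core (TC), and enhancing tumor (ET) masks from a
# BraTS-style label volume. Assumes the common encoding 1 = necrotic/non-enhancing
# tumor, 2 = edema, 4 = enhancing tumor (check against the dataset release used).
import numpy as np

def brats_regions(label: np.ndarray):
    wt = np.isin(label, (1, 2, 4))   # whole tumor: all tumor sub-regions
    tc = np.isin(label, (1, 4))      # tumor core: necrosis + enhancing tumor
    et = label == 4                  # enhancing tumor only
    return wt, tc, et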
Figure 2
An overview of our proposed network architecture, including the encoder, fusion, and decoder stages. The 3D Swin block is used to capture global information, and the CMAF block is used to fuse multimodal features. CAM, channel attention module; SAM, spatial attention module; CMAF, cross-modal attention fusion; T2, T2-weighted; T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; FLAIR, fluid-attenuated inversion recovery; 3D, three-dimensional; F, input feature map; C, channels of the feature map; H, height of feature map; W, width of feature map; D, depth of feature map; CA, channel attention; Fc, feature map weighted by channel attention; Fs, feature map weighted by spatial attention; SA, spatial attention; V, value; K, key; Q, query.
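The channel attention (CA) and spatial attention (SA) weightings labeled in this figure (Fc and Fs) can be sketched generically as below, following the common pattern of channel attention followed by spatial attention on a 3D feature map. The exact module design in CMAF-Net may differ; the class name and reduction ratio here are assumptions.

# Minimal sketch of channel attention (CAM) followed by spatial attention (SAM)
# for a 3D feature map F of shape (B, C, D, H, W). Generic pattern only;
# the exact design in CMAF-Net may differ.
import torch
import torch.nn as nn

class ChannelSpatialAttention3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(                    # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):                            # f: (B, C, D, H, W)
        b, c = f.shape[:2]
        avg = f.mean(dim=(2, 3, 4))                  # (B, C) global average pooling
        mx = f.amax(dim=(2, 3, 4))                   # (B, C) global max pooling
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, c, 1, 1, 1)
        fc = f * ca                                  # Fc: channel-weighted features
        sa_in = torch.cat([fc.mean(dim=1, keepdim=True),
                           fc.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(sa_in)) # (B, 1, D, H, W) spatial attention
        return fc * sa                               # Fs: spatially weighted features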
Figure 3
The structure of the 3D Swin block. MLP, multi-layer perceptron; 3D W-MSA, 3D windowed multi-head self-attention; 3D SW-MSA, 3D shifted-window multi-head self-attention mechanism; 3D, three-dimensional.
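The windowed self-attention in the 3D Swin block operates on non-overlapping local windows of the feature volume, and the shifted variant (3D SW-MSA) is obtained by rolling the volume before partitioning. A minimal sketch of this 3D window partitioning, assuming a cubic window size that evenly divides the volume, is shown below; it is illustrative only.

# Minimal sketch of 3D window partitioning used by windowed self-attention (3D W-MSA);
# the shifted variant (3D SW-MSA) rolls the volume before partitioning.
# Assumes the window size evenly divides D, H, and W.
import torch

def window_partition_3d(x: torch.Tensor, ws: int) -> torch.Tensor:
    """x: (B, D, H, W, C) -> windows: (num_windows*B, ws*ws*ws, C)."""
    b, d, h, w, c = x.shape
    x = x.view(b, d // ws, ws, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, ws ** 3, c)

x = torch.randn(1, 8, 8, 8, 96)
shifted = torch.roll(x, shifts=(-2, -2, -2), dims=(1, 2, 3))  # shift for 3D SW-MSA
windows = window_partition_3d(shifted, ws=4)                  # -> (8, 64, 96)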
Figure 4
The structure of the CMAF module, which consists of fusion block1 and fusion block2. CMAF, cross-modal attention fusion; V, value; K, key; Q, query; Conv Proj, convolutional projection; FLAIR, fluid-attenuated inversion recovery; T2, T2-weighted; T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; FFN, feed-forward network; MHA, multi-head attention.
Figure 5
Bar graph of the DSC of comparison experiments on the BraTS 2020 dataset. Error bars show standard error. DSC, dice similarity coefficient; ET, enhancing tumor; TC, tumor core; WT, whole tumor.
Figure 6
Comparison of segmentation results under four modality configurations: complete modalities; FLAIR, T1ce, and T2; FLAIR and T1ce; and T1ce only. From left to right are the 4 MRI modalities (T1, T2, FLAIR, and T1ce); the fifth column presents the ground truth of 2 patients; the sixth to eighth columns show the results of the state-of-the-art methods; and the rightmost column shows our segmentation results. T2, T2-weighted; FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T1, T1-weighted; GT, ground truth; MRI, magnetic resonance imaging.
Figure 7
Examples of the segmentation results of CMAF-Net with various available modalities. FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; T1, T1-weighted; CMAF-Net, cross-modal attention fusion based deep neural network.
Figure 8
Boxplots and bar graph of the DSC of the comparison experiment results. (A) The results for 15 cases of missing modalities. (B) The average results of different methods and ours. Error bars show standard error. ET, enhancing tumor; TC, tumor core; WT, whole tumor; DSC, dice similarity coefficient.
Figure 9
Visual comparison of the effects of different components of CMAF-Net in the ablation study. T1ce, contrast-enhanced T1-weighted; T1, T1-weighted; FLAIR, fluid-attenuated inversion recovery; T2, T2-weighted; 3D, three-dimensional; CMAF, cross-modal attention fusion; CAM, channel attention module; CMAF-Net, cross-modal attention fusion-based deep neural network; GT, ground truth.
Figure 10
Boxplots and bar graph of the DSC of ablation experiments for different modules. (A-C) The DSC of WT, TC, and ET, respectively. (D) The bar chart comparison result of different components. 3D, three-dimensional; CMAF, cross-modal attention fusion; CAM, channel attention module; CMAF-Net, cross-modal attention fusion-based deep neural network; FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; WT, whole tumor; TC, tumor core; ET, enhancing tumor; DSC, dice similarity coefficient.
