Quant Imaging Med Surg. 2024 Jul 1;14(7):4579-4604. doi: 10.21037/qims-24-9. Epub 2024 Jun 27.

CMAF-Net: a cross-modal attention fusion-based deep neural network for incomplete multi-modal brain tumor segmentation

Kangkang Sun et al. Quant Imaging Med Surg. 2024.

Abstract

Background: The information provided by different magnetic resonance imaging (MRI) modalities is complementary, and combining multiple modalities for brain tumor image segmentation can improve segmentation accuracy, which is of great significance for disease diagnosis and treatment. In clinical practice, however, modality data are often missing to varying degrees, which can cause serious performance degradation, or even outright failure, of brain tumor segmentation methods that rely on full-modality sequences. To address this problem, this study aimed to design a new deep learning network for incomplete multimodal brain tumor segmentation.

Methods: We propose a novel cross-modal attention fusion-based deep neural network (CMAF-Net) for incomplete multimodal brain tumor segmentation, built on a three-dimensional (3D) U-Net encoder-decoder architecture together with a 3D Swin block and a cross-modal attention fusion (CMAF) block. A convolutional encoder first extracts modality-specific features, and an effective 3D Swin block models long-range dependencies to obtain richer information for brain tumor segmentation. A cross-attention-based CMAF module is then proposed that handles different missing-modality situations by fusing features across modalities to learn shared representations of the tumor regions. Finally, the fused latent representation is decoded to obtain the segmentation result. In addition, a channel attention module (CAM) and a spatial attention module (SAM) are incorporated into the network to further improve its robustness: the CAM helps the network focus on important feature channels, and the SAM learns the importance of different spatial regions.
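As a rough illustration of the cross-attention fusion described above, the following PyTorch-style sketch fuses tokens from one modality (query) with tokens from another available modality (key and value) through multi-head cross-attention followed by a feed-forward refinement. The module and parameter names (CrossModalFusionBlock, embed_dim, num_heads) are illustrative assumptions and do not reproduce the authors' implementation.

# Minimal sketch of cross-attention-based fusion between two modality-specific
# feature maps, assuming flattened 3D feature tokens. Names are illustrative only.
import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim * 4),
            nn.GELU(),
            nn.Linear(embed_dim * 4, embed_dim),
        )

    def forward(self, target_tokens, source_tokens):
        # target_tokens: (B, N, C) tokens of one modality (query)
        # source_tokens: (B, N, C) tokens of another available modality (key/value)
        q = self.norm_q(target_tokens)
        kv = self.norm_kv(source_tokens)
        fused, _ = self.attn(q, kv, kv)   # cross-attention: Q from target, K/V from source
        x = target_tokens + fused         # residual connection
        return x + self.ffn(x)            # feed-forward refinement

# Usage: fuse FLAIR-derived tokens with T2-derived tokens (B=1, N=512 tokens, C=128).
flair_tokens, t2_tokens = torch.randn(1, 512, 128), torch.randn(1, 512, 128)
fused = CrossModalFusionBlock()(flair_tokens, t2_tokens)   # -> (1, 512, 128)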

Results: Evaluation experiments on the widely used BraTS 2018 and BraTS 2020 datasets demonstrated the effectiveness of the proposed CMAF-Net, which achieved average Dice scores of 87.9%, 81.8%, and 64.3%, and Hausdorff distances of 4.21, 5.35, and 4.02 for the whole tumor, tumor core, and enhancing tumor on the BraTS 2020 dataset, respectively, outperforming several state-of-the-art segmentation methods in missing-modality situations.
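For reference, the Dice similarity coefficient reported above measures volumetric overlap as DSC = 2|P ∩ G| / (|P| + |G|); the sketch below shows a generic NumPy computation for binary masks. This is not the authors' evaluation pipeline, and the Hausdorff distance is usually computed with a dedicated medical-imaging toolkit (often as a 95th-percentile variant), so it is omitted here.

# Generic Dice score between two binary segmentation masks (NumPy arrays).
# This is the standard formulation, not the authors' evaluation code.
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

# Example: a perfect overlap gives DSC = 1.0.
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True
print(dice_score(mask, mask))  # 1.0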

Conclusions: The experimental results show that the proposed CMAF-Net can achieve accurate brain tumor segmentation when modalities are missing, and it has promising application potential.

Keywords: Brain tumor segmentation; cross-modal attention fusion (CMAF); magnetic resonance imaging (MRI); missing modalities; multimodal fusion.

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-9/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Example of data from the BraTS 2020 dataset. From left to right, the images show the 4 MRI modalities (T1, T1ce, T2, FLAIR) and the ground truth labels. In the ground truth image, the green region represents edema, the yellow region represents enhancing tumor, and the red region represents necrotic and non-enhancing tumor. T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; FLAIR, fluid-attenuated inversion recovery; GT, ground truth; MRI, magnetic resonance imaging.
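The nested evaluation regions used in the results (whole tumor, tumor core, and enhancing tumor) are composed from label maps like the one shown in this figure. The sketch below assumes the commonly used BraTS label encoding (1 = necrotic/non-enhancing tumor, 2 = edema, 4 = enhancing tumor); the exact encoding should be verified against the dataset release used.

# Derive whole tumor (WT), tumor core (TC), and enhancing tumor (ET) masks from a
# BraTS-style label volume. Assumes the common encoding 1 = necrotic/non-enhancing
# tumor, 2 = edema, 4 = enhancing tumor (check against the dataset release used).
import numpy as np

def brats_regions(label: np.ndarray):
    wt = np.isin(label, (1, 2, 4))   # whole tumor: all tumor sub-regions
    tc = np.isin(label, (1, 4))      # tumor core: necrosis + enhancing tumor
    et = label == 4                  # enhancing tumor only
    return wt, tc, et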
Figure 2
An overview of our proposed network architecture, including the encoder, fusion, and decoder stages. The 3D Swin block is used to capture global information, and the CMAF block is used to fuse multimodal features. CAM, channel attention module; SAM, spatial attention module; CMAF, cross-modal attention fusion; T2, T2-weighted; T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; FLAIR, fluid-attenuated inversion recovery; 3D, three-dimensional; F, input feature map; C, channels of the feature map; H, height of feature map; W, width of feature map; D, depth of feature map; CA, channel attention; Fc, feature map weighted by channel attention; Fs, feature map weighted by spatial attention; SA, spatial attention; V, value; K, key; Q, query.
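The channel attention (CA) and spatial attention (SA) weightings labeled in this figure (Fc and Fs) can be sketched generically as below, following the common pattern of channel attention followed by spatial attention on a 3D feature map. The exact module design in CMAF-Net may differ; the class name and reduction ratio here are assumptions.

# Minimal sketch of channel attention (CAM) followed by spatial attention (SAM)
# for a 3D feature map F of shape (B, C, D, H, W). Generic pattern only;
# the exact design in CMAF-Net may differ.
import torch
import torch.nn as nn

class ChannelSpatialAttention3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(                    # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):                            # f: (B, C, D, H, W)
        b, c = f.shape[:2]
        avg = f.mean(dim=(2, 3, 4))                  # (B, C) global average pooling
        mx = f.amax(dim=(2, 3, 4))                   # (B, C) global max pooling
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, c, 1, 1, 1)
        fc = f * ca                                  # Fc: channel-weighted features
        sa_in = torch.cat([fc.mean(dim=1, keepdim=True),
                           fc.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(sa_in)) # (B, 1, D, H, W) spatial attention
        return fc * sa                               # Fs: spatially weighted features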
Figure 3
The structure of the 3D Swin block. MLP, multi-layer perceptron; 3D W-MSA, 3D windowed multi-head self-attention; 3D SW-MSA, 3D shifted-window multi-head self-attention mechanism; 3D, three-dimensional.
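The windowed self-attention in the 3D Swin block operates on non-overlapping local windows of the feature volume, and the shifted variant (3D SW-MSA) is obtained by rolling the volume before partitioning. A minimal sketch of this 3D window partitioning, assuming a cubic window size that evenly divides the volume, is shown below; it is illustrative only.

# Minimal sketch of 3D window partitioning used by windowed self-attention (3D W-MSA);
# the shifted variant (3D SW-MSA) rolls the volume before partitioning.
# Assumes the window size evenly divides D, H, and W.
import torch

def window_partition_3d(x: torch.Tensor, ws: int) -> torch.Tensor:
    """x: (B, D, H, W, C) -> windows: (num_windows*B, ws*ws*ws, C)."""
    b, d, h, w, c = x.shape
    x = x.view(b, d // ws, ws, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, ws ** 3, c)

x = torch.randn(1, 8, 8, 8, 96)
shifted = torch.roll(x, shifts=(-2, -2, -2), dims=(1, 2, 3))  # shift for 3D SW-MSA
windows = window_partition_3d(shifted, ws=4)                  # -> (8, 64, 96)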
Figure 4
The structure of the CMAF module, which consists of fusion block1 and fusion block2. CMAF, cross-modal attention fusion; V, value; K, key; Q, query; Conv Proj, convolutional projection; FLAIR, fluid-attenuated inversion recovery; T2, T2-weighted; T1, T1-weighted; T1ce, contrast-enhanced T1-weighted; FFN, feed-forward network; MHA, multi-head attention.
Figure 5
Bar graph of the DSC of comparison experiments on the BraTS 2020 dataset. Error bars show standard error. DSC, dice similarity coefficient; ET, enhancing tumor; TC, tumor core; WT, whole tumor.
Figure 6
Comparison of segmentation results under four modality configurations: complete modalities; FLAIR, T1ce, and T2; FLAIR and T1ce; and T1ce only. From left to right are the 4 MRI modalities (T1, T2, FLAIR, and T1ce); the fifth column presents the ground truth of 2 patients; the sixth to eighth columns show the results of the state-of-the-art methods; and the rightmost column shows our segmentation results. T2, T2-weighted; FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T1, T1-weighted; GT, ground truth; MRI, magnetic resonance imaging.
Figure 7
Examples of the segmentation results of CMAF-Net with various available modalities. FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; T1, T1-weighted; CMAF-Net, cross-modal attention fusion based deep neural network.
Figure 8
Boxplots and bar graph of the DSC of the comparison experiment results. (A) The results for 15 cases of missing modalities. (B) The average results of different methods and ours. Error bars show standard error. ET, enhancing tumor; TC, tumor core; WT, whole tumor; DSC, dice similarity coefficient.
Figure 9
Visual comparison of the effects of different components of CMAF-Net in the ablation study. T1ce, contrast-enhanced T1-weighted; T1, T1-weighted; FLAIR, fluid-attenuated inversion recovery; T2, T2-weighted; 3D, three-dimensional; CMAF, cross-modal attention fusion; CAM, channel attention module; CMAF-Net, cross-modal attention fusion-based deep neural network; GT, ground truth.
Figure 10
Boxplots and bar graph of the DSC of ablation experiments for different modules. (A-C) The DSC of WT, TC, and ET, respectively. (D) The bar chart comparison result of different components. 3D, three-dimensional; CMAF, cross-modal attention fusion; CAM, channel attention module; CMAF-Net, cross-modal attention fusion-based deep neural network; FLAIR, fluid-attenuated inversion recovery; T1ce, contrast-enhanced T1-weighted; T2, T2-weighted; WT, whole tumor; TC, tumor core; ET, enhancing tumor; DSC, dice similarity coefficient.
