PeerJ. 2024 Feb 29:12:e17005.
doi: 10.7717/peerj.17005. eCollection 2024.

Enhancing medical image segmentation with a multi-transformer U-Net

Yongping Dan et al. PeerJ.

Abstract

Various segmentation networks based on the Swin Transformer have shown promise in medical segmentation tasks. Nonetheless, challenges such as lower accuracy and slower training convergence have persisted. To tackle these issues, we introduce an approach that combines the Swin Transformer and the Deformable Transformer to enhance overall model performance. We leverage the Swin Transformer's window attention mechanism to capture local feature information and employ the Deformable Transformer to adjust sampling positions dynamically, which accelerates convergence and aligns the sampled features more closely with object shapes and sizes. By combining both Transformer modules and incorporating additional skip connections to minimize information loss, the proposed model segments CT or X-ray lung images rapidly and accurately. In our experiments, the model outperforms Swin Unet, which is built on the Swin Transformer alone, and converges more rapidly under identical conditions, yielding accuracy improvements of 0.7% (to 88.18%) and 2.7% (to 98.01%) on the COVID-19 CT scan lesion segmentation dataset and the Chest X-ray Masks and Labels dataset, respectively. This advancement has the potential to aid medical practitioners in early diagnosis and treatment decision-making.

Keywords: CT or X-ray lung images; Medical image segmentation; Multi-transformer; Unet.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Overall structure of the model.
The Swin Transformer and Deformable Transformer serve as the backbone network. Patch merging in the encoder and patch expanding in the decoder change the size of the feature maps. The model also adds extra skip connections to strengthen multi-scale information fusion and retain crucial information.
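
The resizing steps named in this caption can be illustrated with a minimal NumPy sketch. It assumes the usual Swin-style formulation (patch merging concatenates each 2x2 neighbourhood and projects 4C to 2C; patch expanding projects C to 2C and rearranges the channels into a 2x2 block). The function names, the random placeholder weights, and the concatenation-style skip connection are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def patch_merging(x, weight):
    """Halve H and W: stack each 2x2 neighbourhood on the channel axis, then project 4C -> 2C."""
    h, w, _ = x.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    # (H/2, W/2, 4C): the four pixels of every 2x2 block side by side
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )
    return merged @ weight  # weight has shape (4C, 2C)


def patch_expanding(x, weight):
    """Double H and W: project C -> 2C, then spread the channels over a 2x2 block."""
    h, w, c = x.shape
    assert c % 2 == 0, "C must be even"
    y = x @ weight                      # (H, W, 2C); weight has shape (C, 2C)
    y = y.reshape(h, w, 2, 2, c // 2)   # split 2C into (2, 2, C/2)
    y = y.transpose(0, 2, 1, 3, 4)      # (H, 2, W, 2, C/2)
    return y.reshape(2 * h, 2 * w, c // 2)


# Placeholder data and weights (hypothetical, for shape checking only)
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))
down = patch_merging(feature_map, rng.normal(size=(16, 8)))  # (4, 4, 8)
up = patch_expanding(down, rng.normal(size=(8, 16)))         # (8, 8, 4)
# Assumed skip connection: concatenate the encoder features with the decoder features
fused = np.concatenate([up, feature_map], axis=-1)           # (8, 8, 8)
```

The sketch only tracks shapes: each merge halves the spatial size and doubles the channels, and each expansion reverses that, so encoder and decoder features at the same scale can be fused.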
Figure 2. Swin Transformer block.
Each pair of attention blocks consists of a window-based multi-head self-attention (W-MSA) module followed by a shifted-window (SW-MSA) module.
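
As a rough illustration of the W-MSA and SW-MSA idea, the NumPy sketch below partitions a feature map into non-overlapping windows, runs attention inside each window, and builds the shifted variant with a cyclic roll of half a window. It is a simplification under stated assumptions: a single head, no learned query/key/value projections, and no attention mask for the positions that wrap around after the roll (the original Swin Transformer uses such a mask). All function names are hypothetical.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) map into non-overlapping windows: (num_windows, window_size**2, C)."""
    h, w, c = x.shape
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, c)


def window_self_attention(windows):
    """Scaled dot-product attention restricted to the tokens of each window."""
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows


def shifted_windows(x, window_size):
    """Cyclically shift the map by half a window before partitioning (SW-MSA layout)."""
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


feature_map = np.arange(64, dtype=float).reshape(8, 8, 1)
regular = window_self_attention(window_partition(feature_map, 4))  # W-MSA step
shifted = window_self_attention(shifted_windows(feature_map, 4))   # SW-MSA step
```

Alternating the two layouts lets information cross window borders: tokens that sit in different windows in the regular partition share a window after the shift.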
Figure 3. Deformable Attention block.
(A) The block follows a standard attention network architecture. (B) Deformable attention adds an offset network whose relative position offsets adjust the sampling positions used by the multi-head attention. (C) Detailed structure of the offset network.
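
The sampling step behind deformable attention can be sketched in NumPy as follows. This is a hedged illustration, not the authors' code: the offset network of panel (C) is replaced by an `offsets` array passed in by the caller, there is a single head with no learned projections, and the features sampled at the shifted reference points serve directly as keys and values. The names `bilinear_sample` and `deformable_attention` are hypothetical.

```python
import numpy as np


def bilinear_sample(feat, points):
    """Sample an (H, W, C) map at fractional (row, col) points of shape (N, 2)."""
    h, w, _ = feat.shape
    r = np.clip(points[:, 0], 0, h - 1)
    c = np.clip(points[:, 1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = (r - r0)[:, None]  # fractional part along rows
    fc = (c - c0)[:, None]  # fractional part along columns
    top = feat[r0, c0] * (1 - fc) + feat[r0, c1] * fc
    bottom = feat[r1, c0] * (1 - fc) + feat[r1, c1] * fc
    return top * (1 - fr) + bottom * fr


def deformable_attention(feat, ref_points, offsets):
    """Every pixel attends to features sampled at ref_points + offsets."""
    h, w, c = feat.shape
    queries = feat.reshape(-1, c)                          # (H*W, C)
    sampled = bilinear_sample(feat, ref_points + offsets)  # (N, C): keys and values
    scores = queries @ sampled.T / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ sampled).reshape(h, w, c)


feature_map = np.arange(16, dtype=float).reshape(4, 4, 1)
# Uniform reference grid; in the model the offsets would come from the offset network
ref_points = np.array([[0.5, 0.5], [0.5, 2.5], [2.5, 0.5], [2.5, 2.5]])
offsets = np.array([[0.5, 0.0], [0.0, -0.5], [-0.5, 0.0], [0.0, 0.5]])
output = deformable_attention(feature_map, ref_points, offsets)  # (4, 4, 1)
```

Because the sampling points are fractional, bilinear interpolation keeps the operation differentiable with respect to the offsets, which is what allows an offset network to be trained end to end.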
Figure 4. Comparison of model segmentation accuracy.
The red section represents the prediction accuracy of our model, while the blue section represents the prediction accuracy of SwinUnet.
Figure 5. Automatic segmentation result.
The lung image is segmented automatically through the network.
