Review
. 2023 Dec 1;13(12):8747-8767.
doi: 10.21037/qims-23-542. Epub 2023 Oct 7.

Transformers in medical image segmentation: a narrative review


Rabeea Fatma Khan et al. Quant Imaging Med Surg. 2023.

Abstract

Background and objective: Transformers, widely recognized as state-of-the-art tools in natural language processing (NLP), have also proven valuable in computer vision tasks. With this growing popularity, they have been extensively researched in the more complex medical imaging domain. These developments have brought transformers on par with the long-dominant convolutional neural networks (CNNs), particularly for medical image segmentation. Methods combining both types of networks have been especially successful at capturing local and global contexts, significantly boosting performance across a range of segmentation problems. Motivated by this success, we survey the most consequential research on innovative transformer networks, specifically those designed for efficient medical image segmentation.

Methods: Databases including Google Scholar, arXiv, ResearchGate, Microsoft Academic, and Semantic Scholar were used to find recent developments in this field. Specifically, English-language research published from 2021 to 2023 was considered.

Key content and findings: In this survey, we examine the different types of architectures and attention mechanisms that distinctly improve performance, as well as the structures in place to handle complex medical data. We summarize both popular and unconventional transformer-based research from several key angles and quantitatively analyze the strategies that have proven most effective.

Conclusions: We have also attempted to discern existing gaps and challenges within current research, notably highlighting the deficiency of annotated medical data for precise deep learning model training. Furthermore, potential future directions for enhancing transformers' utility in healthcare are outlined, encompassing strategies such as transfer learning and exploiting foundation models for specialized medical image segmentation.

Keywords: Transformers; artificial intelligence (AI); deep learning; image segmentation; medical imaging.


Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-23-542/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1. Structure of a transformer. MHSA, multi-head self-attention.
Figure 2. Attention mechanism of a transformer. (A) Scaled dot product. (B) MHSA layer of a transformer. MHSA, multi-head self-attention.
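The scaled dot-product operation in Figure 2A can be sketched in a few lines. This is an illustrative NumPy implementation of the standard formulation, softmax(QK^T / sqrt(d_k))V, not code from any of the surveyed works; the function name and shapes are our own choices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q: (batch, n_queries, d_k), K: (batch, n_keys, d_k), V: (batch, n_keys, d_v).
    Returns: (batch, n_queries, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to stabilize gradients
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, n_q, n_k)
    # Numerically stable softmax over the key dimension
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted average of the value vectors
    return weights @ V
```

The MHSA layer in Figure 2B runs several such attention operations in parallel on learned projections of Q, K, and V, then concatenates and projects the results.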
Figure 3. Main categories for transformer-network classification. CNN, convolutional neural network.
Figure 4. Popular feature sub-space reduction techniques for 3D and 2D medical data. (A) Patch partitioning. (B) Convolution layers to spatially reduce the feature dimensions. (C) Convolution layers and patch partition to significantly minimize the feature subspace. 3D, three-dimensional; 2D, two-dimensional.
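The patch partitioning of Figure 4A can be illustrated with a short sketch: a 2D image is split into non-overlapping patches, each flattened into one token vector for the transformer. The helper name `patch_partition` and the non-overlapping square-patch assumption are ours, for illustration only:

```python
import numpy as np

def patch_partition(img, patch):
    """Split a 2D image of shape (H, W, C) into non-overlapping patch tokens.

    Returns an array of shape (H//patch * W//patch, patch*patch*C):
    one flattened vector per patch, as commonly fed to a transformer encoder.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    # Break each spatial axis into (num_patches, patch_size)
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    # Group the two patch-grid axes together: (H/p, W/p, p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single token vector
    return x.reshape(-1, patch * patch * C)
```

Variants (B) and (C) in the figure achieve a similar token-count reduction with strided convolution layers instead of, or in addition to, this reshaping.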
Figure 5. Popular hierarchical encoder-decoder techniques involving transformers. (A) Transformer encoder-decoder. (B) Sequential network. (C) CNN encoder-decoder with transformer in bottleneck. (D) Interleaved CNN and transformer blocks within the encoder and decoder. (E) Transformer encoder with CNN decoder. (F) Parallel branches of CNN encoder and transformer encoder followed by a fusion module before the CNN decoder. CNN, convolutional neural network.
