Diagnostics (Basel). 2021 Jul 31;11(8):1384.
doi: 10.3390/diagnostics11081384.

TransMed: Transformers Advance Multi-Modal Medical Image Classification

Yin Dai et al. Diagnostics (Basel). 2021.

Abstract

Over the past decade, convolutional neural networks (CNNs) have shown highly competitive performance in medical image analysis tasks such as disease classification, tumor segmentation, and lesion detection. CNNs excel at extracting local image features, but because convolution is a local operation they cannot model long-range relationships well. Recently, transformers have been applied to computer vision and have achieved remarkable success on large-scale datasets. Compared with natural images, multi-modal medical images have explicit and important long-range dependencies, and effective multi-modal fusion strategies can greatly improve the performance of deep models. This prompted us to study transformer-based structures and apply them to multi-modal medical images. Existing transformer-based network architectures require large-scale datasets to perform well, whereas medical imaging datasets are relatively small, which makes it difficult to apply pure transformers to medical image analysis. We therefore propose TransMed for multi-modal medical image classification. TransMed combines the advantages of CNNs and transformers to efficiently extract low-level image features and establish long-range dependencies between modalities. We evaluated our model on two datasets, parotid gland tumor classification and knee injury classification. Combining our contributions, we achieve improvements of 10.1% and 1.9% in average accuracy, respectively, outperforming other state-of-the-art CNN-based models. The results of the proposed method are promising and have great potential to be applied to many medical image analysis tasks. To the best of our knowledge, this is the first work to apply transformers to multi-modal medical image classification.
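The abstract describes a hybrid design: a CNN branch first extracts low-level features from each imaging modality, and a transformer branch then fuses those per-modality features through self-attention. The following minimal NumPy sketch illustrates only this high-level idea; the CNN backbone is replaced by a random linear projection, and all weights, shapes, and the single-head attention step are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def classify_multimodal(modality_images, d_model=16, num_classes=3):
    # 1) CNN-branch stand-in: flatten each modality image and apply a
    #    random linear projection to obtain one d_model-dim token per
    #    modality. (The paper uses a real CNN backbone; this is only a
    #    placeholder for the shape of the computation.)
    proj = rng.standard_normal((modality_images[0].size, d_model)) * 0.01
    tokens = np.stack([img.ravel() @ proj for img in modality_images])
    # 2) Transformer-branch stand-in: one single-head self-attention step
    #    lets every modality token attend to every other modality token,
    #    modeling the cross-modality long-range dependencies.
    scores = tokens @ tokens.T / np.sqrt(d_model)
    fused = softmax(scores) @ tokens          # (num_modalities, d_model)
    # 3) Classifier head: mean-pool the fused tokens, project to classes.
    w_cls = rng.standard_normal((d_model, num_classes)) * 0.01
    return softmax(fused.mean(axis=0) @ w_cls)
```

The key point the sketch captures is that fusion happens at the token level: each modality contributes one (or more) feature tokens, and attention weights, not fixed channel concatenation, decide how modalities are combined.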

Keywords: deep learning; medical image classification; multi-modal; multiparametric MRI; transformer.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Compared with natural images, multi-modal medical images have more informative sequences.
Figure 2. Overview of TransMed, which is composed of a CNN branch and a transformer branch.
Figure 3. (a) Structure of the transformer. (b) Overview of self-attention; matmul denotes the matrix product of two arrays. (c) An illustration of our multi-head self-attention component; concat denotes concatenation of the head representations.
Figure 4. An illustration of the images in the PGT dataset. The yellow circle represents the location of the tumor.
Figure 5. Confusion matrix of TransMed-S on the PGT dataset.
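Figure 3 describes scaled dot-product self-attention (panel b: matmul of queries and keys, softmax, matmul with values) and its multi-head extension (panel c: split into heads, attend, concat, project). A minimal NumPy rendering of those two panels could look like the sketch below; all weight matrices, sizes, and the head count are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split each projection into heads: (num_heads, seq_len, d_head).
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention per head (the matmul-softmax-matmul
    # of Figure 3b): (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                   # (heads, seq, d_head)
    # Concat heads (Figure 3c) and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo
```

Dividing the scores by the square root of the per-head dimension keeps the softmax from saturating as the dimensionality grows, which is the standard motivation for the scaling term.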
