BMC Musculoskelet Disord. 2025 Apr 29;26(1):419.
doi: 10.1186/s12891-025-08602-2.

Vision transformer-based diagnosis of lumbar disc herniation with grad-CAM interpretability in CT imaging

Qingsong Chu et al. BMC Musculoskelet Disord.

Abstract

Background: This study proposes, for the first time, a computed tomography (CT)-vision transformer (ViT) framework for diagnosing lumbar disc herniation (LDH), combining the multidirectional advantages of CT with a ViT.

Methods: The proposed ViT model was trained and validated on a dataset of 2100 CT images from 983 patients. We compared its performance with that of several convolutional neural networks (CNNs), namely ResNet18, ResNet50, LeNet, AlexNet, and VGG16, on two primary tasks: vertebra localization and disc abnormality classification.

Results: Integrating a ViT with CT imaging allowed the model to capture the complex spatial relationships and global dependencies within scans. It outperformed the CNN models, achieving accuracies of 97.13% for vertebra localization and 93.63% for disc abnormality classification. The model was further examined via gradient-weighted class activation mapping (Grad-CAM), which provided interpretable insights into the regions of the CT scans that contributed to its predictions.
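As an illustration of the Grad-CAM step mentioned in the results, the following is a minimal sketch of how a Grad-CAM heatmap is generally computed; it is not the authors' implementation. It assumes the activations of one layer and the gradients of the predicted class score with respect to them are already available as K feature maps of size H x W (for a ViT, this would presumably mean reshaping patch tokens back onto their grid, which the abstract does not specify). The function name `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Return a normalized Grad-CAM heatmap (H x W nested list).

    activations, gradients: K x H x W nested lists of floats, where
    gradients[c][i][j] is d(class score) / d(activations[c][i][j]).
    Both inputs are assumed to be precomputed by some model (hypothetical).
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])

    # Channel weights: global average of the gradients over spatial positions.
    weights = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(k)]

    # Weighted sum of the feature maps, followed by ReLU to keep only
    # positions that support the predicted class.
    cam = [
        [max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]

    # Scale to [0, 1] for display; skip if the map is all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example with K = 2 channels on a 2 x 2 grid (made-up numbers).
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
```

In practice the heatmap would be upsampled to the CT image size and overlaid on the scan, as in the visualizations the paper reports in Fig. 10.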

Conclusion: This study demonstrated the potential of a ViT for diagnosing LDH using CT imaging. The results highlight the promising clinical applications of this approach, particularly for enhancing the diagnostic efficiency and transparency of medical AI systems.

Keywords: CT; Deep learning; Diagnostic accuracy; Grad-CAM; LDH; Medical imaging; ViT.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Our study adhered to the Declaration of Helsinki. This study received approval from the Ethics Committee of the First Affiliated Hospital of Anhui University of Traditional Chinese Medicine (no. 2024MCZQ28), and the ethics committee waived the requirement for consent to participate because of the minimal risk involved. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Patient cohort of this study
Fig. 2
a L3-L4; b L4-L5; c L5-S1 (from left to right: bulging, herniation and normal)
Fig. 3
Schematic outline of the study
Fig. 4
Architecture of a ViT
Fig. 5
Confusion matrices: a ViT; b VGG16; c ResNet50; d ResNet18; e LeNet; f AlexNet; classes 0, 1, and 2 are L3-L4, L4-L5, and L5-S1, respectively
Fig. 6
AUC curves: a ViT; b VGG16; c ResNet50; d ResNet18; e LeNet; f AlexNet; classes 0, 1, and 2 are L3-L4, L4-L5, and L5-S1, respectively
Fig. 7
Confusion matrices: a ViT; b VGG16; c ResNet50; d ResNet18; e LeNet; f AlexNet; classes 0, 1, and 2 are bulging, herniation, and normal, respectively
Fig. 8
AUC curves: a ViT; b VGG16; c ResNet50; d ResNet18; e LeNet; f AlexNet; classes 0, 1, and 2 are bulging, herniation, and normal, respectively
Fig. 9
Combined confusion matrix for the localization and quantitative ViT models
Fig. 10
Grad-CAM visualizations produced for predicting a real case of LDH: a location model (predicted class: L3-L4), b location model (predicted class: L4-L5), c location model (predicted class: L5-S1), d classification model (predicted class: normal), e classification model (predicted class: bulge), and f classification model (predicted class: herniation)
Fig. 11
An interesting prediction

