Sci Rep. 2025 Apr 25;15(1):14408. doi: 10.1038/s41598-025-98518-7.

Vision transformer and deep learning based weighted ensemble model for automated spine fracture type identification with GAN generated CT images


Sindhura D N et al. Sci Rep. 2025.

Abstract

The most common causes of spine fractures, or vertebral column fractures (VCFs), are traumas such as falls, sports injuries, and accidents. CT scans are an affordable and effective means of accurately detecting VCF types. However, VCF type identification across the cervical, thoracic, and lumbar (C3-L5) regions remains limited and is sensitive to inter-observer variability. To address this problem, this work introduces an autonomous approach to VCF type identification by developing a novel ensemble of Vision Transformers (ViT) and the best-performing deep learning (DL) models, assisting orthopaedic specialists in easy and early identification of VCF types. The performance of several fine-tuned DL architectures, including VGG16, ResNet50, and DenseNet121, was investigated, and an ensemble classification model was developed to identify the best-performing combination of DL models. A ViT model was also trained to identify VCFs. The best-performing DL models and the ViT were then fused by a weighted-average technique for type identification. To overcome data limitations, an extended Deep Convolutional Generative Adversarial Network (DCGAN) and a Progressive Growing Generative Adversarial Network (PGGAN) were developed. The VGG16-ResNet50-ViT ensemble outperformed all other ensemble models, achieving an accuracy of 89.98%. Extended DCGAN and PGGAN augmentation increased the type-identification accuracy to 90.28% and 93.68%, respectively, demonstrating the efficacy of PGGANs in augmenting VCF images. The study highlights the distinctive contributions of the ResNet50, VGG16, and ViT models to feature extraction, generalization, and global shape-based pattern capture in VCF type identification. CT scans collected from a tertiary care hospital were used to validate these models.
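
As a concrete illustration of the weighted-average fusion described above, the sketch below combines softmax outputs from fine-tuned VGG16, ResNet50, and ViT backbones using PyTorch/torchvision. The number of VCF classes, the ViT-B/16 variant, and the fusion weights are illustrative assumptions; the abstract does not report the authors' actual values or implementation details.

# Minimal sketch of a weighted-average ensemble of VGG16, ResNet50 and ViT.
# Class count, ViT variant and fusion weights are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # assumed number of VCF types

def build_backbones(num_classes: int = NUM_CLASSES):
    """Create the three backbones with new classification heads for VCF types."""
    vgg = models.vgg16(weights=None)
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, num_classes)

    resnet = models.resnet50(weights=None)
    resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)

    vit = models.vit_b_16(weights=None)
    vit.heads.head = nn.Linear(vit.heads.head.in_features, num_classes)
    return vgg, resnet, vit

@torch.no_grad()
def weighted_ensemble_predict(backbones, weights, x):
    """Fuse softmax outputs with fixed weights and return class predictions."""
    for m in backbones:
        m.eval()
    probs = [w * torch.softmax(m(x), dim=1) for m, w in zip(backbones, weights)]
    fused = torch.stack(probs).sum(dim=0) / sum(weights)
    return fused.argmax(dim=1)

if __name__ == "__main__":
    vgg, resnet, vit = build_backbones()
    dummy_ct = torch.randn(2, 3, 224, 224)  # batch of pre-processed CT slices
    preds = weighted_ensemble_predict([vgg, resnet, vit], [0.3, 0.3, 0.4], dummy_ct)
    print(preds)

In practice, each backbone would first be fine-tuned on the (GAN-augmented) VCF CT dataset and the fusion weights tuned on a validation split.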

Keywords: Computed tomography (CT) images; Deep convolutional generative adversarial network (DCGAN); Deep learning (DL); Ensemble deep learning model; Generative adversarial networks (GANs); Progressive growing generative adversarial network (PGGAN); Spine fracture; Vision transformer (ViT).


Conflict of interest statement

Declarations. Ethical clearance and informed consent statement: This retrospective study was approved by the Institutional Ethical Committee of Kasturba Medical College, Manipal, Manipal Academy of Higher Education, Manipal, Karnataka, India (IEC: 503/2020). Owing to the retrospective nature of the study, the requirement for informed consent was waived by the same committee. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Collected VCF CT dataset description (n = 2820).
Fig. 2. Sample VCF CT images from the collected dataset.
Fig. 3. Block diagram for VCF type identification.
Fig. 4. The DL-based ensemble model architecture for VCF type identification.
Fig. 5. Weighted average ensemble model architecture for VCF classification.
Fig. 6. Accuracy comparison of different ensemble models in VCF type identification.
Fig. 7. Recall comparison of different ensemble models in VCF type identification.
Fig. 8. F1-score comparison of different ensemble models in VCF type identification.
Fig. 9. Clinical evaluation of extended DCGAN- and PGGAN-generated VCF images.
