Data-Efficient Bone Segmentation Using Feature Pyramid-Based SegFormer
- PMID: 39796872
- PMCID: PMC11723075
- DOI: 10.3390/s25010081
Abstract
The semantic segmentation of bone structures demands pixel-level classification accuracy to create reliable bone models for diagnosis. While Convolutional Neural Networks (CNNs) are commonly used for segmentation, they often struggle with complex shapes because they focus on texture features and have a limited ability to incorporate positional information. As orthopedic surgery increasingly requires precise automatic diagnosis, we explored SegFormer, an enhanced Vision Transformer model with better spatial awareness in segmentation tasks. However, SegFormer's effectiveness is typically limited by its need for extensive training data, which is particularly challenging in medical imaging, where obtaining labeled ground truths (GTs) is costly and resource-intensive. In this paper, we propose two models, and their combination, that improve SegFormer to enable accurate feature extraction from smaller datasets: the data-efficient model, which deepens the hierarchical encoder by adding convolution layers to the transformer blocks and increases the feature map resolution within those blocks, and the FPN-based model, which enhances the decoder with a Feature Pyramid Network (FPN) and attention mechanisms. We tested our models on spine images from The Cancer Imaging Archive and on our own hand and wrist dataset; ablation studies confirmed that our modifications outperform the original SegFormer, U-Net, and Mask2Former. These enhancements enable better image feature extraction and more precise object contour detection, which is particularly beneficial for medical imaging applications with limited training data.
Keywords: Mask2Former; SegFormer; feature pyramid network; semantic segmentation; transformer block.
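To make the decoder-side idea concrete, the following is a minimal PyTorch sketch of an FPN-style decoder with a channel-attention gate applied to the four multi-scale feature maps of a SegFormer-like hierarchical encoder. The module names (FPNDecoder, ChannelAttention), channel widths, attention design, and segmentation head are illustrative assumptions for this sketch, not the architecture or hyperparameters reported in the paper.

```python
# Minimal sketch (not the authors' code): an FPN-style decoder over the four
# multi-scale feature maps produced by a SegFormer-like hierarchical encoder.
# Channel widths, the SE-style attention gate, and the segmentation head are
# illustrative assumptions, not values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate that reweights fused channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x.mean(dim=(2, 3)))   # global average pool -> (B, C)
        return x * w[:, :, None, None]    # channel-wise reweighting


class FPNDecoder(nn.Module):
    """Top-down FPN that fuses encoder stages C1..C4 into segmentation logits."""
    def __init__(self, in_channels=(32, 64, 160, 256), fpn_dim=128, num_classes=2):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, fpn_dim, 1) for c in in_channels)
        self.smooth = nn.Conv2d(fpn_dim, fpn_dim, 3, padding=1)
        self.attn = ChannelAttention(fpn_dim)
        self.head = nn.Conv2d(fpn_dim, num_classes, 1)

    def forward(self, feats):
        # feats: [C1, C2, C3, C4] at strides 4, 8, 16, 32 (finest first)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down pathway: upsample each coarser map and add it to the finer one
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:],
                mode="bilinear", align_corners=False)
        p1 = self.smooth(laterals[0])      # finest fused map (stride 4)
        return self.head(self.attn(p1))    # logits at 1/4 input resolution


if __name__ == "__main__":
    # Fake encoder outputs for a 512x512 input, matching typical MiT-B0 widths.
    feats = [torch.randn(1, c, 512 // s, 512 // s)
             for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
    logits = FPNDecoder()(feats)
    print(logits.shape)  # torch.Size([1, 2, 128, 128])
```

As a usage note, the logits would typically be bilinearly upsampled back to the input resolution before computing a per-pixel loss; the attention gate here acts only on the finest fused map, which is one plausible reading of "FPN plus attention" rather than the paper's exact design.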
Conflict of interest statement
The authors declare no conflicts of interest.
Similar articles
- MSA-MaxNet: Multi-Scale Attention Enhanced Multi-Axis Vision Transformer Network for Medical Image Segmentation. J Cell Mol Med. 2024 Dec;28(24):e70315. doi: 10.1111/jcmm.70315. PMID: 39706821. Free PMC article.
- A dual-decoder banded convolutional attention network for bone segmentation in ultrasound images. Med Phys. 2025 Mar;52(3):1556-1572. doi: 10.1002/mp.17545. Epub 2024 Dec 9. PMID: 39651711.
- Transformer guided self-adaptive network for multi-scale skin lesion image segmentation. Comput Biol Med. 2024 Feb;169:107846. doi: 10.1016/j.compbiomed.2023.107846. Epub 2023 Dec 23. PMID: 38184865.
- Efficient brain tumor segmentation using Swin transformer and enhanced local self-attention. Int J Comput Assist Radiol Surg. 2024 Feb;19(2):273-281. doi: 10.1007/s11548-023-03024-8. Epub 2023 Oct 5. PMID: 37796413. Review.
- Promises and perils of using Transformer-based models for SE research. Neural Netw. 2025 Apr;184:107067. doi: 10.1016/j.neunet.2024.107067. Epub 2024 Dec 24. PMID: 39732064. Review.