Data-Efficient Bone Segmentation Using Feature Pyramid-Based SegFormer
- PMID: 39796872
- PMCID: PMC11723075
- DOI: 10.3390/s25010081
Abstract
The semantic segmentation of bone structures demands pixel-level classification accuracy to create reliable bone models for diagnosis. While Convolutional Neural Networks (CNNs) are commonly used for segmentation, they often struggle with complex shapes due to their focus on texture features and their limited ability to incorporate positional information. As orthopedic surgery increasingly requires precise automatic diagnosis, we explored SegFormer, an enhanced Vision Transformer model that better handles spatial awareness in segmentation tasks. However, SegFormer's effectiveness is typically limited by its need for extensive training data, which is particularly challenging in medical imaging, where obtaining labeled ground truths (GTs) is costly and resource-intensive. In this paper, we propose two models, used separately and in combination, that improve SegFormer to enable accurate feature extraction from smaller datasets: the data-efficient model, which deepens the hierarchical encoder by adding convolution layers to transformer blocks and increases the feature map resolution within those blocks, and the FPN-based model, which enhances the decoder with a Feature Pyramid Network (FPN) and attention mechanisms. We tested our models on spine images from the Cancer Imaging Archive and on our own hand and wrist dataset; ablation studies confirmed that our modifications outperform the original SegFormer, U-Net, and Mask2Former. These enhancements enable better image feature extraction and more precise object contour detection, which is particularly beneficial for medical imaging applications with limited training data.
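To make the two modifications concrete, the sketch below illustrates them in PyTorch. It is written from this abstract alone, not from the authors' released code: the stage channel widths, the depthwise-convolution placement inside the transformer block, and the sigmoid attention gate on the FPN lateral connections are all illustrative assumptions, not the published architecture.

    # Minimal sketch of the two ideas summarized above (NOT the authors'
    # implementation). Channel widths, the depthwise-conv placement, and
    # the attention gate are assumptions made for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvAugmentedBlock(nn.Module):
        # Transformer-style block with an extra depthwise-convolution
        # path: one hypothetical reading of "adding convolution layers
        # to transformer blocks".
        def __init__(self, dim, heads=4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.conv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                     nn.Linear(4 * dim, dim))

        def forward(self, x):                            # x: (B, C, H, W)
            b, c, h, w = x.shape
            seq = x.flatten(2).transpose(1, 2)           # -> (B, HW, C)
            n = self.norm1(seq)
            seq = seq + self.attn(n, n, n)[0]            # self-attention
            x = seq.transpose(1, 2).reshape(b, c, h, w)
            x = x + self.conv(x)                         # extra convolution path
            seq = x.flatten(2).transpose(1, 2)
            seq = seq + self.mlp(self.norm2(seq))        # feed-forward
            return seq.transpose(1, 2).reshape(b, c, h, w)

    class FPNAttentionDecoder(nn.Module):
        # Top-down FPN over four encoder stages, with a sigmoid attention
        # gate weighting each lateral connection (illustrative design).
        def __init__(self, in_dims=(32, 64, 160, 256), dim=128, classes=2):
            super().__init__()
            self.lateral = nn.ModuleList(nn.Conv2d(d, dim, 1) for d in in_dims)
            self.gate = nn.ModuleList(
                nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())
                for _ in in_dims)
            self.head = nn.Conv2d(dim, classes, 1)

        def forward(self, feats):        # feats: stage outputs, finest first
            laterals = [l(f) for l, f in zip(self.lateral, feats)]
            x = laterals[-1]             # start from the coarsest stage
            for lat, gate in zip(reversed(laterals[:-1]),
                                 reversed(self.gate[:-1])):
                x = F.interpolate(x, size=lat.shape[-2:],
                                  mode="bilinear", align_corners=False)
                x = lat * gate(x) + x    # attention-weighted lateral fusion
            return self.head(x)

    if __name__ == "__main__":
        # Fake 4-stage pyramid for a 512x512 input (strides 4, 8, 16, 32).
        feats = [torch.randn(1, c, 512 // s, 512 // s)
                 for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
        print(FPNAttentionDecoder()(feats).shape)      # (1, 2, 128, 128)
        print(ConvAugmentedBlock(32)(feats[0]).shape)  # (1, 32, 128, 128)

As in SegFormer's own decode head, the stride-4 logits produced here would be bilinearly upsampled to the input resolution before computing the per-pixel loss.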
Keywords: Mask2Former; SegFormer; feature pyramid network; semantic segmentation; transformer block.
Conflict of interest statement
The authors declare no conflicts of interest.
References
- Long J., Shelhamer E., Darrell T. Fully Convolutional Networks for Semantic Segmentation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Boston, MA, USA. 7–12 June 2015; pp. 3431–3440.
- Ronneberger O., Fischer P., Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation; Proceedings of the Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015; Munich, Germany. 5–9 October 2015; pp. 234–241.
- Petit O., Thome N., Rambour C., Soler L. U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. arXiv. 2021. arXiv:2103.06104.
- Isensee F., Jaeger P.F., Kohl S.A.A., Petersen J., Maier-Hein K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods. 2021;18:203–211.
