Bioengineering (Basel). 2025 Apr 5;12(4):390.
doi: 10.3390/bioengineering12040390.

SwinDAF3D: Pyramid Swin Transformers with Deep Attentive Features for Automated Finger Joint Segmentation in 3D Ultrasound Images for Rheumatoid Arthritis Assessment

Jianwei Qiu et al. Bioengineering (Basel).

Abstract

Rheumatoid arthritis (RA) is a chronic autoimmune disease that can cause severe joint damage and functional impairment. Ultrasound imaging has shown promise in providing real-time assessment of synovium inflammation associated with the early stages of RA. Accurate segmentation of the synovium region and quantification of inflammation-specific imaging biomarkers are crucial for assessing and grading RA. However, automatic segmentation of the synovium in 3D ultrasound is challenging due to ambiguous boundaries, variability in synovium shape, and inhomogeneous intensity distribution. In this work, we introduce a novel network architecture, Swin Transformers with Deep Attentive Features for 3D segmentation (SwinDAF3D), which integrates Swin Transformers into a Deep Attentive Features framework. The developed architecture leverages the hierarchical structure and shifted windows of Swin Transformers to capture rich, multi-scale, and attentive contextual information, improving the modeling of long-range dependencies and spatial hierarchies in 3D ultrasound images. In a six-fold cross-validation study with 3D ultrasound images of RA patients' finger joints (n = 72), our SwinDAF3D model achieved the highest performance with a Dice Score (DSC) of 0.838 ± 0.013, an Intersection over Union (IoU) of 0.719 ± 0.019, and a Surface Dice Score (SDSC) of 0.852 ± 0.020, compared to 3D UNet (DSC: 0.742 ± 0.025; IoU: 0.589 ± 0.031; SDSC: 0.661 ± 0.029), DAF3D (DSC: 0.813 ± 0.017; IoU: 0.689 ± 0.022; SDSC: 0.817 ± 0.013), Swin UNETR (DSC: 0.808 ± 0.025; IoU: 0.678 ± 0.032; SDSC: 0.822 ± 0.039), UNETR++ (DSC: 0.810 ± 0.014; IoU: 0.684 ± 0.018; SDSC: 0.829 ± 0.027), and TransUNet (DSC: 0.818 ± 0.013; IoU: 0.692 ± 0.017; SDSC: 0.815 ± 0.016) models. This ablation study demonstrates the effectiveness of combining a Swin Transformers feature pyramid with a deep attention mechanism, improving the segmentation accuracy of the synovium in 3D ultrasound. This advancement shows great promise in enabling more efficient and standardized RA screening using ultrasound imaging.
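
For readers unfamiliar with the overlap metrics reported above, the following is a minimal sketch of how DSC and IoU can be computed for a pair of binary 3D masks. It is not the authors' evaluation code: the set-of-voxels representation, the helper names (`dice_score`, `iou_score`, `box`), the toy masks, and the convention that two empty masks score 1.0 are all assumptions made for illustration. The Surface Dice Score is not covered, since it additionally depends on a boundary tolerance that the abstract does not state.

```python
from itertools import product


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of foreground voxels."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def iou_score(pred, truth):
    """IoU = |A ∩ B| / |A ∪ B| for two sets of foreground voxels."""
    union = pred | truth
    if not union:
        return 1.0  # same assumed convention as above
    return len(pred & truth) / len(union)


def box(z, y, x):
    """Set of (z, y, x) voxel coordinates inside half-open index ranges."""
    return set(product(range(*z), range(*y), range(*x)))


# Toy example (hypothetical data): a 4x4x4 cube and the same cube
# shifted by one voxel along the z axis.
truth = box((0, 4), (0, 4), (0, 4))
pred = box((1, 5), (0, 4), (0, 4))

dsc = dice_score(pred, truth)
iou = iou_score(pred, truth)
print(f"DSC = {dsc:.3f}, IoU = {iou:.3f}")  # DSC = 0.750, IoU = 0.600
```

For a single pair of masks the two metrics are tied by IoU = DSC / (2 − DSC). Applying this to the reported mean DSC of 0.838 gives roughly 0.721, close to but not identical to the reported IoU of 0.719, which is consistent with each metric being averaged separately rather than derived from the other.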

Keywords: 3D ultrasound; automated 3D segmentation; deep attentive features; pyramid Swin transformers; rheumatoid arthritis.

Conflict of interest statement

Qiu, J.; Karageorgos, G.; Ghose, S.; Dentinger, A. and Mills, D. are employees of GE HealthCare. Yang, Z. is an employee of GE Vernova.

Figures

Figure 1
Examples of US images of the synovium with corresponding segmentation overlays (green). The images illustrate the challenges posed by ambiguous boundaries, large variations in synovium shape, and inhomogeneous intensity distribution.
Figure 2
An example of a 3D manual sparse annotation (A) and its corresponding generated 3D dense annotation (B).
Figure 3
Three-dimensional UNet network architecture for synovium segmentation in 3D US images of finger joints. Conv: convolution; BN: batch normalization; ReLU: rectified linear unit activation.
Figure 4
DAF3D network architecture for synovium segmentation in 3D US images of finger joints. FPN: Feature Pyramid Network; SLF: single-layer features; MLF: multi-layer features; ASPP: Atrous spatial pyramid pooling.
Figure 5
Swin UNETR network architecture for synovium segmentation in 3D US images of finger joints. MLP: Multi-layer Perceptron; W-MSA: window-based Multi-Head Self-Attention; SW-MSA: shifted window-based Multi-Head Self-Attention.
Figure 6
UNETR++ network architecture for synovium segmentation in 3D US images of finger joints. EPA: Efficient Paired-Attention.
Figure 7
TransUNet network architecture for synovium segmentation in 3D US images of finger joints. MLP: Multi-layer Perceptron.
Figure 8
SwinDAF3D network architecture for synovium segmentation in 3D US images of finger joints. FPN: Feature Pyramid Network; SLF: single-layer features; MLF: multi-layer features; ASPP: Atrous spatial pyramid pooling.
Figure 9
Training (A) and validation (B) Dice score accuracy curves (one-fold) over 50 epochs for all models in the ablation study: 3D UNet, DAF3D, Swin UNETR, UNETR++, TransUNet, and SwinDAF3D.
Figure 10
Example feature map visualizations demonstrating the efficacy of feature extraction by integrating a Swin Transformers feature pyramid (Swin FPN) with attention modules. (A) illustrates the DAF3D single-layer features (SLFs) from levels 1 to 4 before and after applying attention modules, along with the final segmentation output. (B) shows the SwinDAF3D SLFs from levels 1 to 4 before and after applying attention modules, alongside the final segmentation output. Swin FPN's feature extraction provides superior feature representation at each level, and the attention module effectively refines these features.
Figure 11
Test set sample segmentation results from the 3D UNet, Swin UNETR, DAF3D, TransUNet, UNETR++, and SwinDAF3D models are compared against their corresponding ground truth annotations, with both segmentation and ground truth masks overlaid in green. (A) Ultrasound images with higher quality and larger synovium size; (B) challenging ultrasound images featuring a larger shadow, reduced contrast, and a smaller synovium.
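
The abstract and the Figure 5 caption refer to shifted windows (W-MSA and SW-MSA). The sketch below illustrates the general idea from the original Swin Transformer design along a single axis: regular windows partition the positions into non-overlapping blocks, and the shifted variant cyclically rolls the positions by half a window before partitioning, so that the new windows straddle the previous window boundaries. It is an illustration only, not the SwinDAF3D implementation: the function names, the 1D layout, and the window size of 4 are assumptions, and the attention computation and the masking of the wrapped-around window are omitted.

```python
def partition(tokens, window):
    """Split a sequence into non-overlapping windows of equal length."""
    if len(tokens) % window:
        raise ValueError("sequence length must be a multiple of the window size")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def shifted_partition(tokens, window):
    """Cyclically roll the sequence by window // 2, then partition as usual."""
    shift = window // 2
    rolled = tokens[shift:] + tokens[:shift]
    return partition(rolled, window)


# Eight positions along one axis, window size 4 (both hypothetical).
positions = list(range(8))
regular = partition(positions, 4)
shifted = shifted_partition(positions, 4)

print(regular)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(shifted)  # [[2, 3, 4, 5], [6, 7, 0, 1]]
```

In the shifted layout, positions 2 to 5 share a window even though they were split across two windows before, which is how information crosses window boundaries between consecutive layers. The last window mixes positions from both ends of the axis; the Swin Transformer design masks attention between such non-adjacent positions, a step this sketch leaves out.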
