Pattern Recognit. 2024 Aug;152:110489.
doi: 10.1016/j.patcog.2024.110489. Epub 2024 Apr 9.

Improving Image Segmentation with Contextual and Structural Similarity

Xiaoyang Chen et al. Pattern Recognit. 2024 Aug.

Abstract

Deep learning models for medical image segmentation are usually trained with voxel-wise losses, e.g., cross-entropy loss, focusing on unary supervision without considering inter-voxel relationships. This oversight potentially leads to semantically inconsistent predictions. Here, we propose a contextual similarity loss (CSL) and a structural similarity loss (SSL) to explicitly and efficiently incorporate inter-voxel relationships for improved performance. The CSL promotes consistency in predicted object categories for each image sub-region compared to ground truth. The SSL enforces compatibility between the predictions of voxel pairs by computing pair-wise distances between them, ensuring that voxels of the same class are close together whereas those from different classes are separated by a wide margin in the distribution space. The effectiveness of the CSL and SSL is evaluated using a clinical cone-beam computed tomography (CBCT) dataset of patients with various craniomaxillofacial (CMF) deformities and a public pancreas dataset. Experimental results show that the CSL and SSL outperform state-of-the-art regional loss functions in preserving segmentation semantics.

Keywords: Cone-beam computed tomography; Image segmentation; Inter-voxel relationships; Pancreas segmentation.


Conflict of interest statement

Conflicts of Interest All authors declare that they have no conflicts of interest.

Figures

Figure 1:
The network includes two main components: a base network (top) and a context encoding module (CEM; bottom). The base network consists of an encoder (left side) and a decoder (right side). In the encoder, successive Conv layers reduce the spatial size of the feature maps and extract the feature representations for each sub-region of the input. The decoder gradually increases the spatial size of the feature maps and finally outputs a full-sized probability map. The CEM is built upon the base network, employing the high-level features extracted by the encoder. In the CEM, the existence of the object categories in each sub-region is predicted. This prediction is used to compute the contextual similarity loss (CSL). The cross-entropy loss (CE) and the structural similarity loss (SSL) are used for unary supervision and structural similarity measurement, respectively, based on the probability map. The number of channels is indicated at the top of each box.
Figure 2:
A 2D illustration of the contextual similarity loss (CSL). The label map (along with the input) is divided into multiple sub-regions (shown with different colors), and for each sub-region, the encoded ground truth (GT) represents which object categories are present in the sub-region (1 if an object category exists and 0 otherwise). Based on the extracted features for each sub-region, the network predicts a probability vector, which corresponds to the encoded GT obtained from the label map and represents the likelihood of the presence of all object categories. CSL promotes contextual similarity by minimizing the binary cross-entropy loss between the predictions and the encoded GT.
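The region-level supervision in Figure 2 can be sketched in a few lines of numpy. This is a minimal illustrative sketch, not the authors' implementation: the per-region presence prediction is approximated here by max-pooling the voxel-wise probability map (the paper instead predicts it with the context encoding module), and the sub-region size `region` is an assumed parameter.

```python
import numpy as np

def contextual_similarity_loss(probs, labels, region=4, eps=1e-7):
    """Sketch of the contextual similarity loss (CSL) on a 2D example.

    probs:  (C, H, W) voxel-wise class probability map.
    labels: (H, W) integer ground-truth label map.
    For each region x region sub-region, compares a predicted presence
    vector against the encoded GT with binary cross-entropy.
    """
    C, H, W = probs.shape
    loss, n = 0.0, 0
    for i in range(0, H, region):
        for j in range(0, W, region):
            # Predicted presence per class: max probability in the sub-region
            # (illustrative stand-in for the CEM's region-level prediction).
            p = probs[:, i:i + region, j:j + region].reshape(C, -1).max(axis=1)
            # Encoded GT: 1 if the class occurs anywhere in the sub-region.
            sub = labels[i:i + region, j:j + region]
            g = np.array([(sub == c).any() for c in range(C)], dtype=float)
            # Binary cross-entropy between presence prediction and encoded GT.
            p = np.clip(p, eps, 1 - eps)
            loss += -(g * np.log(p) + (1 - g) * np.log(1 - p)).sum()
            n += C
    return loss / n
```

A prediction whose per-region class-presence pattern matches the encoded GT drives this loss toward zero, which is the contextual consistency the figure describes.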
Figure 3:
An illustration of the structural similarity loss (SSL). For simplicity, we denote the probability vector for each voxel with a cube. The integer on each cube denotes the ground truth label for the corresponding voxel. SSL performs pair-wise comparison and encourages structural similarity between predictions and ground truth by forcing the probability vectors in the distribution space to be close for voxels that have the same label (i.e., small KL divergence) and far apart (i.e., large KL divergence) for voxels that have different labels.
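The pair-wise comparison in Figure 3 can be expressed as a contrastive-style loss over sampled voxels. The sketch below is an assumption-laden illustration, not the paper's exact formulation: it uses a symmetric KL divergence and a hinge `margin` hyperparameter (both assumed here) to pull same-label pairs together and push different-label pairs apart.

```python
import numpy as np

def structural_similarity_loss(probs, labels, margin=1.0, eps=1e-7):
    """Sketch of the structural similarity loss (SSL).

    probs:  (N, C) predicted probability vectors for N sampled voxels.
    labels: (N,) ground-truth class labels for those voxels.
    """
    p = np.clip(probs, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize after clipping
    loss, n = 0.0, 0
    N = p.shape[0]
    for i in range(N):
        for j in range(i + 1, N):
            # Symmetric KL divergence between the two probability vectors.
            kl = np.sum(p[i] * np.log(p[i] / p[j]) + p[j] * np.log(p[j] / p[i]))
            if labels[i] == labels[j]:
                loss += kl                     # pull same-class pairs together
            else:
                loss += max(0.0, margin - kl)  # push different classes apart
            n += 1
    return loss / max(n, 1)
```

In practice the quadratic number of voxel pairs makes subsampling necessary; the loop here is kept explicit only to mirror the pair-wise structure of the figure.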
Figure 4:
Segmentation performance in terms of DSC as the loss weight is varied for the (a) CSL and (b) SSL.
Figure 5:
Qualitative comparison showing the effects of the proposed CSL and SSL in midface (1st row) and mandibular (2nd row) segmentation. (a) V-Net; (b) V-Net variant; (c) V-Net variant + CSL; (d) V-Net variant + SSL; (e) V-Net variant + CSL + SSL; (f) Ground truth. Bones indicated by arrows are where the misclassifications are most noticeable. CSL and SSL help eliminate misclassifications, yielding better segmentation results.
