BMC Ophthalmol. 2024 Mar 4;24(1):98.

doi: 10.1186/s12886-024-03376-y.

Self-supervised pre-training for joint optic disc and cup segmentation via attention-aware network

Zhiwang Zhou¹, Yuanchang Zheng²,³, Xiaoyu Zhou⁴, Jie Yu⁵, Shangjie Rong⁶

Affiliations

¹ Institute for Advanced Study, Nanchang University, Nanchang, 330031, China. zhiwangzhou@email.ncu.edu.cn.
² Institute for Advanced Study, Nanchang University, Nanchang, 330031, China.
³ Institute of Science and Technology, Waseda University, Tokyo, 63-8001, Japan.
⁴ School of Transportation Engineering, Tongji University, Shanghai, 200000, China.
⁵ School of Electrical Automation and Information Engineering, Tianjin University, Tianjin, 300000, China.
⁶ School of Mathematical Sciences, Xiamen University, Xiamen, 361000, China.

PMID: 38438876
PMCID: PMC10910696
DOI: 10.1186/s12886-024-03376-y

Zhiwang Zhou et al. BMC Ophthalmol. 2024.

Abstract

Image segmentation is a fundamental task in deep learning that analyses the essential content of images for downstream use. For supervised segmentation methods, however, collecting pixel-level labels is time-consuming and labour-intensive. In medical image processing for optic disc and cup segmentation, we consider that two challenging problems remain unsolved. The first is how to design an efficient network that captures the global field of a medical image and runs fast in real applications. The second is how to train a deep segmentation network with only a small amount of training data, which is limited by medical privacy issues. To address these problems, we first design a novel attention-aware segmentation model that places a multi-scale attention module in a pyramid-like encoder-decoder network, which can efficiently learn the global semantics and long-range dependencies of the input images. We also inject the prior knowledge that the optic cup lies inside the optic disc through a novel loss function. We then propose a self-supervised contrastive learning method for optic disc and cup segmentation, in which the unsupervised feature representation is learned by matching an encoded query to a dictionary of encoded keys using a contrastive technique. Fine-tuning the pre-trained model with the proposed loss function helps achieve good performance on the task. Extensive systematic evaluations on challenging public optic disc and cup benchmarks, including the DRISHTI-GS and REFUGE datasets, demonstrate the superiority of the proposed method, which achieves new state-of-the-art performance, approaching F1 scores of 0.9801 and 0.9087, respectively, while attaining 0.9657 $DC_{disc}$ and 0.8976 $DC_{cup}$. The code will be made publicly available.
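As a reading aid for the reported $DC_{disc}$ and $DC_{cup}$ values and for the cup-inside-disc prior, the following is a minimal Python sketch. It assumes that DC denotes the standard Dice coefficient on binary masks. The `containment_penalty` function is a hypothetical illustration of how such a prior could be turned into a penalty term; the abstract does not specify the authors' loss, so this should not be read as their formulation.

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two binary masks.

    Masks are lists of rows containing 0/1 values and must share a shape.
    """
    intersection = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t
            total += p + t
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def containment_penalty(cup_prob, disc_prob):
    """Hypothetical prior term: mean of max(0, cup - disc) over all pixels.

    It is zero when every pixel's cup score is at most its disc score,
    i.e. when the predicted cup lies inside the predicted disc.
    """
    values = [
        max(0.0, c - d)
        for cup_row, disc_row in zip(cup_prob, disc_prob)
        for c, d in zip(cup_row, disc_row)
    ]
    return sum(values) / len(values)


# Toy 3x4 masks (illustrative data, not taken from any dataset).
disc = [[0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0]]
cup_inside = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
cup_leaking = [[1, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]]

print(dice_coefficient(disc, disc))             # identical masks
print(containment_penalty(cup_inside, disc))    # cup fully inside disc
print(containment_penalty(cup_leaking, disc))   # one cup pixel outside disc
```

In this sketch the penalty is positive only for the mask with a cup pixel outside the disc, which is the behaviour a containment prior is meant to discourage.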

Keywords: Deep learning; Medical image processing; Optic disc and cup segmentation.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Visualization of the retinal fundus images and the corresponding optic disc (OD) and optic cup (OC) images

**Fig. 2**
The overall architecture of the network. The input image I is first fed into the encoder, yielding the multi-scale feature maps F. The proposed multi-scale attention module is applied after each convolutional layer for feature enhancement, and the aggregation attention module is inserted after the last layer for feature fusion. The decoder follows the encoder in a pyramid-like structure and produces the final mask prediction

**Fig. 3**
Illustration of the proposed multi-scale attention module. Each query image token (pixel) is matched with its top-K potentially corresponding tokens and is then updated by aggregating the different sub-region representations with a multi-layer perceptron

**Fig. 4**
Illustration of the proposed aggregation attention module. The input tokens are first clustered into different groups. For each group, the self-attention operation is performed individually over the cluster centroid and cluster tokens. Ultimately, the updated cluster centroid and the group features are aggregated together to form a new feature vector

**Fig. 5**
The framework of the proposed self-supervised method. An input image is augmented into two different views, and the network learns to maximize agreement between them using a contrastive loss

**Fig. 6**
The self-supervised training head for segmentation. The input image is first encoded by the network encoder. The RoIAlign operation is then applied to obtain a smaller global feature map for efficient learning. The final fully connected layer flattens the feature for contrastive learning

**Fig. 7**
Visualizations of the optic disc and cup segmentation on the REFUGE and DRISHTI-GS datasets
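The contrastive objective mentioned in the abstract and in the Fig. 5 caption (matching an encoded query to a dictionary of encoded keys) can be sketched with the widely used InfoNCE loss. This is a minimal standard-library Python illustration, not the authors' implementation: the use of InfoNCE, the cosine similarity, the temperature value of 0.07, and the toy two-dimensional vectors are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss: negative log-softmax of the positive key among all keys.

    The temperature default is an assumed value chosen for illustration.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    logits = [dot(q, k) / temperature for k in keys]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings: the positive key is a slightly perturbed copy of the query,
# standing in for the encoding of a second augmented view of the same image.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(query, positive, negatives)
loss_mismatched = info_nce(query, negatives[0], [positive, negatives[1]])
print(loss_matched < loss_mismatched)
```

The loss is small when the query is closest to its positive key and grows when a negative key is more similar, which is what "maximizing agreement" between two views amounts to under this objective.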

Grants and funding

2022C046-3/Project of Development and Reform Commission of Jiangxi Province
