BMC Ophthalmol. 2024 Mar 4;24(1):98.
doi: 10.1186/s12886-024-03376-y.

Self-supervised pre-training for joint optic disc and cup segmentation via attention-aware network


Zhiwang Zhou et al. BMC Ophthalmol. 2024.

Abstract

Image segmentation is a fundamental task in deep learning that extracts the essential content of an image for further analysis. However, supervised segmentation methods require pixel-level labels, which are very time-consuming and labour-intensive to collect. For optic disc and cup segmentation in medical image processing, we consider that two challenging problems remain unsolved. The first is how to design an efficient network that captures the global context of the medical image and runs fast in real applications. The second is how to train a deep segmentation network with only a small amount of training data, since medical privacy concerns limit data availability. To address these issues, we first design a novel attention-aware segmentation model that places a multi-scale attention module in a pyramid-like encoder-decoder network, which can efficiently learn the global semantics and long-range dependencies of the input images. We also inject the prior knowledge that the optic cup lies inside the optic disc through a novel loss function. Then, we propose a self-supervised contrastive learning method for optic disc and cup segmentation. The unsupervised feature representation is learned by matching an encoded query to a dictionary of encoded keys using a contrastive technique. Fine-tuning the pre-trained model with the proposed loss function helps achieve good performance on the task. Extensive systematic evaluations on challenging public optic disc and cup benchmarks, including the DRISHTI-GS and REFUGE datasets, demonstrate the superiority of the proposed method, which achieves new state-of-the-art performance with F1 scores approaching 0.9801 and 0.9087, respectively, while attaining 0.9657 $DC_{disc}$ and 0.8976 $DC_{cup}$. The code will be made publicly available.
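
The abstract does not define its loss or its metrics, so the following is only a minimal sketch under stated assumptions. It assumes that DC denotes the Dice coefficient, and it uses a simple hinge-style penalty as a hypothetical stand-in for the cup-inside-disc prior. The function names and the toy masks are illustrative and do not come from the paper.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|P and T| / (|P| + |T|) for flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def containment_penalty(cup_prob, disc_prob):
    """Mean amount by which the cup probability exceeds the disc probability.

    The value is zero when cup_prob <= disc_prob at every pixel, i.e. when
    the predicted cup lies inside the predicted disc. This is an assumed,
    illustrative penalty, not the loss term defined in the paper.
    """
    excess = [max(0.0, c - d) for c, d in zip(cup_prob, disc_prob)]
    return sum(excess) / len(excess)


# Toy one-dimensional "images" of six pixels.
disc_true = [0, 1, 1, 1, 1, 0]
disc_pred = [0, 1, 1, 1, 0, 0]
print("Dice (disc):", dice_coefficient(disc_pred, disc_true))

disc_prob = [0.1, 0.9, 0.9, 0.9, 0.9, 0.1]
cup_inside = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0]   # cup stays within the disc
cup_leaking = [0.9, 0.1, 0.9, 0.8, 0.1, 0.0]  # first pixel falls outside the disc
print("penalty (inside):", containment_penalty(cup_inside, disc_prob))
print("penalty (leaking):", containment_penalty(cup_leaking, disc_prob))
```

A penalty of this kind could be added to an ordinary segmentation loss with a weighting factor. Whether the paper combines its terms this way is not stated in the abstract.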

Keywords: Deep learning; Medical image processing; Optic disc and cup segmentation.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Visualization of the retinal fundus images and the corresponding OD and OC images
Fig. 2
The overall architecture of the network. The input image I is first fed into the encoder, yielding the multi-scale feature maps F. The proposed multi-scale attention module is applied after each convolutional layer for feature enhancement. The designed aggregation attention module is then applied after the last layer for feature fusion. The decoder is attached after the encoder in a pyramid-like structure for final mask prediction.
Fig. 3
Illustration of the proposed multi-scale attention module. Each query token (image pixel) is matched with its top-K potentially corresponding tokens. It is then updated by aggregating the representations of different sub-regions using a multi-layer perceptron.
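
The top-K matching in this caption can be illustrated with a small sparse-attention sketch. The scaled dot-product similarity, the softmax weighting over the selected tokens, and the function name `topk_attention` are all assumptions made for illustration. The caption's multi-layer perceptron aggregation step is omitted, and the paper's actual module may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def topk_attention(queries, keys, values, k):
    """For each query, attend only to its k most similar keys."""
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Scaled dot-product similarity to every key (assumed similarity measure).
        scores = [dot(q, key) / math.sqrt(len(q)) for key in keys]
        # Indices of the k highest-scoring keys for this query.
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax restricted to the selected keys, shifted for numerical stability.
        shift = max(scores[i] for i in top)
        weights = [math.exp(scores[i] - shift) for i in top]
        total = sum(weights)
        # Weighted sum of the selected value vectors.
        out = [
            sum(w / total * values[i][d] for w, i in zip(weights, top))
            for d in range(dim)
        ]
        outputs.append(out)
    return outputs


# Four toy tokens used as queries, keys, and values at once.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
for row in topk_attention(tokens, tokens, tokens, k=2):
    print(row)
```

Restricting each query to K keys keeps the aggregation cost per token proportional to K rather than to the total number of tokens, although this sketch still scores every key before selecting.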
Fig. 4
Illustration of the proposed aggregation attention module. The input tokens are first clustered into different groups. For each group, the self-attention operation is performed individually over the cluster centroid and cluster tokens. Ultimately, the updated cluster centroid and the group features are aggregated together to form a new feature vector
Fig. 5
The framework of the proposed self-supervised method. An input image is augmented into two different views. The network then learns to maximize agreement between the two views using a contrastive loss.
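
The source does not name the contrastive loss it uses. As a sketch only, the widely used InfoNCE loss is assumed here: the query is the embedding of one view, the positive key is the embedding of the other view, and the negative keys stand in for other dictionary entries. The temperature value of 0.07 and the function names are likewise assumptions.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss: -log softmax of the positive logit over all keys."""
    q = l2_normalize(query)
    keys = [l2_normalize(k) for k in [positive_key] + negative_keys]
    # Cosine similarity to each key, sharpened by the temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(l - shift) for l in logits))
    # The positive key sits at index 0.
    return log_sum_exp - logits[0]


# Toy embeddings: view_b is close to view_a, the negatives are not.
view_a = [1.0, 0.0]
view_b = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
print("loss, matching views:", info_nce(view_a, view_b, negatives))
print("loss, mismatched pair:", info_nce(view_a, [0.0, 1.0], [view_b, [-1.0, 0.0]]))
```

The loss shrinks as the query becomes more similar to its positive key than to the negatives, which is one way to realise "maximizing agreement" between two views.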
Fig. 6
The self-supervised training head for segmentation. The input image is first encoded by the network encoder. The RoIAlign operation is then applied to obtain a smaller global feature map for efficient learning. The final fully connected layer flattens the feature for contrastive learning.
Fig. 7
Visualizations of the optic disc and cup segmentation on the REFUGE and DRISHTI-GS datasets


