Pattern Recognit. 2022 Apr;124:108452.
doi: 10.1016/j.patcog.2021.108452. Epub 2021 Nov 25.

Deep co-supervision and attention fusion strategy for automatic COVID-19 lung infection segmentation on CT images

Haigen Hu et al. Pattern Recognit. 2022 Apr.

Abstract

Due to irregular shapes, various sizes, and indistinguishable boundaries between normal and infected tissues, accurately segmenting the infected lesions of COVID-19 on CT images remains a challenging task. In this paper, a novel segmentation scheme is proposed for COVID-19 infections by enhancing supervised information and fusing multi-scale feature maps of different levels based on the encoder-decoder architecture. To this end, a deep collaborative supervision (Co-supervision) scheme is proposed to guide the network in learning edge and semantic features. More specifically, an Edge Supervised Module (ESM) is first designed to highlight low-level boundary features by incorporating edge supervision into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervision into the later stages. Then an Attention Fusion Module (AFM) is developed to fuse multi-scale feature maps of different levels through an attention mechanism, reducing the semantic gaps between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets. The results show that all three proposed modules are promising. Relative to the baseline (ResUnet), using ESM, ASSM, or AFM alone increases the Dice metric by 1.12%, 1.95%, and 1.63%, respectively, on our dataset, while integrating all three modules together yields an increase of 3.97%. Compared with existing approaches on various datasets, the proposed method obtains better segmentation performance on the main metrics and achieves the best generalization and overall performance.
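
A rough PyTorch sketch of the deep co-supervision idea summarized above: the final segmentation loss is combined with stage-wise edge (ESM) and auxiliary mask (ASSM) supervision terms. The loss weights and the use of plain binary cross-entropy are illustrative assumptions, not the paper's exact formulation.

    import torch.nn.functional as F

    def co_supervised_loss(final_logits, edge_logits, aux_logits,
                           gt_mask, gt_edge, w_edge=1.0, w_aux=1.0):
        # final_logits: (B, 1, H, W) output of the decoder.
        # edge_logits / aux_logits: lists of stage-wise logits, already
        # up-sampled to the input resolution (H, W).
        loss = F.binary_cross_entropy_with_logits(final_logits, gt_mask)
        for e in edge_logits:   # ESM: shallow stages supervised by the edge GT
            loss = loss + w_edge * F.binary_cross_entropy_with_logits(e, gt_edge)
        for a in aux_logits:    # ASSM: deeper stages supervised by the mask GT
            loss = loss + w_aux * F.binary_cross_entropy_with_logits(a, gt_mask)
        return loss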

Keywords: Attention mechanism; COVID-19; Feature fusion; Multi-scale features; Semantic segmentation.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
An illustration of the challenge of identifying the infected lesions (red contours) of COVID-19 on CT images. (a) The infections have various scales and shapes. (b) There is no obvious difference between normal and infected tissues. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
An illustration of the overall network architecture. The proposed architecture comprises ESM, ASSM, and AFM built on an encoder-decoder structure. (1) ESM is used to further highlight the low-level features in the initial shallow layers of the encoder, capturing more detailed information such as object boundaries. (2) ASSM is employed to strengthen high-level semantic information by integrating object-mask supervision into the later stages of the encoder. (3) AFM is utilized to fuse multi-scale feature maps of different levels in the decoder.
Fig. 3
An illustration of ESM and ASSM. First, the low-resolution feature maps from stage Si are resized to the input-image size H×W by bilinear up-sampling. The up-sampled feature maps are then reduced to a single feature map by a 1×1 convolution. Finally, each pixel value of the resulting feature map is converted to a probability by the sigmoid function σ(·), yielding the prediction image for stage Si. (a) ESM: edge supervision is achieved by comparing the predicted edge image S_edge^i with the corresponding edge ground truth (GT) G_edge according to Eq. (1). (b) ASSM: auxiliary semantic supervision is achieved by comparing the coarse segmentation image S_mask^i with the corresponding ground truth (GT) segmentation mask G_mask according to Eq. (2).
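
A minimal PyTorch sketch of the per-stage prediction head described in Fig. 3: bilinear up-sampling to the input size H×W, a 1×1 convolution down to a single channel, and a sigmoid to obtain a probability map. The module and argument names are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StageHead(nn.Module):
        def __init__(self, in_channels):
            super().__init__()
            self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)  # 1x1 convolution

        def forward(self, stage_feat, out_size):
            # Resize the low-resolution stage feature map to the input size (H, W).
            x = F.interpolate(stage_feat, size=out_size,
                              mode="bilinear", align_corners=False)
            # Convert each pixel to a probability with the sigmoid function.
            return torch.sigmoid(self.reduce(x))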
Fig. 4
An illustration of the attention mechanism. X_i^u denotes the intermediate result of up-sampling the feature map X_i by bilinear interpolation; its 2D size is the same as that of the input image.
Fig. 5
The procedure of the attention block. The color bar represents the trend of the confidence values, where red and blue denote 1 and 0, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 1
Fusion algorithm.
Fig. 6
Visual qualitative comparison of lung infection segmentation results among U-Net, PSPNet, DeepLabv3+, Inf-Net and the proposed method. Column 1: the original CT image; Column 2: U-Net; Column 3: PSPNet; Column 4: DeepLabv3+; Column 5: Inf-Net; Column 6: our method; Column 7: the corresponding ground truth (GT).
Fig. 7
Visualization of each stage supervised by ESM. Column 1: the original CT image; Columns 2 to 6: S1 to S5; Column 7: the corresponding edge ground truth (GT).
Fig. 8
Visual results of the fusion process based on the proposed AFM. Column 1: the original CT image; Column 2: the obtained confidence map P1; Column 3: the confidence map 1 − P1 of the lost detailed information; Column 4: the major result Y1 obtained from the top feature map X1 through the attention block (AB); Column 5: the final prediction result Sp; Column 6: the corresponding ground truth (GT).
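
A hedged PyTorch sketch of the confidence-weighted fusion suggested by Figs. 5 and 8: a confidence map P gates the high-level branch, while 1 − P re-weights the low-level detail branch. The exact combination rule of the paper's Algorithm 1 is not reproduced here, so this re-weighting, as well as the layer and variable names, is an assumption.

    import torch.nn as nn

    class ConfidenceFusion(nn.Module):
        def __init__(self, high_ch, low_ch):
            super().__init__()
            # Confidence map P in [0, 1] predicted from the high-level branch.
            self.confidence = nn.Sequential(nn.Conv2d(high_ch, 1, kernel_size=1),
                                            nn.Sigmoid())
            # Project the low-level (detail) branch to the same channel count.
            self.align = nn.Conv2d(low_ch, high_ch, kernel_size=1)

        def forward(self, high_feat, low_feat):
            # Both inputs are assumed to share the same spatial size.
            p = self.confidence(high_feat)
            low = self.align(low_feat)
            # Keep confident high-level responses; recover lost detail where P is low.
            return p * high_feat + (1.0 - p) * low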
