Pattern Recognit. 2022 Apr;124:108452.
doi: 10.1016/j.patcog.2021.108452. Epub 2021 Nov 25.

Deep co-supervision and attention fusion strategy for automatic COVID-19 lung infection segmentation on CT images

Haigen Hu et al. Pattern Recognit. 2022 Apr.

Abstract

Due to irregular shapes, various sizes, and indistinguishable boundaries between normal and infected tissues, accurately segmenting the infected lesions of COVID-19 on CT images remains a challenging task. In this paper, a novel segmentation scheme is proposed for COVID-19 infections by enhancing supervised information and fusing multi-scale feature maps of different levels based on the encoder-decoder architecture. To this end, a deep collaborative supervision (Co-supervision) scheme is proposed to guide the network in learning edge and semantic features. More specifically, an Edge Supervised Module (ESM) is first designed to highlight low-level boundary features by incorporating edge supervision into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervision into the later stages. Then an Attention Fusion Module (AFM) is developed to fuse multi-scale feature maps of different levels through an attention mechanism, reducing the semantic gaps between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets. The results show that all three proposed modules are promising. Relative to the baseline (ResUnet), using ESM, ASSM, or AFM alone increases the Dice metric by 1.12%, 1.95%, and 1.63%, respectively, on our dataset, while integrating all three modules together yields an increase of 3.97%. Compared with existing approaches on various datasets, the proposed method obtains better segmentation performance on the main metrics and achieves the best generalization and overall performance.
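
A rough PyTorch sketch of the deep co-supervision idea summarized above: the final segmentation loss is combined with stage-wise edge (ESM) and auxiliary mask (ASSM) supervision terms. The loss weights and the use of plain binary cross-entropy are illustrative assumptions, not the paper's exact formulation.

    import torch.nn.functional as F

    def co_supervised_loss(final_logits, edge_logits, aux_logits,
                           gt_mask, gt_edge, w_edge=1.0, w_aux=1.0):
        # final_logits: (B, 1, H, W) output of the decoder.
        # edge_logits / aux_logits: lists of stage-wise logits, already
        # up-sampled to the input resolution (H, W).
        loss = F.binary_cross_entropy_with_logits(final_logits, gt_mask)
        for e in edge_logits:   # ESM: shallow stages supervised by the edge GT
            loss = loss + w_edge * F.binary_cross_entropy_with_logits(e, gt_edge)
        for a in aux_logits:    # ASSM: deeper stages supervised by the mask GT
            loss = loss + w_aux * F.binary_cross_entropy_with_logits(a, gt_mask)
        return loss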

Keywords: Attention mechanism; COVID-19; Feature fusion; Multi-scale features; Semantic segmentation.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
An illustration of the challenge of identifying the infected lesions (red contours) of COVID-19 on CT images. (a) The infections have various scales and shapes. (b) There is no obvious difference between normal and infected tissues. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
An illustration of the overall network architecture. The proposed architecture comprises ESM, ASSM, and AFM built on an encoder-decoder structure. (1) ESM is used to further highlight the low-level features in the initial shallow layers of the encoder, capturing more detailed information such as object boundaries. (2) ASSM is employed to strengthen high-level semantic information by integrating object-mask supervision into the later stages of the encoder. (3) AFM is utilized to fuse multi-scale feature maps of different levels in the decoder.
Fig. 3
An illustration of ESM and ASSM. First, the low-resolution feature maps from stage Si are resized to the input-image size H×W by bilinear up-sampling. The up-sampled feature maps are then reduced to a single feature map by a 1×1 convolution. Finally, each pixel value of the resulting feature map is converted to a probability by the sigmoid function σ(·), yielding the prediction image for stage Si. (a) ESM: edge supervision is achieved by comparing the predicted edge image S_edge^i with the corresponding edge ground truth (GT) G_edge according to Eq. (1). (b) ASSM: auxiliary semantic supervision is achieved by comparing the coarse segmentation image S_mask^i with the corresponding ground truth (GT) segmentation mask G_mask according to Eq. (2).
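
A minimal PyTorch sketch of the per-stage prediction head described in Fig. 3: bilinear up-sampling to the input size H×W, a 1×1 convolution down to a single channel, and a sigmoid to obtain a probability map. The module and argument names are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StageHead(nn.Module):
        def __init__(self, in_channels):
            super().__init__()
            self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)  # 1x1 convolution

        def forward(self, stage_feat, out_size):
            # Resize the low-resolution stage feature map to the input size (H, W).
            x = F.interpolate(stage_feat, size=out_size,
                              mode="bilinear", align_corners=False)
            # Convert each pixel to a probability with the sigmoid function.
            return torch.sigmoid(self.reduce(x))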
Fig. 4
An illustration of the attention mechanism. X_i^u denotes the intermediate result of up-sampling the feature map X_i by bilinear interpolation; its 2D size is the same as that of the input image.
Fig. 5
The procedure of the attention block. The color bar represents the trend of the confidence values, where red and blue denote 1 and 0, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 1
Fusion algorithm.
Fig. 6
Visual qualitative comparison of lung infection segmentation results among U-Net, PSPNet, DeepLabv3+, Inf-Net and the proposed method. Column 1: the original CT image; Column 2: U-Net; Column 3: PSPNet; Column 4: DeepLabv3+; Column 5: Inf-Net; Column 6: our method; Column 7: the corresponding ground truth (GT).
Fig. 7
Visualization of each stage supervised by ESM. Column 1: the original CT image; Columns 2 to 6: S1 to S5; Column 7: the corresponding edge ground truth (GT).
Fig. 8
Visual results of the fusion process based on the proposed AFM. Column 1: the original CT image; Column 2: the obtained confidence map P1; Column 3: the confidence map 1 − P1 of the lost detailed information; Column 4: the major result Y1 obtained from the top feature map X1 through the attention block (AB); Column 5: the final prediction result Sp; Column 6: the corresponding ground truth (GT).
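
A hedged PyTorch sketch of the confidence-weighted fusion suggested by Figs. 5 and 8: a confidence map P gates the high-level branch, while 1 − P re-weights the low-level detail branch. The exact combination rule of the paper's Algorithm 1 is not reproduced here, so this re-weighting, as well as the layer and variable names, is an assumption.

    import torch.nn as nn

    class ConfidenceFusion(nn.Module):
        def __init__(self, high_ch, low_ch):
            super().__init__()
            # Confidence map P in [0, 1] predicted from the high-level branch.
            self.confidence = nn.Sequential(nn.Conv2d(high_ch, 1, kernel_size=1),
                                            nn.Sigmoid())
            # Project the low-level (detail) branch to the same channel count.
            self.align = nn.Conv2d(low_ch, high_ch, kernel_size=1)

        def forward(self, high_feat, low_feat):
            # Both inputs are assumed to share the same spatial size.
            p = self.confidence(high_feat)
            low = self.align(low_feat)
            # Keep confident high-level responses; recover lost detail where P is low.
            return p * high_feat + (1.0 - p) * low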
