J Transl Med. 2021 Jul 26;19(1):318.
doi: 10.1186/s12967-021-02992-2.

Self-supervised deep learning model for COVID-19 lung CT image segmentation highlighting putative causal relationship among age, underlying disease and COVID-19

Daryl L X Fung et al. J Transl Med. 2021.

Abstract

Background: Coronavirus disease 2019 (COVID-19) is highly contagious. In many countries, cases have appeared faster than polymerase chain reaction (PCR) test kits could be supplied. Recently, lung computerized tomography (CT) has been used as an auxiliary COVID-19 testing approach. Automatic analysis of lung CT images is needed to increase diagnostic efficiency and reduce the burden on human experts. Deep learning has been successful at solving computer vision problems automatically, so it can be applied to automatic and rapid COVID-19 CT diagnosis. Many advanced deep learning-based computer vision techniques have been developed to improve model performance but have not yet been applied to medical image analysis.

Methods: In this study, we propose a self-supervised two-stage deep learning model to segment COVID-19 lesions (ground-glass opacity and consolidation) from chest CT images to support rapid COVID-19 diagnosis. The proposed deep learning model integrates several advanced computer vision techniques, such as generative adversarial image inpainting, focal loss, and the lookahead optimizer. Two real-life datasets were used to compare the model's performance with that of previous related work. To explore the clinical and biological mechanisms underlying the predicted lesion segments, we extract engineered features from the predicted lung lesions. Using statistical mediation analysis, we evaluate their mediating effects on the relationship between age and COVID-19 severity and on the relationship between underlying diseases and COVID-19 severity.
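Focal loss, one of the techniques listed above, down-weights pixels the model already classifies confidently so that training concentrates on hard pixels. The following is a minimal sketch of the standard multi-class focal loss, FL = -(1 - p_t)^gamma * log(p_t), averaged over pixels. The focusing parameter gamma = 2 is an assumption (the abstract gives no value), no per-class weighting is applied, and `focal_loss` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Sequence

def focal_loss(probs: Sequence[Sequence[float]], targets: Sequence[int],
               gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Mean focal loss over pixels (hypothetical helper; gamma=2 is assumed).

    probs[i] holds the softmax probabilities of pixel i over the classes
    (e.g. background, ground-glass opacity, consolidation); targets[i] is
    the index of the pixel's true class.
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = min(max(p[t], eps), 1.0)   # probability of the true class, clipped
        # (1 - p_t)^gamma shrinks the cross-entropy of confident, correct pixels
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

easy = [[0.9, 0.05, 0.05]]   # confidently correct pixel
hard = [[0.3, 0.4, 0.3]]     # misclassified pixel
print(focal_loss(easy, [0]), focal_loss(hard, [0]))
```

Setting gamma = 0 recovers ordinary cross-entropy, which makes the role of the focusing term easy to check.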

Results: The proposed self-supervised two-stage segmentation model achieves the best overall F1 score (0.63), compared with 0.55 and 0.49 for the two related baseline models. We also identified several CT image phenotypes that mediate the potential causal relationship between underlying diseases and COVID-19 severity, as well as the potential causal relationship between age and COVID-19 severity.
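The F1 scores above compare predicted and annotated lesion masks pixel by pixel. For a binary mask, F1 equals the Dice coefficient, 2TP / (2TP + FP + FN). A minimal sketch on flattened 0/1 masks follows; the abstract does not say whether F1 was computed per image and averaged or pooled over all pixels, so this shows only the per-mask formula, and `mask_f1` is a hypothetical name.

```python
from typing import Sequence

def mask_f1(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Pixel-wise F1 (equivalently Dice) between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # lesion in both
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)   # annotated only
    denom = 2 * tp + fp + fn
    # Convention (an assumption): two empty masks count as perfect agreement
    return 1.0 if denom == 0 else 2 * tp / denom

pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
print(mask_f1(pred, truth))   # 2 TP, 1 FP, 1 FN -> 4/6
```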

Conclusions: This work contributes a promising COVID-19 lung CT image segmentation model and provides predicted lesion segments with potential clinical interpretability. The model automatically segments COVID-19 lesions from raw CT images with higher accuracy than related work. Features of these lesions are associated with COVID-19 severity by mediating the effects of its known causes (age and underlying diseases).

Keywords: COVID-19; Image segmentation; Lung CT images; Mediation analysis; Self-supervised learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the proposed self-supervised COVID-19 lung infection segmentation (SSInfNet) model and statistical causal mediation analysis of the predicted segments. The black path shows the main workflow of the proposed two-stage SSInfNet model and the follow-up statistical mediation analysis. The first stage is a single SSInfNet, which takes the damaged CT image as input and outputs the reconstructed image (blue path), the edges of the overall lesion segment (orange path), and the single segment itself. The inpainting loss and edge loss are intended to increase the complexity of the single SSInfNet and thereby improve its segmentation ability. The coach network (blue path) forms a generative adversarial mechanism with the single SSInfNet to further improve the latter's performance. Continuing along the black path, the raw CT image and the predicted overall lesion segment (as a prior) are used as input to the multi SSInfNet, which further divides the overall lesion segment into ground-glass opacity and consolidation segments. Image inpainting is also involved in this stage (green path). For multi segmentation, we use focal loss as the loss function and the lookahead optimizer as the training strategy. Finally, several image features are extracted from the predicted multi segments with the Python package PyRadiomics [41]. The image features act as mediators in the mediation analysis model between the independent variables (age, gender, and underlying diseases) and the dependent variable (COVID-19 severity)
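The lookahead training strategy named in this caption keeps two copies of the weights: fast weights that an inner optimizer updates for k steps, and slow weights that then move a fraction alpha of the way toward the fast weights, after which the fast weights restart from the slow ones. A minimal sketch with plain SGD as the inner optimizer on a toy quadratic follows; the inner optimizer, k = 5, alpha = 0.5, and the learning rate are illustrative assumptions rather than settings reported here, and `lookahead_sgd` is a hypothetical name.

```python
from typing import Callable, List

def lookahead_sgd(grad: Callable[[List[float]], List[float]], w0: List[float],
                  lr: float = 0.1, k: int = 5, alpha: float = 0.5,
                  outer_steps: int = 20) -> List[float]:
    """Lookahead wrapped around plain SGD (inner optimizer is an assumption)."""
    slow = list(w0)                          # slow weights
    for _ in range(outer_steps):
        fast = list(slow)                    # fast weights restart from the slow weights
        for _ in range(k):                   # k inner SGD steps
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # slow weights move a fraction alpha toward where the fast weights ended
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow

TARGET = [3.0, -1.0]

def quadratic_grad(w: List[float]) -> List[float]:
    """Gradient of f(w) = sum((w_i - TARGET_i)^2)."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]

w = lookahead_sgd(quadratic_grad, [0.0, 0.0])
print(w)   # close to TARGET
```

In practice the inner optimizer would be whatever optimizer trains the network (for example Adam); the slow-weight interpolation is what makes the method "lookahead".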
Fig. 2
Segment visualization and data split. A Examples of raw lung CT images from both the Med-seg and ICTCF datasets. All images are in the axial view, which looks down through the body. B The overall lesion segment. This is the label for the proposed single self-supervised COVID-19 lung infection segmentation network (SSInfNet) model, and it exists only in the Med-seg dataset. C The ground-glass opacity segment (red) and consolidation segment (green). This is the label for the multi SSInfNet, and it is also available only for the Med-seg dataset. D The table shows how the data were used in developing the proposed SSInfNet models. As ICTCF does not contain segment labels, it was used only for self-supervised image inpainting in the training stage. The Med-seg image data were split into training, validation, and testing sets in an approximate ratio of 6:1:1. After the model was developed, it was applied to the ICTCF dataset for the statistical mediation analysis, because only ICTCF contains COVID-19 clinical severity information; Med-seg data were therefore not used in the mediation analysis
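The approximate 6:1:1 split of the Med-seg images can be sketched as a seeded shuffle followed by slicing, with one eighth of the data each for validation and testing. This is a minimal sketch under stated assumptions: the caption does not say whether the split was made per image or per patient (splitting by patient is the usual way to avoid leakage between slices of the same scan), and `split_6_1_1` and its seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_6_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items, then split them into train/validation/test at about 6:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    n_val = n_test = round(len(shuffled) / 8)  # about 1/8 of the data each
    n_train = len(shuffled) - n_val - n_test   # the remaining ~6/8 for training
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_6_1_1(range(80))
print(len(train), len(val), len(test))   # 60 10 10
```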
Fig. 3
Visual and quantitative comparison of segmentation results among different networks. A Four examples of original lung CT images, their overall segments predicted by three different networks, and the ground-truth overall lesion annotation. The two baseline models are the single U-net and the single SInfNet (supervised COVID-19 lung infection segmentation) model. The proposed model is the single SSInfNet (self-supervised COVID-19 lung infection segmentation) model. B The mean and error of five quantitative model performance metrics calculated from the 35 test samples. C Three examples of original lung CT images, their GGO and consolidation segments predicted by three different networks, and the ground-truth lesion annotations. The two baseline models are the multi U-net and the multi SInfNet models. The proposed model is the multi SSInfNet model. D The mean and error of five model performance metrics calculated from the 35 test samples. "Overall" shows the averaged performance for GGO, consolidation, and background
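The "Overall" score in panel D averages performance over GGO, consolidation, and background. A minimal sketch of that averaging for one metric follows: one-vs-rest pixel F1 per class, then the unweighted mean. The caption names neither the five metrics nor the averaging scheme, so using F1 and an unweighted (macro) mean is an assumption, as is the hypothetical label encoding 0 = background, 1 = GGO, 2 = consolidation.

```python
from typing import Dict, Sequence

# Hypothetical label encoding for a flattened multi-class mask
CLASSES = {0: "background", 1: "GGO", 2: "consolidation"}

def per_class_f1(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """One-vs-rest pixel F1 per class, plus their unweighted mean as 'Overall'."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    scores: Dict[str, float] = {}
    for label, name in CLASSES.items():
        tp = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        fp = sum(1 for p, t in zip(pred, truth) if p == label and t != label)
        fn = sum(1 for p, t in zip(pred, truth) if p != label and t == label)
        denom = 2 * tp + fp + fn
        scores[name] = 1.0 if denom == 0 else 2 * tp / denom
    scores["Overall"] = sum(scores[n] for n in CLASSES.values()) / len(CLASSES)
    return scores

pred = [0, 0, 1, 1, 2, 2, 0, 1]
truth = [0, 0, 1, 2, 2, 2, 1, 1]
s = per_class_f1(pred, truth)
print(s)
```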
Fig. 4
Significant image phenotypes in the univariate mediation analyses. A Forest plot showing the 32 mediators of age’s indirect effect on COVID-19 risk. B Forest plot showing the 27 mediators of underlying disease’s indirect effect on COVID-19 risk
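Each mediator in these forest plots carries part of the indirect effect of age (or underlying disease) on COVID-19 severity. A common way to estimate such an indirect effect is the product of coefficients: a (effect of X on the mediator M) times b (effect of M on Y, adjusting for X). The abstract does not state the regression models or the inference method, so this sketch assumes linear OLS for both paths and omits confidence intervals (typically obtained by bootstrapping). With a binary severity outcome, a logistic model for Y would be more usual, and the exact decomposition total = direct + indirect shown here would no longer hold. The data are synthetic and `simple_mediation` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

def _centered(v: Sequence[float]) -> List[float]:
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))

def simple_mediation(x: Sequence[float], m: Sequence[float],
                     y: Sequence[float]) -> Dict[str, float]:
    """Product-of-coefficients mediation with linear OLS for both paths.

    a: effect of X on M (M ~ X);  b: effect of M on Y given X (Y ~ X + M)
    c_prime: direct effect of X on Y given M;  total: effect of X on Y (Y ~ X)
    """
    if not (len(x) == len(m) == len(y)) or len(x) < 3:
        raise ValueError("x, m and y need the same length (at least 3)")
    xc, mc, yc = _centered(x), _centered(m), _centered(y)  # centering absorbs intercepts
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm ** 2     # zero if M is an exact linear function of X
    if sxx == 0 or det == 0:
        raise ValueError("X must vary and M must not be collinear with X")
    a = sxm / sxx
    c_prime = (smm * sxy - sxm * smy) / det   # 2x2 normal equations for Y ~ X + M
    b = (sxx * smy - sxm * sxy) / det
    return {"a": a, "b": b, "indirect": a * b, "c_prime": c_prime, "total": sxy / sxx}

# Synthetic data (not from the study): X -> M -> Y plus a direct X -> Y path
x = [float(i) for i in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
m = [2.0 * xi + e for xi, e in zip(x, noise)]
y = [0.5 * xi + 1.5 * mi + 0.5 * e for xi, mi, e in zip(x, m, noise[::-1])]
res = simple_mediation(x, m, y)
print(res)   # with linear OLS, total = c_prime + indirect
```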
Fig. 5
Hierarchical clustering of the ordered correlation matrix of the 37 image phenotypic mediators from the univariate mediation analysis. The color represents the correlation coefficient
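An ordered correlation matrix like this one is typically produced by computing pairwise correlations between the mediator features, clustering the features hierarchically, and reordering rows and columns by the dendrogram's leaf order. A minimal sketch follows. The caption states neither the distance nor the linkage, so the distance 1 - |Pearson r| and average linkage are assumptions; the naive merge loop stands in for a library routine such as `scipy.cluster.hierarchy`. The feature data are synthetic and `cluster_order` is a hypothetical name.

```python
import numpy as np

def cluster_order(features: np.ndarray) -> list:
    """Leaf order from average-linkage clustering of feature correlations.

    features: array of shape (n_samples, n_features).
    Distance between two features is 1 - |Pearson r| (an assumption).
    """
    corr = np.corrcoef(features, rowvar=False)   # columns are the variables
    dist = 1.0 - np.abs(corr)
    clusters = [[i] for i in range(dist.shape[0])]  # start with singleton clusters
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]          # concatenation keeps a leaf order
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

# Synthetic features: columns 0 and 2 share one signal, columns 1 and 3 another
rng = np.random.default_rng(0)
b1, b2 = rng.normal(size=200), rng.normal(size=200)
feats = np.column_stack([b1 + 0.1 * rng.normal(size=200), b2 + 0.1 * rng.normal(size=200),
                         b1 + 0.1 * rng.normal(size=200), b2 + 0.1 * rng.normal(size=200)])
order = cluster_order(feats)
corr = np.corrcoef(feats, rowvar=False)
ordered = corr[np.ix_(order, order)]   # correlated features now sit in adjacent blocks
print(order)
```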
Fig. 6
Path plot of the mediation analysis model with multiple mediators. The standardized effect estimates of each variable are shown on the edges of the paths. The mediators are Entropy, Kurtosis, Skewness, Mean, Area, and IMC1. The dependent variable is COVID-19 severity, and the independent variables are underlying disease and age. Curves with arrowheads on both ends represent standardized residual variances: solid curves for the dependent variable and mediators, dashed curves for the independent variables. The straight dashed line represents the standardized covariance of the two independent variables. Straight solid lines with an arrowhead on one end are standardized effect estimates
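The path model in this figure has several mediators in parallel: each mediator is regressed on the independent variables (paths a), and the dependent variable is regressed on the independent variables and all mediators together (direct paths and paths b). Standardized estimates come from fitting these paths on z-scored variables. The following is a minimal sketch under stated assumptions: it fits each path with a separate OLS regression rather than the structural-equation software the authors may have used, it uses two synthetic mediators instead of the six named above, and `parallel_mediation` and `ols_slopes` are hypothetical helpers. It illustrates the path structure, not the authors' estimation procedure.

```python
import numpy as np

def zscore(v: np.ndarray) -> np.ndarray:
    """Standardize each column (or a 1-D vector) to mean 0 and SD 1."""
    return (v - v.mean(axis=0)) / v.std(axis=0)

def ols_slopes(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """OLS slopes of y on the columns of X; an intercept is fitted and dropped."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

def parallel_mediation(X: np.ndarray, M: np.ndarray, y: np.ndarray) -> dict:
    """Standardized path estimates for a model with parallel mediators.

    X: (n, p) independent variables (e.g. age, underlying disease)
    M: (n, k) mediators (e.g. image features)
    y: (n,)   dependent variable (e.g. a severity score)
    """
    Xs, Ms, ys = zscore(X), zscore(M), zscore(y)
    p, k = Xs.shape[1], Ms.shape[1]
    # a[i, j]: standardized effect of independent variable i on mediator j
    a = np.column_stack([ols_slopes(Ms[:, j], Xs) for j in range(k)])
    # one regression of y on all independent variables and all mediators
    coef = ols_slopes(ys, np.column_stack([Xs, Ms]))
    direct, b = coef[:p], coef[p:]
    indirect = a * b   # (p, k): effect of X_i routed through mediator j
    return {"a": a, "b": b, "direct": direct, "indirect": indirect,
            "total_indirect": indirect.sum(axis=1)}

# Synthetic example (not study data): two predictors, two mediators
rng = np.random.default_rng(1)
n = 300
age = rng.normal(size=n)
disease = (rng.random(n) < 0.3).astype(float)
X = np.column_stack([age, disease])
M = np.column_stack([0.6 * age + rng.normal(size=n),
                     0.8 * disease + rng.normal(size=n)])
y = 0.3 * age + 0.5 * M[:, 0] + 0.4 * M[:, 1] + rng.normal(size=n)
res = parallel_mediation(X, M, y)
# With OLS on one common sample, total effect = direct + summed indirect effects
total = ols_slopes(zscore(y), zscore(X))
print(np.allclose(total, res["direct"] + res["total_indirect"]))
```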

References

1. WHO. Disease outbreak news: Novel coronavirus – China. 2020 [cited 2020 Sep 22]. https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/
2. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20:533–534. doi: 10.1016/S1473-3099(20)30120-1.
3. Aleta A, Martín-Corral D, y Piontti AP, Ajelli M, Litvinova M, Chinazzi M, et al. Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19. Nat Hum Behav. 2020;4:964–971. doi: 10.1038/s41562-020-0931-9.
4. Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nat Commun. 2020;11:1–7. doi: 10.1038/s41467-020-17971-2.
5. Gozes O, Frid-Adar M, Greenspan H, Browning PD, Zhang H, Ji W, et al. Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection & patient monitoring using deep learning CT image analysis. 2020. http://arxiv.org/abs/2003.05037
