Bioengineering (Basel). 2023 Jul 29;10(8):901. doi: 10.3390/bioengineering10080901.

Self-Supervised Learning Application on COVID-19 Chest X-ray Image Classification Using Masked AutoEncoder

Xin Xing et al. Bioengineering (Basel). 2023.

Abstract

The COVID-19 pandemic has underscored the urgent need for rapid and accurate diagnosis facilitated by artificial intelligence (AI), particularly in computer-aided diagnosis using medical imaging. However, this context presents two notable challenges: the demand for high diagnostic accuracy and the limited availability of medical data for training AI models. To address these issues, we propose the use of a Masked AutoEncoder (MAE), a self-supervised learning approach, for classifying 2D chest X-ray images. Our approach performs image reconstruction using a Vision Transformer (ViT) model as the feature encoder, paired with a custom-defined decoder. We then fine-tune the pretrained ViT encoder, serving as the backbone, on a labeled medical dataset. To evaluate our approach, we conducted a comparative analysis of three training methods: training from scratch, transfer learning, and MAE-based training, all using COVID-19 chest X-ray images. The results demonstrate that MAE-based training yields superior performance, achieving an accuracy of 0.985 and an AUC of 0.9957. We also explored the influence of the mask ratio on MAE and found that a ratio of 0.4 gives the best performance. Furthermore, we show that MAE is remarkably label-efficient, delivering comparable performance while using only 30% of the original labeled training dataset. Overall, our findings highlight the significant performance gains achieved with MAE, particularly when working with limited datasets. This approach has significant implications for future disease diagnosis, especially in scenarios where imaging data are scarce.

Keywords: chest X-ray image; image classification; self-supervised learning; vision transformer (ViT).
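
To make the two-stage workflow described in the abstract concrete, below is a minimal PyTorch sketch of MAE-style pretraining (mask patches, encode the visible ones, reconstruct the masked ones) followed by reuse of the encoder as a classification backbone. The module names (TinyViTEncoder, MAEDecoder stand-in), dimensions, and the single linear decoder are illustrative assumptions, not the authors' implementation; the paper uses a full ViT encoder paired with a custom decoder.

```python
# Minimal sketch of MAE pretraining + fine-tuning, assuming PyTorch only.
# All module names and hyperparameters here are illustrative, not the paper's code.
import torch
import torch.nn as nn

PATCH, IMG, DIM = 16, 224, 256          # patch size, image size, embedding dim
N_PATCH = (IMG // PATCH) ** 2           # 14 x 14 = 196 patches

class TinyViTEncoder(nn.Module):
    """Patch embedding + a small Transformer encoder (stand-in for the ViT backbone)."""
    def __init__(self, depth=4, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)
        self.pos = nn.Parameter(torch.zeros(1, N_PATCH, DIM))
        layer = nn.TransformerEncoderLayer(DIM, heads, 4 * DIM, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x, keep_idx=None):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos   # (B, 196, DIM)
        if keep_idx is not None:                        # stage 1: encode only the visible patches
            tokens = torch.gather(tokens, 1, keep_idx[..., None].expand(-1, -1, DIM))
        return self.blocks(tokens)

def mae_pretrain_step(encoder, decoder, imgs, mask_ratio=0.4):
    """Stage 1: randomly mask patches, encode visible ones, reconstruct masked pixels."""
    B = imgs.size(0)
    n_keep = int(N_PATCH * (1 - mask_ratio))
    shuffle = torch.rand(B, N_PATCH).argsort(dim=1)                 # random patch permutation
    keep_idx, mask_idx = shuffle[:, :n_keep], shuffle[:, n_keep:]
    latent = encoder(imgs, keep_idx)                                # (B, n_keep, DIM)
    pred = decoder(latent.mean(dim=1)).view(B, N_PATCH, -1)         # toy decoder: (B, 196, 768)
    target = nn.functional.unfold(imgs, PATCH, stride=PATCH).transpose(1, 2)  # ground-truth patches
    masked_pred = torch.gather(pred, 1, mask_idx[..., None].expand(-1, -1, pred.size(-1)))
    masked_target = torch.gather(target, 1, mask_idx[..., None].expand(-1, -1, target.size(-1)))
    return nn.functional.mse_loss(masked_pred, masked_target)       # loss on masked patches only

# Stage 2: reuse the pretrained encoder as the backbone and fine-tune a classifier head.
encoder = TinyViTEncoder()
decoder = nn.Linear(DIM, N_PATCH * PATCH * PATCH * 3)   # toy decoder; the paper defines a custom one
head = nn.Linear(DIM, 2)                                # COVID-19 positive vs. negative

imgs = torch.randn(2, 3, IMG, IMG)                      # dummy batch standing in for chest X-rays
loss = mae_pretrain_step(encoder, decoder, imgs)        # stage-1 reconstruction loss
logits = head(encoder(imgs).mean(dim=1))                # stage-2 fine-tuning forward pass
```

In this sketch the best-performing mask ratio reported in the paper (0.4) is used as the default; in practice the pretraining loss is minimized over unlabeled images before the classification head is trained on the labeled set.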

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Visualization of chest X-ray images from the COVIDxCXR-3 dataset. The first row shows negative subjects and the second row shows positive subjects. The input image size is 224 × 224, and pixel values are normalized to the range 0–255.

Figure 2. The structure of the Vision Transformer. Sub-figure (a) illustrates the structure of the self-attention module; sub-figure (b) shows the architecture of the Vision Transformer encoder.

Figure 3. The workflow of the MAE method on the COVID-19 classification task. MAE training has two stages: the first is an image-reconstruction pretraining stage, with the ViT backbone as the image encoder; the second is a fine-tuning stage, with the ViT backbone as the feature extractor for the labeled images.

Figure 4. Block diagram of the MAE pipeline.

Figure 5. AUC plot for the different training strategies for the ViT model.

Figure 6. AUC plot for the different mask ratios in ViT-MAE pretraining.

Figure 7. AUC plot for the different percentages of the training dataset used with the ViT-MAE model.

Figure 8. Visualization of MAE image-reconstruction pretraining. From left to right: the original input image, the randomly masked image, and the reconstructed image. Although the reconstruction is not sharp, the goal of the pretraining stage is to provide better initial parameters for the ViT model.

