Comput Methods Programs Biomed. 2020 Oct:194:105532.
doi: 10.1016/j.cmpb.2020.105532. Epub 2020 May 8.

COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios

Rodolfo M Pereira et al. Comput Methods Programs Biomed. 2020 Oct.

Abstract

Background and objective: COVID-19 can cause severe pneumonia and is estimated to have a high impact on the healthcare system. Early diagnosis is crucial for correct treatment and may reduce the stress on the healthcare system. The standard imaging tests for diagnosing pneumonia are chest X-ray (CXR) and computed tomography (CT) scan. Although CT is the gold standard, CXR is still useful because it is cheaper, faster, and more widespread. This study aims to distinguish pneumonia caused by COVID-19 from other types of pneumonia and from healthy lungs using only CXR images.

Methods: To achieve these objectives, we proposed a classification schema that considers two perspectives: i) multi-class classification; and ii) hierarchical classification, since pneumonia can be structured as a hierarchy. Given the natural data imbalance in this domain, we also proposed the use of resampling algorithms in the schema to rebalance the class distribution. Because texture is one of the main visual attributes of CXR images, our classification schema extracts features using well-known texture descriptors and a pre-trained CNN model. We also explored early and late fusion techniques in the schema to leverage the strengths of multiple texture descriptors and base classifiers at once. To evaluate the approach, we composed a database, named RYDLS-20, containing CXR images of pneumonia caused by different pathogens as well as CXR images of healthy lungs. The class distribution follows a real-world scenario in which some pathogens are more common than others.
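As an illustration of the late fusion idea mentioned above, the following is a minimal sketch of the sum, product, and voting rules applied to the per-label scores that several base classifiers output for one sample. The function name `late_fusion` and the toy scores are assumptions made for this example, not taken from the paper.

```python
from collections import Counter
from math import prod


def late_fusion(score_sets, rule="sum"):
    """Fuse the scores of several base classifiers for ONE sample.

    score_sets: list of score vectors, one per base classifier,
                each holding one score per label (length L).
    Returns the index of the label chosen by the fusion rule.
    """
    n_labels = len(score_sets[0])
    if rule == "sum":
        fused = [sum(s[j] for s in score_sets) for j in range(n_labels)]
    elif rule == "product":
        fused = [prod(s[j] for s in score_sets) for j in range(n_labels)]
    elif rule == "vote":
        # Each classifier votes for its own top-scoring label.
        votes = Counter(
            max(range(n_labels), key=s.__getitem__) for s in score_sets
        )
        fused = [votes.get(j, 0) for j in range(n_labels)]
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    # Ties are resolved in favour of the lowest label index.
    return max(range(n_labels), key=fused.__getitem__)


# Hypothetical scores from three base classifiers over three labels.
scores = [
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.4, 0.6, 0.0],
]
print(late_fusion(scores, "sum"))      # 0
print(late_fusion(scores, "product"))  # 0
print(late_fusion(scores, "vote"))     # 1
```

In this toy case one very confident classifier dominates the sum and product rules, while majority voting follows the two less confident classifiers. That difference is the reason for comparing several fusion rules.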

Results: The proposed approach, tested on RYDLS-20, achieved a macro-avg F1-Score of 0.65 using a multi-class approach and an F1-Score of 0.89 for COVID-19 identification in the hierarchical classification scenario.
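For readers unfamiliar with the metric, a macro-averaged F1-Score computes the F1-Score of each label separately and then takes their unweighted mean, so rare labels count as much as common ones. The sketch below uses made-up labels and predictions (not RYDLS-20 data) and hypothetical helper names.

```python
def f1_per_label(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1-Score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1-Scores."""
    per_label = f1_per_label(y_true, y_pred, labels)
    return sum(per_label.values()) / len(labels)


# Toy imbalanced example: 8 "normal" samples, 2 "covid" samples.
y_true = ["normal"] * 8 + ["covid"] * 2
y_pred = ["normal"] * 8 + ["covid", "normal"]

print(f1_per_label(y_true, y_pred, ["normal", "covid"]))
print(round(macro_f1(y_true, y_pred, ["normal", "covid"]), 3))  # 0.804
```

Accuracy on this toy data is 0.9, but the macro-averaged F1-Score is lower because the single missed minority-class sample halves that class's recall.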

Conclusions: As far as we know, the top identification rate obtained in this paper is the best nominal rate obtained for COVID-19 identification in an unbalanced environment with more than three classes. We also highlight the novel hierarchical classification approach proposed for this task, which considers the types of pneumonia caused by different pathogens and led to the best COVID-19 recognition rate obtained here.
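One common way to exploit a label hierarchy is top-down prediction, in which a local classifier at each internal node picks one child until a leaf is reached. The sketch below shows only that general idea; it is not the authors' algorithm, and the hierarchy, node classifiers, and feature names are all hypothetical placeholders.

```python
# Hypothetical, simplified hierarchy (NOT the one used in the paper).
HIERARCHY = {
    "root": ["Normal", "Pneumonia"],
    "Pneumonia": ["Viral", "Bacterial"],
    "Viral": ["COVID-19", "Other viral"],
}


def predict_path(sample, node_classifiers, hierarchy=HIERARCHY, root="root"):
    """Walk the hierarchy top-down and return the predicted label path."""
    path = []
    node = root
    while node in hierarchy:  # stop once a leaf label is reached
        scores = node_classifiers[node](sample)  # dict: child label -> score
        node = max(hierarchy[node], key=lambda child: scores.get(child, 0.0))
        path.append(node)
    return path


# Stub classifiers driven by toy features "a", "b", "c" (assumptions).
node_classifiers = {
    "root": lambda x: {"Normal": 1 - x["a"], "Pneumonia": x["a"]},
    "Pneumonia": lambda x: {"Viral": x["b"], "Bacterial": 1 - x["b"]},
    "Viral": lambda x: {"COVID-19": x["c"], "Other viral": 1 - x["c"]},
}

print(predict_path({"a": 0.8, "b": 0.7, "c": 0.9}, node_classifiers))
# ['Pneumonia', 'Viral', 'COVID-19']
print(predict_path({"a": 0.2, "b": 0.7, "c": 0.9}, node_classifiers))
# ['Normal']
```

A design consequence of top-down schemes is that an error at an upper node cannot be corrected further down the hierarchy.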

Keywords: COVID-19; Chest X-ray; Medical image analysis; Pneumonia; Texture.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. The hierarchical class structure of pneumonia caused by micro-organisms.
Fig. 2. Different classes distribution in a binary labeled dataset.
Fig. 3. Example of datasets before and after applying the resampling techniques.
Fig. 4. The proposed classification schema for the COVID-19 identification in CXR images.
Fig. 5. Example of combinations with late fusion strategies using the sum, product and voting strategies. The example dataset has M samples and L labels.
Fig. 6. RYDLS-20 image samples.
Fig. 7. F1-Score results per label in the best case scenario for multi-class context.
Fig. 8. Confusion Matrix in the best case scenario for the multi-class experiments.
Fig. 9. F1-Score results per label in the best case on the hierarchical scenario.
Fig. 10. Confusion Matrix in the best case scenario for the hierarchical experiments.
Fig. 11. Best F1-Score results on multi-class and hierarchical scenarios for COVID-19 Identification.
Fig. 12. Best macro-avg F1-Score results on multi-class and hierarchical scenarios.
Fig. 13. Examples of samples with “COVID-19” label that were predicted as “Normal”.
Fig. 14. Different examples of CXR with “normal” lungs.
