J Med Internet Res. 2021 Feb 22;23(2):e23390.
doi: 10.2196/23390.

Establishing Classifiers With Clinical Laboratory Indicators to Distinguish COVID-19 From Community-Acquired Pneumonia: Retrospective Cohort Study

Wanfa Dai et al. J Med Internet Res.

Abstract

Background: The initial symptoms of patients with COVID-19 are very much like those of patients with community-acquired pneumonia (CAP); it is difficult to distinguish COVID-19 from CAP with clinical symptoms and imaging examination.

Objective: The objective of our study was to construct an effective model for the early identification of COVID-19 that would also distinguish it from CAP.

Methods: The clinical laboratory indicators (CLIs) of 61 COVID-19 patients and 60 CAP patients were analyzed retrospectively. Random combinations of various CLIs (ie, CLI combinations) were utilized to establish COVID-19 versus CAP classifiers with machine learning algorithms, including random forest classifier (RFC), logistic regression classifier, and gradient boosting classifier (GBC). The performance of the classifiers was assessed by calculating the area under the receiver operating characteristic curve (AUROC) and recall rate in COVID-19 prediction using the test data set.
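The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. AUROC is computed here through its rank interpretation (the probability that a randomly chosen COVID-19 sample scores higher than a randomly chosen CAP sample, with ties counted as one half).

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auroc(y_true, scores, positive=1):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set (hypothetical values): 1 = COVID-19, 0 = CAP.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]            # predicted probability of COVID-19
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

print(recall(y_true, y_pred))   # 2 of 3 COVID-19 samples recovered
print(auroc(y_true, scores))    # 8 of 9 pairs ranked correctly
```

Recall depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason to report both.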

Results: The classifiers that were constructed with three algorithms from 43 CLI combinations showed high performance (recall rate >0.9 and AUROC >0.85) in COVID-19 prediction for the test data set. Among the high-performance classifiers, several CLIs showed a high usage rate; these included procalcitonin (PCT), mean corpuscular hemoglobin concentration (MCHC), uric acid, albumin, albumin to globulin ratio (AGR), neutrophil count, red blood cell (RBC) count, monocyte count, basophil count, and white blood cell (WBC) count. They also had high feature importance except for basophil count. The feature combination (FC) of PCT, AGR, uric acid, WBC count, neutrophil count, basophil count, RBC count, and MCHC was the representative one among the nine FCs used to construct the classifiers with an AUROC equal to 1.0 when using the RFC or GBC algorithms. Replacing any CLI in these FCs would lead to a significant reduction in the performance of the classifiers that were built with them.
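As a rough sketch of how a usage rate could be tallied, the snippet below filters classifier results by the thresholds stated above (recall rate >0.9 and AUROC >0.85) and counts how often each CLI appears in the surviving combinations. The abstract does not define usage rate, so this assumes it means the fraction of high-performance combinations that contain a given CLI. The function names and every metric value in the toy list are hypothetical, not results from the study.

```python
from collections import Counter


def high_performance(results, min_recall=0.9, min_auroc=0.85):
    """Keep results whose recall and AUROC both exceed the thresholds."""
    return [r for r in results if r["recall"] > min_recall and r["auroc"] > min_auroc]


def usage_rate(combos):
    """Fraction of combinations in which each CLI appears (assumed definition)."""
    combos = list(combos)
    if not combos:
        return {}
    counts = Counter(cli for combo in combos for cli in set(combo))
    return {cli: n / len(combos) for cli, n in counts.items()}


# Hypothetical results for three CLI combinations (values invented for illustration).
results = [
    {"clis": ("PCT", "AGR", "UA"), "recall": 0.95, "auroc": 0.97},
    {"clis": ("PCT", "WBC", "RBC"), "recall": 0.92, "auroc": 0.90},
    {"clis": ("AGR", "MCHC", "UA"), "recall": 0.80, "auroc": 0.95},  # fails recall
]

hpc = high_performance(results)
rates = usage_rate(r["clis"] for r in hpc)
for cli, rate in sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{cli}: {rate:.2f}")
```

In this toy case PCT appears in both surviving combinations, so its usage rate is 1.0, while MCHC drops out because its only combination misses the recall threshold.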

Conclusions: The classifiers constructed with only a few specific CLIs could efficiently distinguish COVID-19 from CAP, which could help clinicians perform early isolation and centralized management of COVID-19 patients.

Keywords: COVID-19; classification algorithm; classifier; clinical laboratory indicators; community-acquired pneumonia.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
The statistical distribution of the plasma level of the clinical laboratory indicators (CLIs) with a remarkable difference between COVID-19 and community-acquired pneumonia (CAP). The statistical distribution is presented as a box and whisker plot. The horizontal lines within the boxes indicate the median value. The vertical lines extending below and above the boxes represent the 5th and 95th percentile values. The scale on the y-axis represents the values of the 5th, 25th, 50th, 75th, and 95th percentiles of the CLI in the CAP group. The triangles represent the upper and lower limits of the normal reference range of the laboratory index. AFU: α-L-fucosidase; AGR: albumin to globulin ratio; ALB: albumin; BASOC: basophil count; EOC: eosinophil count; HGB: hemoglobin concentration; K: potassium; LYM: lymphocyte; MAO-B: monoamine oxidase B; MCHC: mean corpuscular hemoglobin concentration; mCRP: micro–C-reactive protein; MCV: mean (red blood cell) corpuscular volume; MOC: monocyte count; NEUT: neutrophil ratio; NEUTC: neutrophil count; PCT: procalcitonin; PCV: packed-cell volume (hematocrit); PT: prothrombin time; RBC: red blood cell count; RDW-SD: red blood cell distribution width–standard deviation; TT: thrombin time; UA: uric acid; WBC: white blood cell count.
Figure 2
Area under the receiver operating characteristic curve (AUROC) and precision-recall curve plotted for the COVID-19 vs community-acquired pneumonia (CAP) classifiers built with various feature combinations (FCs) of different clinical laboratory indicators (CLIs). At the top of each image is the CLI combination for constructing classifiers using three different classification algorithms. AFU: α-L-fucosidase; AGR: albumin to globulin ratio; ALB: albumin; BASOC: basophil count; EOC: eosinophil count; LYM: lymphocyte; MCHC: mean corpuscular hemoglobin concentration; MCV: mean (red blood cell) corpuscular volume; MOC: monocyte count; NEUTC: neutrophil count; PCT: procalcitonin; RBC: red blood cell count; UA: uric acid; WBC: white blood cell count.
Figure 3
Usage rate and the feature importance of each clinical laboratory indicator (CLI) in the high-performance COVID-19 vs community-acquired pneumonia (CAP) classifiers. (A) The mean feature importance of each CLI in the high-performance classifiers (HPCs) constructed with the 7-CLI combinations. (B) The mean feature importance of each CLI in the HPCs constructed with the 8-CLI combinations. The histogram is represented by mean (SD). The numbers with the shadow backgrounds represent the minimum and maximum values of the feature importance of the CLI. The number indicated with the triangle symbol represents the mean feature importance of CLI in all classifiers. The number indicated with the circle represents the usage rate of the CLI in the HPC. The number in the parentheses indicates how many CLI combinations are capable of constructing the HPCs containing the CLI. AFU: α-L-fucosidase; AGR: albumin to globulin ratio; ALB: albumin; BASOC: basophil count; EOC: eosinophil count; FC: feature combination; HGB: hemoglobin concentration; K: potassium; LYM: lymphocyte; MCHC: mean corpuscular hemoglobin concentration; MCV: mean (red blood cell) corpuscular volume; MOC: monocyte count; NEUT: neutrophil ratio; NEUTC: neutrophil count; PCT: procalcitonin; PCV: packed-cell volume (hematocrit); RBC: red blood cell count; RDW-SD: red blood cell distribution width–standard deviation; UA: uric acid; WBC: white blood cell count.
