Front Public Health. 2022 Jul 28:10:938801.

doi: 10.3389/fpubh.2022.938801. eCollection 2022.

Machine learning-assisted prediction of pneumonia based on non-invasive measures

Clement Yaw Effah¹, Ruoqi Miao¹, Emmanuel Kwateng Drokow², Clement Agboyibor³, Ruiping Qiao⁴, Yongjun Wu¹, Lijun Miao⁴, Yanbin Wang⁵

Affiliations

¹ College of Public Health, Zhengzhou University, Zhengzhou, China.
² Department of Radiation Oncology, Zhengzhou University People's Hospital, Henan Provincial People's Hospital, Zhengzhou, China.
³ School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China.
⁴ Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China.
⁵ Center of Health Management, General Hospital of Anyang Iron and Steel Group Co., Ltd, Anyang, China.

PMID: 35968461
PMCID: PMC9371749
DOI: 10.3389/fpubh.2022.938801

Clement Yaw Effah et al. Front Public Health. 2022.

Abstract

Background: Pneumonia is an infection of the lungs that is characterized by high morbidity and mortality. The use of machine learning systems to detect respiratory diseases via non-invasive measures such as physical and laboratory parameters is gaining momentum and has been proposed to decrease the diagnostic uncertainty associated with bacterial pneumonia. In this study, we conducted several experiments using eight machine learning models to predict pneumonia based on biomarkers, laboratory parameters, and physical features.

Methods: We performed machine-learning analysis on data from 535 patients, each with 45 features. All real-valued features were normalized to a common scale. Because the task is binary classification, each patient was assigned to exactly one of two classes. We designed three experiments to evaluate the models: (1) feature selection to choose appropriate features for the models, (2) experiments on the original, imbalanced dataset, and (3) experiments on the dataset balanced with SMOTE. We then compared eight machine learning models to evaluate their effectiveness in predicting pneumonia.
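
The two preprocessing steps named above, rescaling and SMOTE balancing, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which normalization was used, so min-max scaling to [0, 1] is an assumption, and the helper names (`min_max_scale`, `smote_like_oversample`) and the toy values are hypothetical. The oversampler follows the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1] (assumed scheme); constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data with two features per patient (not taken from the study).
majority = [[5.0, 0.1], [8.0, 0.2], [6.0, 0.15], [7.0, 0.3], [9.0, 0.25]]
minority = [[40.0, 1.5], [55.0, 2.0], [60.0, 1.8]]

scaled = min_max_scale(majority + minority)
scaled_majority = scaled[:len(majority)]
scaled_minority = scaled[len(majority):]

# Add synthetic minority samples until both classes have the same size.
new_points = smote_like_oversample(
    scaled_minority, n_new=len(scaled_majority) - len(scaled_minority)
)
balanced_minority = scaled_minority + new_points
```

In practice, oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data; the abstract does not state how this was handled in the study.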

Results: Biomarkers such as C-reactive protein and procalcitonin demonstrated the strongest discriminating power. On the original dataset, the ensemble models random forest (RF; accuracy = 92.0%, precision = 91.3%, recall = 96.0%, F1-score = 93.6%) and XGBoost (accuracy = 90.8%, precision = 92.6%, recall = 92.3%, F1-score = 92.4%) achieved the highest performance, with AUCs of 0.96 and 0.97, respectively. On the SMOTE dataset, RF and XGBoost again achieved the best results, with F1-scores of 92.0% and 91.2%, respectively, and an AUC of 0.97 for both models.
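
As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The short sketch below uses a hypothetical helper, `f1_from_precision_recall`, with the original-dataset precision and recall from this abstract as inputs.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the original dataset.
rf_f1 = f1_from_precision_recall(0.913, 0.960)
xgb_f1 = f1_from_precision_recall(0.926, 0.923)

print(f"RF F1 = {rf_f1:.3f}, XGBoost F1 = {xgb_f1:.3f}")
```

Rounded to three decimals, both values agree with the reported 93.6% and 92.4%.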

Conclusions: Our models showed that, taken individually, clinical history, laboratory indicators, and symptoms do not have adequate discriminatory power for the diagnosis of pneumonia. We also conclude that the ensemble ML models performed best in this study.

Keywords: decision support system (DSS); electronic health records (EHR); machine learning; non-invasive measures; pneumonia.

Conflict of interest statement

Author YaW was employed by Center of Health Management, General Hospital of Anyang Iron and Steel Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 2**
Target data (LRTI) distribution before and after applying SMOTE. The label "0" denotes pneumonia and "1" denotes bronchitis. **(A)** Imbalanced data. **(B)** Balanced data.

**Figure 3**
Confusion matrix of XGBoost and random forest on the original dataset. **(A)** XGBoost. **(B)** RF.

**Figure 4**
ROC curves of XGBoost and random forest on the original dataset. **(A)** XGBoost. **(B)** RF.

**Figure 5**
Feature importance according to XGBoost model on the original dataset.

**Figure 6**
Feature importance according to the RF model on the original dataset.

**Figure 7**
Confusion matrix of XGBoost and RF on SMOTE data. **(A)** XGBoost. **(B)** RF.

**Figure 8**
ROC curves of XGBoost and random forest on the SMOTE dataset. **(A)** XGBoost. **(B)** RF.

**Figure 9**
Feature importance according to the XGBoost model on the SMOTE dataset.

**Figure 10**
Feature importance according to the RF model on the SMOTE dataset.

**Figure 11**
Decision boundaries of the models on the original dataset.

**Figure 12**
Decision boundaries of the models on the balanced dataset.

**Figure 13**
AUROC curves for the external validation dataset. **(A)** XGBoost. **(B)** RF.
