Front Public Health. 2022 Jul 28;10:938801.
doi: 10.3389/fpubh.2022.938801. eCollection 2022.

Machine learning-assisted prediction of pneumonia based on non-invasive measures


Clement Yaw Effah et al. Front Public Health.

Abstract

Background: Pneumonia is a lung infection characterized by high morbidity and mortality. Machine learning systems that detect respiratory diseases from non-invasive measures, such as physical and laboratory parameters, are gaining momentum and have been proposed as a way to reduce the diagnostic uncertainty associated with bacterial pneumonia. In this study, we conducted several experiments using eight machine learning models to predict pneumonia from biomarkers, laboratory parameters, and physical features.

Methods: We performed a machine learning analysis on data from 535 patients, each described by 45 features. All real-valued features were normalized (rescaled). Because the task is a binary classification problem, each patient was assigned to exactly one of the two classes. We designed three experiments to evaluate the models: (1) feature selection, to choose appropriate input features for the models; (2) experiments on the original, imbalanced dataset; and (3) experiments on the dataset balanced with SMOTE (Synthetic Minority Over-sampling Technique). We then compared eight machine learning models to evaluate their effectiveness in predicting pneumonia.
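The Methods mention rescaling the real-valued features and balancing the classes with SMOTE, but the abstract gives no implementation details. The following Python/NumPy sketch illustrates both steps under stated assumptions: min-max scaling to [0, 1] is an assumed choice of normalization, `min_max_scale` and `smote` are hypothetical helper names rather than the authors' code, and the toy data are random. The `smote` function follows the basic SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. In practice, the imbalanced-learn library provides this as `imblearn.over_sampling.SMOTE` with a `fit_resample` method.

```python
import numpy as np


def min_max_scale(X, eps=1e-12):
    """Rescale each column of X to [0, 1].

    Assumption: min-max scaling; the abstract only says features were rescaled.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # eps guards against division by zero for constant columns
    return (X - col_min) / np.maximum(col_range, eps)


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by interpolating between each chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    diff = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                   # random minority sample
        nn = neighbours[base, rng.integers(k)]   # one of its k neighbours
        gap = rng.random()                       # uniform in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nn] - X_minority[base])
    return synthetic


# Hypothetical toy data: 8 majority (label 0) and 3 minority (label 1) samples
rng = np.random.default_rng(42)
X = min_max_scale(rng.normal(size=(11, 2)))
y = np.array([0] * 8 + [1] * 3)
X_new = smote(X[y == 1], n_synthetic=5, k=2)
X_bal = np.vstack([X, X_new])
y_bal = np.concatenate([y, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))  # [8 8]
```

As a general precaution (not something the abstract describes), the scaler and the oversampler are usually fit on the training split only, so that test samples do not leak into the scaling ranges or the synthetic data.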

Results: Biomarkers such as C-reactive protein and procalcitonin showed the greatest discriminating power. Ensemble machine learning models such as random forest (RF; accuracy = 92.0%, precision = 91.3%, recall = 96.0%, F1-score = 93.6%) and XGBoost (accuracy = 90.8%, precision = 92.6%, recall = 92.3%, F1-score = 92.4%) achieved the highest performance on the original dataset, with AUCs of 0.96 and 0.97, respectively. On the SMOTE dataset, RF and XGBoost again achieved the best results, with F1-scores of 92.0% and 91.2%, respectively, and both models reached an AUC of 0.97.
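The Results report accuracy, precision, recall, F1-score, and AUC. The Python sketch below shows how these metrics are conventionally computed from binary confusion-matrix counts (the quantities plotted in Figures 1, 3, and 7) and from predicted scores. The function names are hypothetical and the toy inputs are illustrative, not study data. The abstract does not state which class (pneumonia or bronchitis) was treated as positive, so taking label 1 as the positive class is an assumption. As a consistency check, the F1-score, being the harmonic mean of precision and recall, reproduces the reported values of 93.6% (RF) and 92.4% (XGBoost) from the reported precision and recall.

```python
import numpy as np


def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_from_pr(precision, recall)}


def auc_rank(y_true, scores):
    """ROC AUC as the probability that a random positive is scored higher than
    a random negative, with ties counted as one half (Mann-Whitney form).
    Assumption: label 1 is the positive class."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Consistency check using the precision/recall values reported in the abstract
print(round(100 * f1_from_pr(0.913, 0.960), 1))  # RF: 93.6
print(round(100 * f1_from_pr(0.926, 0.923), 1))  # XGBoost: 92.4

# Toy example (not study data)
print(metrics_from_confusion(tp=8, fp=2, fn=1, tn=9))
print(auc_rank([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank form of AUC avoids building the ROC curve explicitly; scikit-learn's `roc_auc_score` computes the same quantity for binary labels.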

Conclusions: Our models showed that, in the diagnosis of pneumonia, individual clinical history, laboratory indicators, and symptoms do not have adequate discriminatory power. The ensemble ML models (RF and XGBoost) also outperformed the other models evaluated in this study.

Keywords: decision support system (DSS); electronic health records (EHR); machine learning; non-invasive measures; pneumonia.


Conflict of interest statement

Author YaW was employed by the Center of Health Management, General Hospital of Anyang Iron and Steel Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Confusion matrix.
Figure 2
Distribution of the target classes (lower respiratory tract infection, LRTI) before and after applying SMOTE. The label “0” denotes pneumonia and “1” denotes bronchitis. (A) Imbalanced data. (B) Balanced data.
Figure 3
Confusion matrix of XGBoost and random forest on the original dataset. (A) XGBoost. (B) RF.
Figure 4
ROC curves of XGBoost and random forest on the original dataset. (A) XGBoost. (B) RF.
Figure 5
Feature importance according to XGBoost model on the original dataset.
Figure 6
Feature importance according to the RF model on the original dataset.
Figure 7
Confusion matrix of XGBoost and RF on SMOTE data. (A) XGBoost. (B) RF.
Figure 8
ROC curves of XGBoost and random forest on the SMOTE dataset. (A) XGBoost. (B) RF.
Figure 9
Feature importance according to the XGBoost model on the SMOTE dataset.
Figure 10
Feature importance according to the RF model on the SMOTE dataset.
Figure 11
Decision boundaries of the models on the original dataset.
Figure 12
Decision boundaries of the models on the balanced dataset.
Figure 13
AUROC curves for the external validation dataset. (A) XGBoost. (B) RF.
