Biomol Biomed. 2025 Jun 18;26(2):333-353.

doi: 10.17305/bb.2025.12378.

Unveiling etiology and mortality risks in community-acquired pneumonia: A machine learning approach

Alaa Ali¹, Ahmad R Alsayed¹, Nesrin Seder², Yazun Jarrar³, Raed H Altabanjeh¹, Mamoon Zihlif⁴, Osama Abu Ata⁵, Anas Samara⁶, Malek Zihlif⁷

Affiliations

¹ Department of Clinical Pharmacy and Therapeutics, Applied Science Private University, Amman, Jordan.
² Department of Pharmaceutical Chemistry and Pharmacognosy, Applied Science Private University, Amman, Jordan.
³ Department of Basic Medical Sciences, Faculty of Medicine, Al-Balqa Applied University, Al-Salt, Jordan.
⁴ Department of Internal Medicine, Section of Pulmonary, Islamic Hospital, Amman, Jordan.
⁵ Department of Internal Medicine, Section of Infectious Diseases, Islamic Hospital, Amman, Jordan.
⁶ Department of Software Engineering, Bethlehem University, Bethlehem, Palestine.
⁷ Department of Pharmacology, School of Medicine, The University of Jordan, Amman, Jordan.

PMID: 40613579
PMCID: PMC12505532
DOI: 10.17305/bb.2025.12378

Alaa Ali et al. Biomol Biomed. 2025.

Abstract

Community-acquired pneumonia (CAP) is associated with high mortality, and accurate diagnosis and risk prediction are essential for improving patient outcomes. Traditional diagnostic methods have limitations, prompting the use of machine learning (ML) to enhance diagnostic precision and treatment strategies. This study aims to develop ML models to predict CAP etiology and mortality using clinical data to enable early intervention. A retrospective cohort study was conducted on 251 adult CAP patients admitted to two Jordanian hospitals between March 2021 and February 2024. Various clinical data were analyzed using ML techniques, including linear regression, random forest, SHapley Additive exPlanations (SHAP), lasso regression, mutual information analysis, logistic regression, and correlation analysis. Key predictors of CAP survival included zinc, vitamin C, enoxaparin, and insulin bolus. Mutual information analysis identified neutrophils, alanine transaminase, mean corpuscular volume, hemoglobin, and platelets as significant mortality predictors, while lasso regression highlighted meropenem, arterial blood gases, PCO₂, and platelet count. Logistic regression confirmed intensive care unit (ICU) stay, pH, pneumonia severity index, white blood cell (WBC) count, and bicarbonate levels as crucial variables. Interestingly, lymphocyte count emerged as the strongest predictor of bacterial CAP, conflicting with established knowledge that associates neutrophils with bacterial infections. However, findings related to HCO₃, blood urea nitrogen, and WBC levels were consistent with clinical expectations. SHAP analysis highlighted basophils and fever as key predictors. Further investigation is needed to resolve conflicting findings and optimize predictive models. ML offers promising applications for CAP prognosis but requires refinement to address discrepancies and improve reliability in clinical decision-making.

Conflict of interest statement

Conflicts of interest: Authors declare no conflicts of interest.

Figures

**Figure 1.**
**Overview of the six-step machine learning workflow for predicting outcome probabilities.**

**Figure 2.**
**Heatmap of strongly correlated features.** Cells show Pearson correlation coefficients. Warm colors represent positive correlations, while cool colors indicate negative correlations. Strong positive correlations are observed between inflammatory markers (e.g., C-reactive protein [CRP], ferritin) and between white blood cell (WBC) count and neutrophil count (r > 0.7). This pattern suggests potential collinearity among markers of systemic inflammation, which may affect model stability and motivate variable selection or regularization strategies.
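The collinearity screen described in this caption can be illustrated with a minimal sketch. It assumes a simple pairwise Pearson check against the r > 0.7 cutoff mentioned above; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(features, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Made-up illustrative values, NOT data from the study.
toy = {
    "wbc":         [6.0, 9.5, 12.0, 15.5, 18.0, 21.0],
    "neutrophils": [3.5, 6.8, 9.1, 12.9, 15.2, 18.4],
    "hemoglobin":  [13.1, 11.0, 14.2, 12.5, 10.9, 13.0],
}
pairs = collinear_pairs(toy)
```

In this toy set only the WBC and neutrophil columns are flagged. In practice one member of a flagged pair might be dropped, or a regularized model used, before fitting.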

**Figure 3.**
**Heatmap of top correlated features with mortality outcomes.** Cells show Pearson correlation coefficients between each feature and the target outcome. Warm colors represent positive correlations, while cool colors indicate negative correlations. Top positive correlates: age (r ≈ 0.45), neutrophils (r ≈ 0.41), CRP (r ≈ 0.39), ferritin (r ≈ 0.36); top negative correlates: lymphocytes (r ≈ −0.42), oxygen saturation (r ≈ −0.38), hemoglobin (r ≈ −0.33). A focused panel also highlights high correlations for vitamin C, zinc, enoxaparin (CLEXAN), and insulin bolus. Abbreviation: LOS: Length of stay.

**Figure 4.**
**Mutual information of features related to mortality outcomes.** Mutual information scores quantify each feature's dependency on mortality, producing a ranked list of informative predictors. Top contributors include creatinine, WBC (including eosinophils), and neutrophil count; the number of previous hospitalizations also ranks highly. RDW and ALT are additional significant features. Overall, inflammatory markers (eosinophils, neutrophils, and basophils) show high informativeness, underscoring their value for model prioritization and clinical decision-making. Abbreviations: WBC: White blood cell count; RDW: Red blood cell distribution width; ALT: Alanine aminotransferase.
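To make the ranking idea concrete, here is a minimal sketch of mutual information for discrete variables, computed directly from counts. It is an illustration only: the estimator the authors used is not stated here, continuous laboratory values would need binning or a nearest-neighbor estimator, and the feature names and values below are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi

# Made-up binary toy data, NOT from the study.
died        = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]   # perfectly tracks the outcome
noise       = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of the outcome

scores = {
    "informative": mutual_information(informative, died),
    "noise": mutual_information(noise, died),
}
# Rank features from most to least informative.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A feature that fully determines a balanced binary outcome scores ln 2, while an independent feature scores zero, which is the basis for a ranked list like the one in the figure.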

**Figure 5.**
**Top feature coefficients from Lasso regression.** Coefficients indicate each variable's direction and magnitude of association with mortality. “Culture” shows the largest positive coefficient (β ≈ 0.15) but is not clinically meaningful (test-performed indicator). Meropenem has a strong positive coefficient; ABG variables (pH, Base Excess, PCO₂) and platelet count also contribute positively, underscoring their relevance for risk prediction. Abbreviation: ICU: Intensive care unit.
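Lasso coefficients are read as a selection result: the L1 penalty shrinks every coefficient and sets weak ones exactly to zero. The sketch below shows this with a small coordinate-descent solver for the squared-error lasso objective. It assumes centered data with no intercept; the penalty `alpha`, the solver, and the toy design are assumptions for illustration, not the authors' configuration.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Made-up orthogonal toy design: y = 2.0 * x0 + 0.05 * x1 (NOT study data).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.05, 1.95, -1.95, -2.05]
w = lasso_cd(X, y, alpha=0.1)
```

With this design the strong feature keeps a shrunken coefficient (2.0 reduced by `alpha`) and the weak one is dropped, which is why only a subset of variables appears with nonzero coefficients in a plot like this.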

**Figure 6.**
**Feature importance from logistic regression.** Coefficients indicate direction and strength. pH is the most impactful predictor with a strong negative coefficient (≈ −1.0). ICU_LOS and LOS show substantial positive effects (≈ 0.6 and ≈ 0.4), indicating longer stays are linked to higher mortality. Additional important contributors include PCO₂, albumin, WBC, bicarbonate (HCO₃), and ABG measures. Abbreviations: BUN: Blood urea nitrogen; WBC: White blood cell count; PSI: Pneumonia severity index.
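Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below applies this to the approximate values quoted in the caption. What "one unit" means depends on whether the features were standardized, which the caption does not state, so the results are illustrative only.

```python
import math

# Approximate coefficients as quoted in the caption (not exact model output).
coefs = {"pH": -1.0, "ICU_LOS": 0.6, "LOS": 0.4}

def odds_ratio(beta):
    """Multiplicative change in the odds of the outcome per one-unit increase."""
    return math.exp(beta)

odds_ratios = {name: round(odds_ratio(beta), 2) for name, beta in coefs.items()}
```

Under these assumptions a coefficient of about −1.0 corresponds to the odds being multiplied by roughly 0.37 per unit increase, and a coefficient of about 0.6 to the odds being multiplied by roughly 1.82.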

**Figure 7.**
**SHAP summary plot: Analyzing the influence of medications and laboratory findings on model predictions.** Features are ranked by mean absolute SHAP values. Antibiotic use shows the strongest negative impact (SHAP < −0.6), with basophils and initial fever also lowering predicted risk. Enoxaparin and ciprofloxacin/piperacillin–tazobactam susceptibility align with lower risk, whereas meropenem/imipenem/amikacin susceptibility and tocilizumab, prednisolone, and anticoagulant use have positive impacts, likely reflecting greater disease severity. Abbreviations: HDL: High-density lipoprotein; SHAP: Shapley additive explanations.

**Figure 8.**
**Heatmap depicting correlations between clinical variables and bacterial infections.** Cells show Pearson correlation coefficients between each feature and the target outcome. Warm colors represent positive correlations, while cool colors indicate negative correlations. The target was encoded as 1 for bacterial infection and 2 for no infection; thus higher coefficients indicate a greater likelihood of bacterial infection. This visualization supports ML-based etiology prediction in CAP. Abbreviations: SOB: Shortness of breath; CAP: Community-acquired pneumonia; ML: Machine learning.

**Figure 9.**
**Feature importance in logistic regression models.** The horizontal bar plot illustrates the coefficient of each feature, reflecting its actual contribution to the model’s predictions. A positive coefficient indicates a positive association with the target variable, while a negative coefficient signifies a negative association. Abbreviation: PCR: Polymerase chain reaction.

**Figure 10.**
**SHAP summary plot: Analyzing feature impact on model output.** This figure presents the SHAP values that demonstrate the influence of various features on the model’s predictions. Positive SHAP values indicate a beneficial contribution to the outcome, whereas negative values reflect a detrimental impact. The color gradient, ranging from blue (indicating low feature values) to red (indicating high feature values), underscores the relationship between feature magnitude and model output. Notably influential features include Antibiotics, Basophils, and initial findings of fever, each exhibiting distinct effects based on their respective values. Abbreviations: SHAP: Shapley additive explanations; PCR: Polymerase chain reaction.
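For readers unfamiliar with SHAP values, the sketch below computes exact Shapley values by enumerating feature coalitions for a tiny hypothetical risk score. It is a brute-force illustration of the underlying definition, not the SHAP library's algorithm (which relies on approximations and a background dataset); the model, inputs, and baseline are invented. A positive value means the feature pushes the model output above the baseline prediction; whether that is clinically favorable depends on how the target is encoded.

```python
import math
from itertools import combinations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual values; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical risk score with one interaction term (NOT the study's model)."""
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The values add up to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features involved.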

**Figure 11.**
**Correlation of clinical and treatment features with bacterial infection as a primary outcome.** Antibiotics show the strongest positive correlation, with enoxaparin and anticoagulant use also positively associated; initial fever and basophils exhibit moderate positive correlations. In contrast, mAb tocilizumab and amikacin susceptibility show weak negative correlations, while other susceptibilities (e.g., cefepime, ciprofloxacin) are minimal or near zero. Abbreviation: HDL: High-density lipoprotein.
