EClinicalMedicine. 2025 Apr 3;82:103192.
doi: 10.1016/j.eclinm.2025.103192. eCollection 2025 Apr.

An interpretable machine learning model based on optimal feature selection for identifying CT abnormalities in patients with mild traumatic brain injury

Yuling Pan et al. EClinicalMedicine.

Abstract

Background: Minor head trauma is a frequent cause of emergency department visits. Early identification and prediction of mild traumatic brain injury (mTBI) patients with abnormal brain lesions are vital for minimizing unnecessary computed tomography (CT) scans, reducing radiation exposure, and ensuring timely, effective treatment and care. This study aims to develop and validate an interpretable machine learning (ML) prediction model that uses routine laboratory data to guide clinical decisions on CT scan use in mTBI patients.

Methods: We conducted a multicentre study in China using data collected from January 2019 to July 2024. The study included three patient cohorts: a retrospective training cohort (654 patients for training and 163 for internal testing) and two prospective validation cohorts (86 internal and 290 external patients). Fifty-one routine clinical laboratory variables, readily available from the electronic medical record (EMR) system within the first 24 h of admission, were collected. Seven ML algorithms were trained to develop predictive models, and the random forest (RF) algorithm was used to optimize key feature combinations. Predictive performance was evaluated using metrics including the area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and F1 score. SHapley Additive exPlanation (SHAP) analysis was applied to interpret the final model, and decision curve analysis (DCA) was used to assess clinical net benefit.
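The evaluation metrics named in the Methods have standard definitions. The following is a minimal, non-authoritative sketch in plain Python, assuming binary labels (1 = abnormal CT), predicted probabilities from any classifier, and an illustrative decision threshold of 0.5; the function names and toy data are hypothetical and are not the study's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    sensitivity: float  # recall, true-positive rate
    specificity: float  # true-negative rate
    ppv: float          # positive predictive value (precision)
    f1: float           # harmonic mean of PPV and sensitivity
    accuracy: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> BinaryMetrics:
    """Confusion-matrix metrics after dichotomising predicted risk at `threshold`."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sens = _safe_div(tp, tp + fn)
    spec = _safe_div(tn, tn + fp)
    ppv = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * ppv * sens, ppv + sens)
    acc = _safe_div(tp + tn, tp + tn + fp + fn)
    return BinaryMetrics(sens, spec, ppv, f1, acc)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0]              # hypothetical labels: 1 = abnormal CT
    p = [0.9, 0.3, 0.6, 0.7, 0.4, 0.1]  # hypothetical predicted risks
    print(round(roc_auc(y, p), 3))      # 0.778 (7 of 9 positive-negative pairs ranked correctly)
    print(threshold_metrics(y, p))
```

The pairwise AUC costs O(n_pos × n_neg) comparisons, which is acceptable for cohorts of a few hundred patients; it equals the area under the empirical ROC curve computed with the trapezoidal rule.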

Findings: In the derivation cohort, 599 (73.3%) patients had normal CT scans and 218 (26.7%) had abnormal CT scans. The gradient boosting classifier (GBC) model performed best among the seven ML models, with an AUC of 0.932 (95% CI: 0.900-0.963). After the feature set was reduced to 21 features (8 biochemical test indicators, 3 coagulation markers, and 10 complete blood count indicators) ranked by feature importance, an explainable final GBC model was established. The final model accurately predicted CT abnormalities in mTBI patients in both the internal (AUC 0.926, 95% CI: 0.893-0.958) and external (AUC 0.904, 95% CI: 0.835-0.973) validation cohorts. In the prospective cohort, the final GBC model achieved an AUC of 0.885 (95% CI: 0.753-1.000) and was significantly superior to the traditional TBI biomarkers GFAP (AUC: 0.745) and PGP9.5 (AUC: 0.794). DCA showed that the final model offered greater net benefit than the "full intervention" and "no intervention" strategies across a probability threshold range of 0.16-0.93. SHAP analysis identified D-dimer level, absolute lymphocyte and neutrophil counts, and hematocrit as key high-risk features.
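The DCA result above compares strategies by net benefit at each threshold probability p_t: NB = TP/n - (FP/n) × p_t/(1 - p_t), where "full intervention" (CT for every patient) is the treat-all reference and "no intervention" has NB = 0. The sketch below illustrates that standard formula under stated assumptions (binary labels, predicted probabilities, hypothetical helper names and toy data); the abstract does not describe the study's own DCA code.

```python
from typing import List, Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Net benefit of acting (e.g. ordering CT) when predicted risk >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # pt / (1 - pt) is the odds at the threshold: the weight given to each
    # false positive relative to a true positive.
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Net benefit of acting on every patient (the "full intervention" strategy)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


def beneficial_thresholds(
    y_true: Sequence[int], y_prob: Sequence[float], thresholds: Sequence[float]
) -> List[float]:
    """Thresholds at which the model beats both treat-all and treat-none (NB = 0)."""
    return [
        pt
        for pt in thresholds
        if net_benefit(y_true, y_prob, pt) > max(net_benefit_treat_all(y_true, pt), 0.0)
    ]


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0]              # hypothetical labels: 1 = abnormal CT
    p = [0.9, 0.3, 0.6, 0.7, 0.4, 0.1]  # hypothetical predicted risks
    grid = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05 .. 0.95
    print(beneficial_thresholds(y, p, grid))
```

Reporting a range such as 0.16-0.93 corresponds to the set of thresholds returned by a check like `beneficial_thresholds`, evaluated on a fine grid over the validation data.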

Interpretation: Our ML model, based on optimal feature selection, accurately and reliably predicts CT abnormalities in mTBI patients using routine test data. By addressing clinicians' concerns about transparency and decision-making through SHAP and DCA, we strengthen the potential clinical applicability of our ML model.

Funding: The Natural Science Foundation of Hubei Province; the High-level Talent Research Startup Funding of Hubei University of Chinese Medicine; the Wuhan Health and Family Planning Scientific Research Fund Project of Hubei Province; and the Machine Learning-based Intelligent Diagnosis System for AFP-negative Liver Cancer Project.

Keywords: CT abnormal; DCA; Machine learning; Mild traumatic brain injury; Prediction model; SHAP.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The overall framework of the study. (A) Dataset import. (B) Data preprocessing. (C) Model development and feature optimization. (D) Model evaluation and visualization. TCM: Traditional Chinese Medicine; ALB: albumin; ALP: alkaline phosphatase; APTT: activated partial thromboplastin time; PT: prothrombin time; BAS#: basophil count; EOS#: eosinophil count; GBC: gradient boosting classifier; LightGBM: light gradient boosting machine; MLP: multi-layer perceptron; SVM: support vector machine; XGBoost: extreme gradient boosting; RF: random forest; GFAP: glial fibrillary acidic protein; PGP9.5: protein gene product 9.5.
Fig. 2
Flow chart of the study design. mTBI: mild traumatic brain injury; ML: machine learning; GBC: gradient boosting classifier; SHAP: SHapley Additive exPlanation; DCA: decision curve analysis; TBI: traumatic brain injury; GFAP: glial fibrillary acidic protein; PGP9.5: protein gene product 9.5.
Fig. 3
Evaluation of the seven ML algorithms based on the ROC curve. (A) ROC curves of the diagnostic models generated by the seven ML algorithms. (B) AUC, sensitivity, specificity, accuracy, and F1 score of the GBC model with different feature combinations. (C) ROC curves of the GBC model with different feature combinations. ML: machine learning; ROC: receiver operating characteristic; AUC: area under the receiver operating characteristic curve; GBC: gradient boosting classifier; LightGBM: light gradient boosting machine; MLP: multi-layer perceptron; SVM: support vector machine; XGBoost: extreme gradient boosting; RF: random forest.
Fig. 4
Comparison and validation of the final ML model based on the ROC curve in prospective cohorts. (A) Performance of the final ML model based on the ROC curve in the prospective validation cohort at Hubei Provincial Hospital of TCM. (B) Comparison of the final GBC model and two TBI serum biomarkers based on the ROC curve in the prospective validation cohort at Wuhan Yangtze River Shipping General Hospital. (C) Performance of the binary logistic regression model based on 21 features in the prospective validation cohort at Wuhan Yangtze River Shipping General Hospital. ML: machine learning; ROC: receiver operating characteristic; TCM: Traditional Chinese Medicine; GBC: gradient boosting classifier; TBI: traumatic brain injury; AUC: area under the receiver operating characteristic curve; GFAP: glial fibrillary acidic protein; PGP9.5: protein gene product 9.5.
Fig. 5
DCA of the final GBC model. The solid black line displays the net benefit of the strategy of treating all patients, and the black dotted line illustrates the net benefit of the strategy of treating no patients. DCA: decision curve analysis; GBC: gradient boosting classifier.
Fig. 6
Global explanation of the model by the SHAP method. (A) Feature importance matrix plot. (B) SHAP summary plot. Each row represents a feature, and each data point represents a sample. High feature values are shown in red and low feature values in blue. SHAP: SHapley Additive exPlanation; D-D: D-dimer; LYM#: lymphocyte count; NEU#: neutrophil count; HCT: hematocrit; PT-INR: prothrombin time international normalized ratio; GLU: glucose; TBA: total bile acids; PCT: plateletcrit; CO2: carbon dioxide; Mg: magnesium; BUN: blood urea nitrogen; ALT: alanine transaminase; HGB: hemoglobin; MCV: mean corpuscular volume; PDW: platelet distribution width; WBC: white blood cell; CL: chloride; RBC: red blood cell; APTT: activated partial thromboplastin time; K: potassium; MPV: mean platelet volume.
Fig. 7
Local explanation of the model by the SHAP method. (A, B) SHAP values of two typical patients from the positive group (A) and the negative group (B), shown with their most important variables. (C) SHAP values for all 817 patients in the training set. SHAP: SHapley Additive exPlanation; D-D: D-dimer; LYM#: lymphocyte count; NEU#: neutrophil count; HCT: hematocrit; PT-INR: prothrombin time international normalized ratio; GLU: glucose; TBA: total bile acids; PCT: plateletcrit; CO2: carbon dioxide; Mg: magnesium; BUN: blood urea nitrogen; ALT: alanine transaminase; HGB: hemoglobin; MCV: mean corpuscular volume; PDW: platelet distribution width; WBC: white blood cell; CL: chloride; RBC: red blood cell; APTT: activated partial thromboplastin time; K: potassium; MPV: mean platelet volume.
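The SHAP explanations in Figs. 6 and 7 rest on Shapley values: a feature's value is its weighted average marginal contribution to the model output over all coalitions of the other features. The brute-force sketch below computes exact Shapley values for a small hypothetical model, assuming that features outside a coalition are replaced by a single reference (baseline) patient. It illustrates the definition only and is not the study's implementation; SHAP libraries typically use model-specific algorithms (for example, TreeSHAP for tree ensembles such as GBC) that average over a background dataset and avoid the exponential cost, and the abstract does not state which variant was used.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's value; the rest take the baseline value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical 3-feature risk score with one interaction term (not the study's model).
    def toy_model(z: Sequence[float]) -> float:
        return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]

    x = [1.0, 2.0, 3.0]
    base = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print(phi)  # approximately [2.0, 4.0, 1.5]
    # Efficiency property: contributions sum to model(x) - model(baseline).
    print(sum(phi), toy_model(x) - toy_model(base))
```

The interaction term z0·z2 contributes 3 to the prediction and is split equally between features 0 and 2, which is the kind of attribution the per-patient force plots in Fig. 7 display.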
