Front Med (Lausanne). 2025 Mar 13;12:1529993.
doi: 10.3389/fmed.2025.1529993. eCollection 2025.

Explainable machine learning model and nomogram for predicting the efficacy of Traditional Chinese Medicine in treating Long COVID: a retrospective study

Jisheng Zhang et al. Front Med (Lausanne). 2025.

Abstract

Introduction: Long COVID significantly affects patients' quality of life, yet no standardized treatment has been established. Traditional Chinese Medicine (TCM) is a promising approach that offers targeted therapeutic strategies. This study aims to develop an explainable machine learning (ML) model and nomogram to identify Long COVID patients who may benefit from TCM, thereby supporting clinical decision-making.

Methods: We analyzed data from 1,331 Long COVID patients treated with TCM between December 2022 and February 2024 at three hospitals in Zhejiang, China. Effectiveness was defined as improvement in two or more symptoms or a minimum 2-point increase in the Traditional Chinese Medicine Syndrome Score (TCMSS). Data included 11 patient and disease characteristics, 18 clinical symptoms and syndrome scores, and 12 auxiliary examination indicators. The least absolute shrinkage and selection operator (LASSO) method identified features linked to TCM efficacy. Data from 1,204 patients served as the training set, while 127 patients formed the testing set.
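The outcome definition and the 1,204/127 split described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sign convention for the TCMSS change, and the use of a seeded random shuffle are assumptions, since the abstract does not say how patients were assigned to the training and testing sets.

```python
import random


def is_effective(symptoms_improved: int, tcmss_change: float) -> bool:
    """Label a patient as a responder using the abstract's stated rule:
    improvement in two or more symptoms, or a TCMSS change of at least
    2 points. The sign convention of tcmss_change is an assumption."""
    return symptoms_improved >= 2 or tcmss_change >= 2


def split_patients(patient_ids, n_test, seed=0):
    """Hypothetical hold-out split: shuffle with a fixed seed, then
    reserve the first n_test patients for testing."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]


# 1,331 patients in total, 127 of them held out for testing.
train_ids, test_ids = split_patients(range(1331), n_test=127)
print(len(train_ids), len(test_ids))  # 1204 127
```

With these sizes, the testing set holds roughly 9.5% of the cohort (127 of 1,331).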

Results: We employed five ML algorithms: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), and Neural Network (NN). The XGBoost model achieved an Area Under the Curve (AUC) of 0.9957 and an F1 score of 0.9852 in the training set, and it demonstrated superior performance in the testing set, with an AUC of 0.9059 and an F1 score of 0.9027. Key features identified through SHapley Additive exPlanations (SHAP) included chest tightness, aversion to cold, age, TCMSS, Short Form (36) Health Survey (SF-36), C-reactive protein (CRP), and lymphocyte ratio. The logistic regression-based nomogram demonstrated an AUC of 0.9479 and an F1 score of 0.9384 in the testing set.
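For readers less familiar with the two metrics reported above, the following minimal sketch computes AUC and F1 from first principles. The labels, scores, and the 0.5 decision threshold are made-up illustrations, not study data, and the study itself presumably used a standard library rather than hand-written functions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (illustrative values only).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
```

AUC depends only on how the model ranks patients, while F1 depends on the chosen threshold, which is why the two can move in different directions between the training and testing sets.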

Conclusion: This study used multicenter data and multiple ML algorithms to create an ML model for predicting TCM efficacy in Long COVID treatment. Furthermore, a logistic regression-based nomogram was developed to complement the model and improve decision-making efficiency in TCM applications for Long COVID management.
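A nomogram of this kind is a graphical rescaling of a logistic regression: each predictor's contribution to the linear predictor is converted to points, and the total points map back to a probability. The sketch below shows one common construction, in which the predictor with the largest effect spans 0 to 100 points. The coefficients, intercept, and feature ranges are hypothetical placeholders, not the fitted values from the study.

```python
import math


def build_nomogram(intercept, coefs, ranges):
    """Return (points, probability) functions for a logistic model.

    coefs:  feature name -> regression coefficient
    ranges: feature name -> (lowest, highest) value shown on the axis
    """
    # The largest possible contribution of any single feature gets 100 points.
    max_effect = max(abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items())
    # Zero points sit at the end of each axis with the lowest contribution.
    refs = {k: (ranges[k][0] if b >= 0 else ranges[k][1]) for k, b in coefs.items()}
    base_lp = intercept + sum(b * refs[k] for k, b in coefs.items())

    def points(name, value):
        return 100.0 * coefs[name] * (value - refs[name]) / max_effect

    def probability(total_points):
        lp = base_lp + total_points * max_effect / 100.0
        return 1.0 / (1.0 + math.exp(-lp))

    return points, probability


# Hypothetical two-feature model on min-max scaled inputs.
coefs = {"feature_a": 2.0, "feature_b": -1.0}
ranges = {"feature_a": (0.0, 1.0), "feature_b": (0.0, 1.0)}
points, probability = build_nomogram(-0.5, coefs, ranges)

patient = {"feature_a": 1.0, "feature_b": 0.25}
total = sum(points(name, value) for name, value in patient.items())
prob = probability(total)
```

Because the points are a linear rescaling of the model's linear predictor, the probability read from the total-points axis equals the one obtained by evaluating the logistic regression directly.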

Keywords: Long COVID; SHapley Additive exPlanations; Traditional Chinese Medicine; efficacy; machine learning; nomogram.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Flowchart of this study. TCM, Traditional Chinese Medicine; LASSO, Least Absolute Shrinkage and Selection Operator; SVM, Support Vector Machine; RF, Random Forest; KNN, K-Nearest Neighbors; XGBoost, Extreme Gradient Boosting; NN, Neural Network; AUC, Area Under the Curve; SHAP, SHapley Additive exPlanations; ML, machine learning.
Figure 2
The results of the ROC curve, AUC value, and correlation analysis. (A) Receiver Operating Characteristic (ROC) curve plot; (B) Area Under the Curve (AUC) bar chart; (C) Correlation matrix heatmap. BMI, body mass index; TCMSS, Traditional Chinese Medicine syndrome score; SF-36, Short Form (36) Health Survey; PSQI, Pittsburgh Sleep Quality Index; CRP, C-reactive protein; WBC, White blood cells; RBC, Red blood cells; AST/ALT, the ratio of Aspartate Aminotransferase to Alanine Aminotransferase. *p < 0.05, **p < 0.01, ***p < 0.001.
Figure 3
The results of LASSO regression. (A) LASSO Coefficient Path Plot; (B) Cross-Validation Error Plot for LASSO.
Figure 4
Combined ROC Curves for Multiple Machine Learning Models. (A) ROC curves for training sets; (B) ROC curves for testing sets. KNN, K-Nearest Neighbors; NN, Neural Network; RF, Random Forest; SVM, Support Vector Machine; XGBoost, Extreme Gradient Boosting.
Figure 5
SHAP value of each feature in the model. (A) SHAP feature importance shown according to the mean absolute SHAP value of each feature; (B) SHAP summary plot showing the distribution of the SHAP values of each feature. SF-36, Short Form (36) Health Survey; TCMSS, Traditional Chinese Medicine syndrome score; CRP, C-reactive protein.
Figure 6
SHAP dependence plots of continuous features in the model. (A) Short Form (36) Health Survey (SF-36), (B) Traditional Chinese Medicine syndrome score (TCMSS), (C) C-reactive protein (CRP), (D) age, and (E) lymphocyte ratio. The y-axis represents the SHAP value of each feature, and the x-axis shows the value of that feature. Continuous variables were standardized using the min–max scaling method, resulting in values between 0 and 1. Each dot represents the SHAP value of a feature for one patient, and color from light to dark represents the feature's value from high to low. A SHAP value above zero for a given feature represents an increased probability of Traditional Chinese Medicine being effective in treating Long COVID. SF-36, Short Form (36) Health Survey; TCMSS, Traditional Chinese Medicine syndrome score; CRP, C-reactive protein.
Figure 7
Patient-level SHAP force plots. (A) True positive patient, (B) True negative patient. The color represents the contributions of each feature, with red being positive and blue being negative. The length of the color bar represents the contribution strength.
Figure 8
Nomogram for logistic regression. SF-36, Short Form (36) Health Survey; TCMSS, Traditional Chinese Medicine syndrome score; CRP, C-reactive protein.
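
The SHAP figures above (Figures 5 to 7) rest on two ideas: min–max scaling of continuous features and Shapley values that attribute a prediction to individual features. The sketch below illustrates both by exact enumeration on a tiny hypothetical model. It is not the SHAP library and not the study's XGBoost model; replacing "absent" features with a fixed baseline is an assumption made to keep the example small.

```python
from itertools import combinations
from math import factorial


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature subsets.

    Features outside a subset take their baseline value. The cost grows
    exponentially with the number of features, so this is for tiny examples.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical linear score over three scaled features."""
    return 0.5 * v[0] - 2.0 * v[1] + 1.0 * v[2]


x = [1.0, 0.5, 0.2]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
scaled = min_max_scale([2.0, 4.0, 6.0])
```

The values in `phi` sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that force plots such as Figure 7 visualize.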
