World J Gastroenterol. 2023 Jun 28;29(24):3855-3870.
doi: 10.3748/wjg.v29.i24.3855.

Comparison and development of machine learning for thalidomide-induced peripheral neuropathy prediction of refractory Crohn's disease in Chinese population

Jing Mao et al. World J Gastroenterol.

Abstract

Background: Thalidomide is an effective treatment for refractory Crohn's disease (CD). However, thalidomide-induced peripheral neuropathy (TiPN), whose occurrence varies widely between individuals, is a major cause of treatment failure. TiPN is difficult to predict and often goes unrecognized, especially in CD, so a risk model that predicts TiPN occurrence is needed.

Aim: To develop and compare machine learning models that predict TiPN from comprehensive clinical and genetic variables.

Methods: A retrospective cohort of 164 CD patients from January 2016 to June 2022 was used to establish the models. TiPN was assessed with the National Cancer Institute Common Toxicity Criteria Sensory Scale (version 4.0). Using 18 clinical features and 150 genetic variables, five predictive models were established and evaluated by the confusion matrix, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), specificity, sensitivity (recall), precision, accuracy, and F1 score.
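Specificity, sensitivity, precision, accuracy, and F1 score are all read off a single confusion matrix at a fixed decision threshold. The sketch below is a minimal pure-Python illustration, assuming binary labels where 1 means TiPN occurred; the function names (`confusion_matrix`, `threshold_metrics`) and the ten-patient example are hypothetical and are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier; the positive class (1) is assumed to mean TiPN occurred."""
    tp: int  # TiPN predicted and observed
    fp: int  # TiPN predicted but not observed
    tn: int  # no TiPN predicted and none observed
    fn: int  # TiPN observed but missed by the model


def confusion_matrix(y_true, y_pred):
    """Tally a ConfusionMatrix from parallel lists of 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is empty (e.g. no predicted positives).
    return num / den if den else 0.0


def threshold_metrics(cm):
    """Threshold-dependent metrics derived from one confusion matrix."""
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)  # also called recall
    specificity = _ratio(cm.tn, cm.tn + cm.fp)
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    accuracy = _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical labels and predictions for ten patients (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # ConfusionMatrix(tp=3, fp=1, tn=5, fn=1)
print(threshold_metrics(cm))
```

AUROC and AUPRC differ from these five metrics in that they summarize performance across all thresholds rather than at one cut-off, which is why the abstract reports both kinds.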

Results: The five top-ranking risk variables associated with TiPN were interleukin-12 rs1353248 [P = 0.0004, odds ratio (OR): 8.983, 95% confidence interval (CI): 2.497-30.90], dose (mg/d, P = 0.002), brain-derived neurotrophic factor (BDNF) rs2030324 (P = 0.001, OR: 3.164, 95%CI: 1.561-6.434), BDNF rs6265 (P = 0.001, OR: 3.150, 95%CI: 1.546-6.073), and BDNF rs11030104 (P = 0.001, OR: 3.091, 95%CI: 1.525-5.960). In the training set, gradient boosting decision tree (GBDT), extremely randomized trees (ET), random forest (RF), logistic regression (LR), and extreme gradient boosting (XGBoost) obtained AUROC values > 0.90 and AUPRC values > 0.87. Among these models, XGBoost and GBDT achieved the two highest AUROC (0.90 and 1), AUPRC (0.98 and 1), accuracy (0.96 and 0.98), precision (0.90 and 0.95), F1 score (0.95 and 0.98), and specificity (0.94 and 0.97), and both reached a sensitivity of 1. In the validation set, the XGBoost algorithm exhibited the best predictive performance, with the highest specificity (0.857), accuracy (0.818), AUPRC (0.86), and AUROC (0.89), while ET and GBDT obtained the highest sensitivity (1) and F1 score (0.8). Overall, compared with other state-of-the-art classifiers such as ET, GBDT, and RF, the XGBoost algorithm not only performed more stably but also yielded higher AUROC and AUPRC scores, demonstrating its high accuracy in predicting TiPN occurrence.
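Each OR above summarizes how much more often TiPN occurs in one group (e.g. carriers of a risk genotype) than in another. As an illustration only, the sketch below computes an OR and a Woolf (log-scale) 95% CI from a 2x2 carrier-by-outcome table. This is an assumed method: the abstract does not say how the ORs were estimated, and Fisher's exact test or logistic regression would give different intervals. The function name `odds_ratio_ci` and the counts are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: carriers with TiPN      b: carriers without TiPN
    c: non-carriers with TiPN  d: non-carriers without TiPN
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell so no term divides by zero.
        a, b, c, d = [x + 0.5 for x in (a, b, c, d)]
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype-by-outcome counts (not the study's data).
or_, lo, hi = odds_ratio_ci(a=20, b=10, c=30, d=50)
print(f"OR = {or_:.3f}, 95% CI: {lo:.3f}-{hi:.3f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, and a small cell count widens it sharply; that pattern is consistent with the wide interval reported for rs1353248, though the abstract does not give the underlying counts.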

Conclusion: The XGBoost algorithm accurately predicts TiPN using 18 clinical features and 14 genetic variables. Because it can identify high-risk patients from single nucleotide polymorphisms, it offers a feasible option for improving thalidomide efficacy in CD patients.

Keywords: Gene polymorphisms; Machine learning; Neurotoxicity prediction models; Refractory Crohn’s disease; Thalidomide-induced peripheral neuropathy.

Conflict-of-interest statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Flow chart of model generation and validation. RF: Random forest; GBDT: Gradient boosting decision tree; ET: Extremely randomized trees; LR: Logistic regression; XGBoost: Extreme gradient boosting.
Figure 2
Information gain values of the features. A higher information gain value indicates a more important variable; the five variables with the highest values (rs1353248, dose, rs6265, rs2030324, rs11030104) form the optimal feature set.
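Information gain measures how much knowing a feature reduces uncertainty about the outcome: IG = H(TiPN) - H(TiPN | feature), where H is Shannon entropy. The sketch below computes it for categorical features such as genotypes and ranks features by it. It is an assumed illustration: the paper's exact implementation (log base, how a continuous variable like dose was discretized) is not described in the abstract, and the names `information_gain`, `snp_a`, and `snp_b` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a categorical feature such as a genotype."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the outcome within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical genotypes and TiPN outcomes for six patients (not study data).
genotype = ["TT", "TT", "CT", "CC", "CT", "CC"]
tipn = [1, 1, 0, 0, 1, 0]
print(round(information_gain(genotype, tipn), 4))  # 0.6667

# Rank hypothetical features from most to least informative.
features = {"snp_a": genotype, "snp_b": ["CC", "CT", "CC", "CT", "CC", "CT"]}
ranked = sorted(features, key=lambda f: information_gain(features[f], tipn), reverse=True)
print(ranked)  # ['snp_a', 'snp_b']
```

A continuous variable such as dose (mg/d) would first need to be binned, and the choice of bins can change its information gain.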
Figure 3
Correlation between the optimal variables and thalidomide-induced peripheral neuropathy. Patients carrying the interleukin-12 rs1353248_TT, brain-derived neurotrophic factor (BDNF) rs2030324_AG, BDNF rs6265_CT, or BDNF rs11030104_AG genotype are more likely to develop thalidomide-induced peripheral neuropathy than non-carriers. A: IL-12; B-D: BDNF. aP < 0.01, bP < 0.001. BDNF: Brain-derived neurotrophic factor; IL: Interleukin.
Figure 4
Association between the top four single nucleotide polymorphisms and gene expression in tibial nerve tissue: interleukin (IL)-12 rs1353248_TT (chr3_159905770, P = 8.52 × 10⁻⁴), brain-derived neurotrophic factor (BDNF) rs6265_CT (chr11_27658369, P = 1.07 × 10⁻⁴), BDNF rs2030324_AG (chr11_27705368, P = 9.2 × 10⁻¹¹), and BDNF rs11030104_AG (chr11_27662970, P = 2.76 × 10⁻⁵). A: IL-12; B-D: BDNF. BDNF expression was reduced in rs6265_CT and rs11030104_AG carriers, and IL-12 expression was significantly decreased in rs1353248_TT carriers. BDNF: Brain-derived neurotrophic factor; IL: Interleukin.
Figure 5
Evaluation of the predictive models in the training set. Average area under the receiver operating characteristic curve and the precision-recall curve for the five models. A: Receiver operating characteristic curve (training set); B: Precision-recall curve (training set). The average area and 95% confidence interval of each model are displayed in the box. XGBoost: Extreme gradient boosting; ET: Extremely randomized trees; GBDT: Gradient boosting decision tree; LR: Logistic regression; RF: Random forest; CI: Confidence interval.
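The areas in these panels can be computed directly from predicted probabilities. The sketch below computes AUROC as the Mann-Whitney probability that a randomly chosen TiPN case scores higher than a randomly chosen non-case, and AUPRC as average precision (a common step-wise estimator). The abstract does not say how the 95% CIs were obtained (e.g. across cross-validation folds or by bootstrapping), so the percentile bootstrap here is an assumed, swapped-in option; all function names and the six-patient example are hypothetical. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same two areas.

```python
import random


def auroc(y_true, scores):
    """AUROC as the Mann-Whitney probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0  # ties count as half
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive (assumes no tied scores)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


def bootstrap_ci(metric, y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric; resamples patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            stats.append(metric(yt, [scores[i] for i in idx]))
    stats.sort()
    lo_i = int(round(alpha / 2 * (n_boot - 1)))
    hi_i = int(round((1 - alpha / 2) * (n_boot - 1)))
    return stats[lo_i], stats[hi_i]


# Hypothetical predicted TiPN probabilities (not study data).
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(y, p), 4))              # 0.8889
print(round(average_precision(y, p), 4))  # 0.9167
print(bootstrap_ci(auroc, y, p))
```

With only a handful of positive cases, as in a small validation set, bootstrap intervals for both areas are wide, which is worth keeping in mind when comparing models whose areas differ by a few hundredths.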
Figure 6
Validation of the predictive models in the test set. Average area under the receiver operating characteristic curve and the precision-recall curve for the five models in the test set. A: Receiver operating characteristic curve (test set); B: Precision-recall curve (test set). AUC: Area under the curve; CI: Confidence interval; XGBoost: Extreme gradient boosting; ET: Extremely randomized trees; GBDT: Gradient boosting decision tree; LR: Logistic regression; RF: Random forest.
