A comprehensive analysis of stroke risk factors and development of a predictive model using machine learning approaches
- PMID: 39853452
- PMCID: PMC11762205
- DOI: 10.1007/s00438-024-02217-3
Abstract
Stroke is a leading cause of death and disability globally, particularly in China. Identifying risk factors for stroke at an early stage is critical to improving patient outcomes and reducing the overall disease burden. However, the complexity of stroke risk factors requires advanced approaches for accurate prediction. This study aimed to identify key risk factors for stroke and to develop a predictive model using machine learning techniques to enhance early detection and improve clinical decision-making. Data from the China Health and Retirement Longitudinal Study (2011–2020) were analyzed, with participants classified by baseline characteristics. We evaluated correlations among 12 chronic diseases and applied machine learning algorithms to identify stroke-associated parameters. The dose-response relationship between these parameters and stroke was assessed using restricted cubic splines with Cox proportional hazards models. A refined predictive model, incorporating age, sex, and key risk factors, was developed. Stroke patients were significantly older (average age 69.03 years) and included a higher proportion of women (53%) than non-stroke individuals. Stroke patients were also more likely to reside in rural areas, to be unmarried, to smoke, and to suffer from other diseases. Although the 12 chronic diseases were significantly correlated (p < 0.05), the correlation coefficients were generally weak (r < 0.5). Machine learning identified nine parameters significantly associated with stroke risk: TyG-WC, WHtR, TyG-BMI, TyG, TMO, CysC, CREA, SBP, and HDL-C. Of these, TyG-WC, WHtR, TyG-BMI, TyG, CysC, CREA, and SBP exhibited a positive dose-response relationship with stroke risk. In contrast, TMO and HDL-C were associated with reduced stroke risk.
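Several of the nine parameters listed above are composite indices derived from routine measurements. The abstract does not define them, so the following is a minimal sketch that assumes the definitions commonly used in the triglyceride-glucose literature: TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), TyG-BMI = TyG × BMI, TyG-WC = TyG × waist circumference, and WHtR = waist circumference / height. The function and parameter names are hypothetical, and the authors' exact computation may differ.

```python
import math


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index, assuming the common definition
    ln(TG [mg/dL] * FPG [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("laboratory values must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_cm / 100.0) ** 2


def whtr(waist_cm: float, height_cm: float) -> float:
    """Waist-to-height ratio; both measurements in the same unit."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    return waist_cm / height_cm


def tyg_bmi(tyg: float, weight_kg: float, height_cm: float) -> float:
    """TyG multiplied by BMI (assumed definition of TyG-BMI)."""
    return tyg * bmi(weight_kg, height_cm)


def tyg_wc(tyg: float, waist_cm: float) -> float:
    """TyG multiplied by waist circumference in cm (assumed definition of TyG-WC)."""
    return tyg * waist_cm


if __name__ == "__main__":
    # Hypothetical participant, not taken from the study data.
    tyg = tyg_index(triglycerides_mg_dl=150.0, fasting_glucose_mg_dl=100.0)
    print(f"TyG     = {tyg:.3f}")
    print(f"TyG-BMI = {tyg_bmi(tyg, weight_kg=70.0, height_cm=175.0):.1f}")
    print(f"TyG-WC  = {tyg_wc(tyg, waist_cm=90.0):.1f}")
    print(f"WHtR    = {whtr(waist_cm=90.0, height_cm=175.0):.3f}")
```

Because TyG-BMI and TyG-WC multiply TyG by an obesity measure, they are strongly correlated with TyG itself, which is worth keeping in mind when all three appear in one model.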
In the fully adjusted model, elevated CysC (HR = 2.606, 95% CI 1.869-3.635), CREA (HR = 1.819, 95% CI 1.240-2.668), and SBP (HR = 1.008, 95% CI 1.003-1.012) were significantly associated with increased stroke risk, while higher HDL-C (HR = 0.989, 95% CI 0.984-0.995) and TMO (HR = 0.99995, 95% CI 0.99994-0.99997) were protective. A nomogram model incorporating age, sex, and the identified parameters demonstrated superior predictive accuracy, with a significantly higher Harrell's C-index than individual predictors. This study identifies several significant stroke risk factors and presents a predictive model that can enhance early detection of high-risk individuals. Among these factors, CREA, CysC, SBP, TyG-BMI, TyG, TyG-WC, and WHtR were positively associated with stroke risk, whereas TMO and HDL-C were inversely associated. The model serves as a valuable decision-support resource for clinicians, facilitating more effective prevention and treatment strategies and ultimately improving patient outcomes.
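Two quantities in the results above can be illustrated with a short sketch. First, a hazard ratio very close to 1 (such as 1.008 for SBP) can still matter, because under the log-linear assumption of a Cox model the effect compounds per unit: if the reported HR is per one-unit increase (an assumption, since the abstract does not state units), a 10-unit increase corresponds to 1.008^10, or about 1.08. Second, Harrell's C-index is the fraction of comparable subject pairs in which the subject with the earlier event also has the higher predicted risk. The code below is a simplified illustration with hypothetical names and toy data; it ignores tied event times and is not the authors' implementation.

```python
import math
from typing import Sequence


def rescale_hazard_ratio(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a `delta`-unit increase, assuming a log-linear effect."""
    if hr_per_unit <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(math.log(hr_per_unit) * delta)


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation after that time (times[i] < times[j]).
    The pair is concordant when subject i also has the higher risk score;
    tied risk scores count as half. Tied times are not treated specially.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # SBP hazard ratio from the abstract, rescaled to an assumed 10-unit increase.
    print(f"HR per 10 units: {rescale_hazard_ratio(1.008, 10):.3f}")

    # Toy follow-up data (hypothetical, not from the study).
    times = [1.0, 2.0, 3.0, 4.0]
    events = [True, True, False, True]
    scores = [0.9, 0.7, 0.5, 0.1]
    print(f"C-index: {harrell_c_index(times, events, scores):.2f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the abstract compares the nomogram's C-index against those of the individual predictors.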
Keywords: Dose–response relationship; Feature parameters; Machine learning; Nomogram predictive modelling; Risk factors; Stroke.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Conflict of interest: The authors declare no conflict of interest. Informed consent: Not applicable. Institutional review board: Not applicable.
Similar articles
- The association between triglyceride-glucose index combined with obesity indicators and stroke risk: A longitudinal study based on CHARLS data. BMC Endocr Disord. 2024 Nov 1;24(1):234. doi: 10.1186/s12902-024-01729-8. PMID: 39487484. Free PMC article.
- Association between triglyceride-glucose (TyG) related indices and cardiovascular diseases and mortality among individuals with metabolic dysfunction-associated steatotic liver disease: a cohort study of UK Biobank. Cardiovasc Diabetol. 2025 Jan 13;24(1):12. doi: 10.1186/s12933-024-02572-w. PMID: 39806394. Free PMC article.
- Association between the triglyceride-glucose index and its combined obesity indicators and the risk of hypertension in middle-aged and older Chinese adults: A nationwide cross-sectional study. PLoS One. 2025 Jan 2;20(1):e0316581. doi: 10.1371/journal.pone.0316581. eCollection 2025. PMID: 39746074. Free PMC article.
- Exploring the prognostic impact of triglyceride-glucose index in critically ill patients with first-ever stroke: insights from traditional methods and machine learning-based mortality prediction. Cardiovasc Diabetol. 2024 Dec 18;23(1):443. doi: 10.1186/s12933-024-02538-y. PMID: 39695656. Free PMC article.
- Evaluating a new obesity indicator for stroke risk prediction: comparative cohort analysis in rural settings of two nations. BMC Public Health. 2024 Nov 27;24(1):3301. doi: 10.1186/s12889-024-20631-5. PMID: 39605023. Free PMC article.