Mol Genet Genomics. 2025 Jan 24;300(1):18.
doi: 10.1007/s00438-024-02217-3.

A comprehensive analysis of stroke risk factors and development of a predictive model using machine learning approaches

Songquan Xie et al. Mol Genet Genomics. 2025.

Abstract

Stroke is a leading cause of death and disability globally, particularly in China. Identifying risk factors for stroke at an early stage is critical to improving patient outcomes and reducing the overall disease burden. However, the complexity of stroke risk factors requires advanced approaches for accurate prediction. The objective of this study was to identify key risk factors for stroke and to develop a predictive model using machine learning techniques to enhance early detection and improve clinical decision-making. Data from the China Health and Retirement Longitudinal Study (2011-2020) were analyzed, classifying participants based on baseline characteristics. We evaluated correlations among 12 chronic diseases and applied machine learning algorithms to identify stroke-associated parameters. Dose-response relationships between these parameters and stroke were assessed using restricted cubic splines with Cox proportional hazards models. A refined predictive model, incorporating age, sex, and key risk factors, was developed. Stroke patients were significantly older (average age 69.03 years) and included a higher proportion of women (53%) than non-stroke individuals. Additionally, stroke patients were more likely to reside in rural areas, be unmarried, smoke, and suffer from various diseases. Although the 12 chronic diseases were correlated (p < 0.05), the correlation coefficients were generally weak (r < 0.5). Machine learning identified nine parameters significantly associated with stroke risk: TyG-WC, WHtR, TyG-BMI, TyG, TMO, CysC, CREA, SBP, and HDL-C. Of these, TyG-WC, WHtR, TyG-BMI, TyG, CysC, CREA, and SBP exhibited a positive dose-response relationship with stroke risk. In contrast, TMO and HDL-C were associated with reduced stroke risk. In the fully adjusted model, elevated CysC (HR = 2.606, 95% CI 1.869-3.635), CREA (HR = 1.819, 95% CI 1.240-2.668), and SBP (HR = 1.008, 95% CI 1.003-1.012) were significantly associated with increased stroke risk, while higher HDL-C (HR = 0.989, 95% CI 0.984-0.995) and TMO (HR = 0.99995, 95% CI 0.99994-0.99997) were protective. A nomogram model incorporating age, sex, and the identified parameters demonstrated superior predictive accuracy, with a significantly higher Harrell's C-index than any individual predictor. This study identifies several significant stroke risk factors: CREA, CysC, SBP, TyG-BMI, TyG, TyG-WC, and WHtR were positively associated with stroke risk, whereas TMO and HDL-C were inversely associated. It also presents a predictive model that can enhance early detection of high-risk individuals and serve as a decision-support resource for clinicians, facilitating more effective prevention and treatment strategies and ultimately improving patient outcomes.
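The hazard ratios quoted in the abstract are presumably expressed per one unit of each predictor, which would explain why the SBP, HDL-C, and TMO estimates sit so close to 1. The following is a minimal sketch of how such an estimate could be rescaled to a larger increment, assuming a log-linear Cox term (so that the HR for a k-unit change is the per-unit HR raised to the power k). The function name `rescale_hr` and the 10-unit increment are illustrative assumptions; the abstract does not state measurement units or any rescaled values.

```python
import math


def rescale_hr(hr_per_unit: float, delta: float) -> float:
    """Return the hazard ratio for a `delta`-unit change in a predictor.

    Assumes the predictor enters the Cox model linearly on the log-hazard
    scale, so log(HR) scales proportionally with the size of the change.
    """
    return math.exp(math.log(hr_per_unit) * delta)


def rescale_interval(estimate, delta):
    """Rescale a (HR, lower, upper) triple; the bounds transform the same way."""
    return tuple(rescale_hr(value, delta) for value in estimate)


# Per-unit estimates as quoted in the abstract: (HR, lower 95% CI, upper 95% CI).
reported = {
    "SBP": (1.008, 1.003, 1.012),
    "HDL-C": (0.989, 0.984, 0.995),
}

# Hypothetical increment chosen only for illustration.
DELTA = 10

if __name__ == "__main__":
    for name, estimate in reported.items():
        hr, lower, upper = rescale_interval(estimate, DELTA)
        print(f"{name}: HR per {DELTA} units = {hr:.3f} ({lower:.3f}-{upper:.3f})")
```

Under this assumption, a per-unit HR of 1.008 corresponds to roughly 1.083 per 10 units, so an estimate close to 1 can still describe a sizeable effect across a realistic range of the predictor.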

Keywords: Dose–response relationship; Feature parameters; Machine learning; Nomogram predictive modelling; Risk factors; Stroke.
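The abstract names four derived parameters (TyG, TyG-BMI, TyG-WC, and WHtR) without giving their formulas. As a rough sketch, the definitions commonly used in the literature are: TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), TyG-BMI = TyG × BMI, TyG-WC = TyG × waist circumference, and WHtR = waist circumference / height. Whether the authors used exactly these definitions and units is an assumption, and the `Measurements` class and function names below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurements:
    """Baseline values for one participant (field names are illustrative)."""
    triglycerides_mg_dl: float
    fasting_glucose_mg_dl: float
    bmi_kg_m2: float
    waist_cm: float
    height_cm: float


def tyg_index(m: Measurements) -> float:
    """Triglyceride-glucose index under the commonly cited definition (assumed)."""
    return math.log(m.triglycerides_mg_dl * m.fasting_glucose_mg_dl / 2)


def derived_indices(m: Measurements) -> dict:
    """Compute the four derived parameters named in the abstract."""
    tyg = tyg_index(m)
    return {
        "TyG": tyg,
        "TyG-BMI": tyg * m.bmi_kg_m2,
        "TyG-WC": tyg * m.waist_cm,
        "WHtR": m.waist_cm / m.height_cm,  # same length unit in both terms
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not study data.
    example = Measurements(150.0, 100.0, 25.0, 90.0, 170.0)
    for name, value in derived_indices(example).items():
        print(f"{name}: {value:.3f}")
```

Because TyG-BMI and TyG-WC are products of TyG with an anthropometric measure, all three are strongly related to one another, which is worth keeping in mind when reading their individual associations with stroke risk.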

Conflict of interest statement

Declarations. Conflict of interest: The authors declare no conflict of interest. Informed consent: Not applicable. Institutional review board: Not applicable.

Figures

Fig. 1. Correlation matrix of chronic diseases: phi coefficients and statistical significance.
Fig. 2. Feature importance analysis across multiple machine learning algorithms. (A) Bar plots showing the top 10 features and their importance scores for each of the eight machine learning algorithms. (B) Scatter plot of feature importance consensus. Each point represents a feature, with its average importance score on the x-axis and the number of times it appeared in the top 10 features across all algorithms on the y-axis. The top five features by occurrence count and the top five by average importance score are labeled.
Fig. 3. Dose–response relationships between key predictors and stroke risk.
Fig. 4. Development and validation of a stroke risk prediction model. (A) Nomogram for predicting 1-, 3-, and 5-year stroke risk. Points are assigned for each risk factor by drawing a line upward from the corresponding value to the 'Points' line. The sum of these points plotted on the 'Total Points' line corresponds to the predicted 1-, 3-, and 5-year stroke risk. (B) Calibration curves for 1-, 3-, and 5-year stroke risk predictions. The x-axis represents the predicted probability, and the y-axis represents the actual observed probability. Perfect predictions should fall on the diagonal line. Closer alignment of the red line to the diagonal indicates better calibration. (C) Time-dependent AUC curves for individual predictors and the nomogram score over 8 years of follow-up. Higher AUC values indicate better predictive performance. The nomogram score represents the combined predictive power of all included variables.
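The consensus view described for Fig. 2, panel B (average importance score against the number of top-10 appearances) can be sketched as follows. This is an illustration only: it assumes importance scores are already on a comparable scale across algorithms and that the average is taken over all algorithms, neither of which the caption states. The function name `importance_consensus` and the toy feature names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def importance_consensus(
    importances: dict[str, dict[str, float]], top_k: int = 10
) -> dict[str, tuple[float, int]]:
    """Summarise feature importance across several algorithms.

    `importances` maps algorithm name -> {feature name -> importance score}.
    Returns feature -> (mean importance across algorithms, number of
    algorithms in which the feature ranked within the top `top_k`).
    """
    top_counts: Counter = Counter()
    totals: dict[str, float] = {}
    for scores in importances.values():
        # Rank this algorithm's features from most to least important.
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_counts.update(ranked[:top_k])
        for feature, value in scores.items():
            totals[feature] = totals.get(feature, 0.0) + value
    n_algorithms = len(importances)
    return {
        feature: (total / n_algorithms, top_counts[feature])
        for feature, total in totals.items()
    }


if __name__ == "__main__":
    # Toy scores for two hypothetical algorithms, not study results.
    toy = {
        "model_1": {"feat_a": 0.5, "feat_b": 0.3, "feat_c": 0.2},
        "model_2": {"feat_a": 0.1, "feat_b": 0.6, "feat_c": 0.3},
    }
    for feature, (mean_score, count) in importance_consensus(toy, top_k=2).items():
        print(f"{feature}: mean={mean_score:.2f}, top-2 appearances={count}")
```

A feature missing from one algorithm's scores contributes zero to that algorithm's share of the mean here; averaging only over the algorithms that report the feature would be an equally plausible reading of the caption.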
