Sci Rep. 2022 Dec 19;12(1):21905.
doi: 10.1038/s41598-022-25933-5.

Explainable machine learning framework for predicting long-term cardiovascular disease risk among adolescents

Haya Salah et al. Sci Rep.

Abstract

Although cardiovascular disease (CVD) is the leading cause of death worldwide, over 80% of it is preventable through early intervention and lifestyle changes. Most cases of CVD are detected in adulthood, but the risk factors leading to CVD begin at a younger age. This research is the first to develop an explainable machine learning (ML)-based framework for long-term CVD risk prediction (low vs. high) among adolescents. The study uses longitudinal data from a nationally representative sample of individuals who participated in the Add Health study. A total of 14,083 participants who completed the relevant survey questionnaires and health tests from adolescence to young adulthood were selected. Four ML classifiers [decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and deep neural network (DNN)] and 36 adolescent predictors are used to predict adulthood CVD risk. While all ML models demonstrated good predictive capability, XGBoost achieved the best performance (AUC-ROC: 84.5% and AUC-PR: 96.9% on testing data). In addition, the critical predictors of long-term CVD risk and their impact on risk prediction are identified using an explainable technique for interpreting ML predictions. The results suggest that ML can be employed to detect adulthood CVD very early in life, and that such an approach may facilitate primordial prevention and personalized intervention.
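The two metrics reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how the areas were computed, so the code below is an assumption: AUC-ROC is computed as the fraction of positive/negative pairs ranked correctly, and AUC-PR is approximated by average precision, one common step-wise estimator. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of precision@k over the ranks k of the positives.

    Assumes no tied scores; used here as a stand-in for the area under the
    precision-recall curve.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(y for _, y in ranked)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k predictions
    return total / n_pos


# Toy data (hypothetical): 1 = high risk, 0 = low risk.
y_true = [1, 1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
roc = auc_roc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"AUC-ROC = {roc:.3f}, average precision = {ap:.3f}")
```

Because average precision depends on the share of positives in the data, a high AUC-PR is easier to reach when the positive class is common, which is one reason the two numbers are usually read together.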

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overview of explainable machine learning framework for CVD risk prediction.
Figure 2
Performance of ML models during tenfold cross-validation procedure.
Figure 3
Performance of the ML models on the testing dataset. The ROC curve is best when it approaches the upper left corner, and the precision-recall (PR) curve is best when it approaches the upper right corner.
Figure 4
Permutation feature importance plot of the ML models. A higher value corresponds to a more important feature in predicting CVD risk. The plot is created for (a) XGBoost, (b) RF, (c) DNN, (d) DT.
Figure 5
Global interpretation of ML models. The x-axis is the average (absolute) SHAP value for each adolescent risk factor. A higher value corresponds to a more important feature in predicting CVD risk. The plot is created for (a) XGBoost, (b) RF, (c) DNN, (d) DT.
Figure 6
Global interpretation of ML models: SHAP summary plots of the input features. Features are sorted in descending order by SHAP values. Each dot represents the SHAP value of one feature for one individual and is colored according to the underlying feature's value. For the feature gender_female, red dots indicate female and blue dots indicate male. The summary plot is created for each ML model: (a) XGBoost, (b) RF, (c) DNN, (d) DT.
Figure 7
Partial dependence plots: (a) adolescent BMI, (b) cigarettes smoked per month, (c) hours of sedentary duration, (d) breakfast frequency. SHAP values greater than zero indicate a positive correlation between the two adolescent risk factors.
Figure 8
Local interpretation—force plots for two individuals from the testing set of the XGBoost model: (a) high risk individual, (b) low risk individual.
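
The permutation feature importance shown in Figure 4 can be illustrated with a minimal sketch: shuffle one feature column at a time and measure how much the model's score drops. Everything below is an assumption made for illustration. The scoring metric (accuracy), the number of repeats, the toy data, and the rule-based `predict` function with its BMI threshold are hypothetical and do not come from the study, which does not describe these details on this page.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column.

    predict: function mapping one row (list of feature values) to a label.
    X: list of rows, each a list of feature values.
    Returns one importance value per feature, in column order.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): columns are [bmi, unrelated_flag]; 1 = high risk.
X = [[22, 1], [35, 0], [31, 1], [19, 0], [40, 1], [24, 0], [33, 0], [21, 1]]
y = [0, 1, 1, 0, 1, 0, 1, 0]


def predict(row):
    """Toy stand-in for a trained classifier: uses only the first feature."""
    return int(row[0] >= 30)


importances = permutation_importance(predict, X, y)
print(importances)
```

In this toy setup the second feature is never used by the model, so shuffling it cannot change any prediction and its importance is exactly zero, while shuffling the first feature lowers accuracy.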
