Development of risk models of incident hypertension using machine learning on the HUNT study data

Filip Emil Schjerven¹, Emma Maria Lovisa Ingeström², Ingelin Steinsland³, Frank Lindseth⁴

Affiliations

¹ Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway. filip.e.schjerven@ntnu.no.
² Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway.
³ Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway.
⁴ Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway.

PMID: 38454041
PMCID: PMC10920790
DOI: 10.1038/s41598-024-56170-7

Sci Rep. 2024 Mar 7;14(1):5609.

Abstract

In this study, we aimed to create an 11-year hypertension risk prediction model using data from the Trøndelag Health (HUNT) Study in Norway, involving 17 852 individuals (20-85 years; 38% male; 24% incidence rate) with blood pressure (BP) below the hypertension threshold at baseline (1995-1997). We assessed 18 clinical, behavioral, and socioeconomic features, employing machine learning models such as eXtreme Gradient Boosting (XGBoost), Elastic regression, K-Nearest Neighbor, Support Vector Machines (SVM) and Random Forest. For comparison, we used logistic regression and a decision rule as reference models and validated six external models, with focus on the Framingham risk model. The top-performing models consistently included XGBoost, Elastic regression and SVM. These models efficiently identified hypertension risk, even among individuals with optimal baseline BP (< 120/80 mmHg), although improvement over reference models was modest. The recalibrated Framingham risk model outperformed the reference models, approaching the best-performing ML models. Important features included age, systolic and diastolic BP, body mass index, height, and family history of hypertension. In conclusion, our study demonstrated that linear effects sufficed for a well-performing model. The best models efficiently predicted hypertension risk, even among those with optimal or normal baseline BP, using few features. The recalibrated Framingham risk model proved effective in our cohort.
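
The abstract does not state how the Framingham risk model was recalibrated. One common approach, assumed here purely for illustration, is logistic recalibration: refit an intercept and a slope on the logit of the original predicted risks so that they match the incidence observed in the new cohort. The function names and the toy numbers below are hypothetical and are not taken from the study.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_recalibration(preds, outcomes, iters=25):
    """Fit intercept a and slope b of outcome ~ sigmoid(a + b * logit(pred)).

    Uses Newton-Raphson on the two-parameter logistic log-likelihood.
    Assumes the outcomes are not perfectly separated by the predictions.
    """
    x = [logit(p) for p in preds]
    a, b = 0.0, 1.0  # start from "no recalibration"
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu           # gradient w.r.t. intercept
            g1 += (yi - mu) * xi    # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system H * delta = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def recalibrated_risk(pred, a, b):
    """Apply a fitted recalibration to one original predicted risk."""
    return sigmoid(a + b * logit(pred))


# Toy cohort in which the external model overestimates risk.
toy_preds = [0.2] * 10 + [0.6] * 10
toy_outcomes = [1] + [0] * 9 + [1, 1, 1] + [0] * 7
a, b = fit_recalibration(toy_preds, toy_outcomes)
```

In this toy cohort the two risk groups have observed incidences of 10% and 30%, so the recalibrated risks for original predictions of 0.2 and 0.6 move to those values.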

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Smoothed calibration curves for the test set. A calibration curve close to the dashed reference line indicates close agreement between a model's predictions and the observed incidence in the test set. Curves are shown as pointwise mean curves calculated by bootstrapping. *KNN* K-nearest neighbors, *SVM* support vector machines, *XGBoost* eXtreme gradient boosting.

**Figure 2**
Calibration curves with histogram of predictions above. The histogram is colored by proportion of incidence. Curves are shown as pointwise mean curves with red shaded 95% confidence interval calculated by bootstrapping. *KNN* K-nearest neighbors, *SVM* support vector machines, *XGBoost* eXtreme gradient boosting.

**Figure 3**
Decision curves of all models. Net benefit was standardized to have a max value of 1. Curves are shown as pointwise mean curves calculated by bootstrapping. *BP* blood pressure, *KNN* K-nearest neighbors, *SVM* support vector machines, *XGBoost* eXtreme gradient boosting.

**Figure 4**
Decision curves with histogram of predictions above. The histogram is colored by the proportion of incidence. Net Benefit is standardized to have a max value of 1. Curves are shown as pointwise mean curves with red shaded 95% confidence interval calculated by bootstrapping. *KNN* K-nearest neighbors, *SVM* support vector machines, *XGBoost* eXtreme gradient boosting.
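
The captions for Figures 3 and 4 refer to net benefit standardized to a maximum of 1. A minimal sketch of how such a quantity is commonly computed follows, assuming the usual decision-curve definition of net benefit and standardization by dividing by the outcome prevalence. The captions do not spell out the formula, so this is an assumption, and the function names are hypothetical.

```python
def net_benefit(preds, outcomes, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(preds, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(preds, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def standardized_net_benefit(preds, outcomes, threshold):
    """Net benefit divided by prevalence, so a perfect model scores 1."""
    prevalence = sum(outcomes) / len(outcomes)
    return net_benefit(preds, outcomes, threshold) / prevalence


# Toy data: a model that separates cases perfectly, and a treat-all strategy.
toy_outcomes = [1, 1, 0, 0]
perfect_preds = [0.9, 0.9, 0.1, 0.1]
treat_all_preds = [1.0, 1.0, 1.0, 1.0]
```

Evaluating these functions over a grid of thresholds gives one decision curve per model; the treat-all strategy is the usual comparison line.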

**Figure 5**
(a) Coefficient sizes in least absolute shrinkage and selection operator (LASSO) regression fitted on the training set with increasing regularization. Only the last 10 features to be zeroed out are shown. (b) The performance of the LASSO regression model on the test set as regularization was increased. Curves are shown as pointwise mean curves with red shaded 95% confidence interval calculated by bootstrapping. *AUC* area under the receiver operating characteristic curve, *BMI* body mass index, *BP* blood pressure, *Chol* cholesterol, *Fam. hist. of hyp.* family history of hypertension, *HDL* high-density lipoprotein, *ICI* integrated calibration index, *Log* natural logarithm, *PAI* physical activity indicator.

**Figure 6**
Permutation importance calculated for XGBoost, SVM, KNN and random forest models. The importance of a feature or cluster was determined as the average decrease in Scaled Brier score on the test set when the feature or cluster was permuted. Features are colored following Fig. 5a, with gray for ‘Sex’ and ‘Marital status’, and combined colors for clusters. Irrelevant features or clusters, defined as those with a mean decrease of less than 0.004 in Scaled Brier score, were left out for conciseness. Features in clusters were permuted simultaneously. *BMI* body mass index, *BP* blood pressure, *Cl. #* feature cluster #, *HDL* high-density lipoprotein, *KNN* K-nearest neighbors, *SVM* support vector machines, *XGBoost* eXtreme gradient boosting.
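
The permutation procedure described for Figure 6 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common definition of the Scaled Brier score (1 minus the Brier score divided by the Brier score of a model that always predicts the prevalence), and the `predict` callable, the row layout and the feature names are hypothetical.

```python
import random


def brier(preds, outcomes):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(outcomes)


def scaled_brier(preds, outcomes):
    """1 - Brier / Brier of a prevalence-only model (assumed definition)."""
    prevalence = sum(outcomes) / len(outcomes)
    reference = prevalence * (1.0 - prevalence)
    return 1.0 - brier(preds, outcomes) / reference


def permutation_importance(predict, rows, outcomes, columns, repeats=20, seed=0):
    """Average drop in Scaled Brier score when `columns` are permuted.

    All listed columns share one shuffled row order, so a cluster of
    features is permuted simultaneously.
    """
    rng = random.Random(seed)
    baseline = scaled_brier([predict(r) for r in rows], outcomes)
    drops = []
    for _ in range(repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        permuted = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            for col in columns:
                new_row[col] = rows[order[i]][col]
            permuted.append(new_row)
        score = scaled_brier([predict(r) for r in permuted], outcomes)
        drops.append(baseline - score)
    return sum(drops) / len(drops)


# Toy model that uses systolic BP only; "noise" is ignored by the model.
def toy_predict(row):
    return 0.9 if row["sbp"] > 130 else 0.1


toy_rows = [{"sbp": s, "noise": i} for i, s in
            enumerate([135, 138, 136, 139, 110, 115, 112, 118])]
toy_outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
```

A feature the model ignores, such as `noise` here, yields an importance of zero, which falls below the 0.004 cut-off mentioned in the caption.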
