JMIR Public Health Surveill. 2023 Sep 7:9:e47095.
doi: 10.2196/47095.

Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study

Yuhan Deng et al. JMIR Public Health Surveill. 2023.

Abstract

Background: Carotid plaque can lead to stroke, myocardial infarction, and other cardiovascular events that are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, although fatty liver disease is detected at a high rate, screening for carotid plaque in the asymptomatic population is not yet widespread because of cost-effectiveness concerns. As a result, a large number of patients have undetected carotid plaques, especially among those with fatty liver disease.

Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model that identifies individuals at risk of carotid plaque in the population with fatty liver disease.

Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest (RF), elastic net (EN), and extreme gradient boosting (XGBoost) ML algorithms to select important features from potential predictors. Features selected by all 3 models were entered into a logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated based on the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis in both a randomly split internal validation data set and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set.
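The consensus selection step and the Youden index used for cutoff determination can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how each algorithm's "important" features were thresholded, so the `top_k` rule, the function names, and the toy rankings are assumptions made for the example.

```python
from typing import Dict, Sequence, Set, Tuple


def consensus_features(rankings: Dict[str, Sequence[str]], top_k: int) -> Set[str]:
    """Return the features ranked in the top_k by every model.

    `rankings` maps a model name to its features ordered from most to
    least important. The top_k rule is an assumption for illustration.
    """
    top_sets = [set(ranked[:top_k]) for ranked in rankings.values()]
    return set.intersection(*top_sets)


def youden_cutoff(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A participant counts as predicted positive when the predicted
    probability is >= cutoff. Ties keep the lowest such cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_cutoff, best_j = 0.0, -1.0
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented rankings for illustration only; these are not results from the study.
toy_rankings = {
    "RF": ["AGE", "SBP", "TC", "WT"],
    "EN": ["SBP", "AGE", "HR", "TC"],
    "XGBoost": ["AGE", "TC", "SBP", "UA"],
}
selected = consensus_features(toy_rankings, top_k=3)

# Invented labels and predicted probabilities, again for illustration only.
cutoff, j_stat = youden_cutoff([0, 0, 1, 0, 1, 1], [0.1, 0.2, 0.35, 0.4, 0.7, 0.9])
```

The Youden index gives a single optimal cutoff. The abstract reports that the final cutoffs also drew on the predicted probability distribution and the prevalence rate, which this sketch does not model.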

Results: Carotid plaque was diagnosed in 26.23% (1,421,970/5,420,640) of the participants in the development data set and in 21.64% (7074/32,682) of those in the external validation data set. Out of 27 predictors, 6 features were selected by all 3 ML models: age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI). After collinearity between features was addressed, the logistic regression model built with the 5 remaining independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and its calibration curves showed good calibration. Its overall predictive performance was competitive with that of logistic regression or ML algorithms used alone. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque.
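The three-level risk classification can be sketched as below, using the reported cutoffs of 0.25 and 0.65. How a probability exactly equal to a cutoff is assigned is not stated in the abstract, so the boundary handling here (a value at a cutoff goes to the higher group) is an assumption, as are the function names.

```python
from collections import Counter
from typing import Dict, Sequence

LOW_CUTOFF = 0.25   # reported in the abstract
HIGH_CUTOFF = 0.65  # reported in the abstract


def risk_group(prob: float) -> str:
    """Map a predicted probability to 'low', 'intermediate', or 'high'.

    Assumption: a probability equal to a cutoff falls in the higher group.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob < LOW_CUTOFF:
        return "low"
    if prob < HIGH_CUTOFF:
        return "intermediate"
    return "high"


def prevalence_by_group(y_true: Sequence[int], y_prob: Sequence[float]) -> Dict[str, float]:
    """Return the observed carotid plaque prevalence within each risk group."""
    totals: Counter = Counter()
    cases: Counter = Counter()
    for y, p in zip(y_true, y_prob):
        group = risk_group(p)
        totals[group] += 1
        cases[group] += y
    return {group: cases[group] / totals[group] for group in totals}
```

Comparing the per-group prevalence between the internal and external validation data sets is one way to check that the classification transfers, which is the kind of comparison the risk classification plot reports.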

Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model with important public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver.

Keywords: cardiovascular; carotid plaque; fatty liver; health check-up; logistic regression; machine learning; prediction; risk assessment; risk stratification.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Flowchart of the study participants.
Figure 2
Feature importance of the potential predictors of carotid plaque in the population with fatty liver disease generated by (A) RF, (B) EN, and (C) XGBoost. The features highlighted in dark color are those coselected by all 3 algorithms. ALP: alkaline phosphatase; ALT: alanine transaminase; AST: aspartate aminotransferase; Cr: creatinine; DB: diabetes; DBIL: direct bilirubin; DBP: diastolic blood pressure; EN: elastic net; FBG: fasting blood glucose; HDL-C: high-density lipoprotein cholesterol; HLP: hyperlipidemia; HR: heart rate; HSI: hepatic steatosis index; HT: height; HTN: hypertension; LDL-C: low-density lipoprotein cholesterol; PLT: blood platelet count; RF: random forest; SBP: systolic blood pressure; TBIL: total bilirubin; TC: total cholesterol; TG: triglyceride; UA: uric acid; WBC: white blood cell count; WT: weight; XGBoost: extreme gradient boosting.
Figure 3
Model performance in discrimination and calibration for predicting the risk of carotid plaque in the population with fatty liver disease, evaluated by (A) ROC curves and (B) calibration curves. AUC: area under the curve; ROC: receiver operating characteristic.
Figure 4
Decision curve analysis for predicting the risk of carotid plaque in the population with fatty liver disease in (A) the internal validation data set and (B) the external validation data set.
Figure 5
Probability distribution and risk classification plot generated by the carotid plaque prediction model in the population with fatty liver disease in (A) the internal validation data set and (B) the external validation data set. The blue and pink columns represent the number of participants at each predicted probability, and the predicted probabilities are split into low risk, intermediate risk, and high risk at 0.25 and 0.65. Risk levels are shown as gray pillars of different opacities. The height of each pillar corresponds to the risk proportion, which is calculated from the prevalence rate in each risk level. HR: high risk; IR: intermediate risk; LR: low risk.

