World J Gastroenterol. 2025 Mar 7;31(9):101383.
doi: 10.3748/wjg.v31.i9.101383.

Machine learning-based models for advanced fibrosis in non-alcoholic steatohepatitis patients: A cohort study

Fei-Xiang Xiong et al. World J Gastroenterol.

Abstract

Background: The global prevalence of non-alcoholic steatohepatitis (NASH) and its associated risk of adverse outcomes, particularly in patients with advanced liver fibrosis, underscores the importance of early and accurate diagnosis.

Aim: To develop a machine learning-based diagnostic model for advanced liver fibrosis in NASH patients.

Methods: A total of 749 patients who underwent liver biopsy at Beijing Ditan Hospital, Capital Medical University, between January 2010 and January 2020 were included. Patients were randomly divided into training (n = 522) and validation (n = 224) cohorts. Five machine learning models were applied to predict advanced liver fibrosis, with feature selection based on Shapley Additive Explanations (SHAP). The diagnostic performance of these models was compared to traditional scores such as the aspartate aminotransferase to platelet ratio index (APRI) and fibrosis index based on the 4 factors (FIB-4), using metrics including the area under the receiver operating characteristic curve (AUROC), decision curve analysis (DCA), and calibration curves.
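
The two traditional scores named above have widely used published formulas, which can be sketched as follows. This is a minimal illustration, not the authors' code: the function and parameter names are hypothetical, the upper limit of normal for AST varies by laboratory (40 U/L is assumed here), and the patient values are invented for the example.

```python
import math


def apri(ast_u_l: float, ast_uln_u_l: float, platelets_10e9_l: float) -> float:
    """APRI = (AST / upper limit of normal for AST) / platelets (10^9/L) * 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100.0


def fib4(age_years: float, ast_u_l: float, alt_u_l: float,
         platelets_10e9_l: float) -> float:
    """FIB-4 = age (years) * AST / (platelets (10^9/L) * sqrt(ALT))."""
    return age_years * ast_u_l / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient (invented values, assumed AST upper limit of 40 U/L).
print(round(apri(ast_u_l=80, ast_uln_u_l=40, platelets_10e9_l=100), 2))
print(round(fib4(age_years=60, ast_u_l=80, alt_u_l=64, platelets_10e9_l=100), 2))
```

Both scores rise as AST increases and as the platelet count falls, which is why they serve as simple baselines for the machine learning models.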

Results: The Extreme Gradient Boosting (XGBoost) model outperformed all other machine learning models, achieving an AUROC of 0.934 (95%CI: 0.914-0.955) in the training cohort and 0.917 (95%CI: 0.880-0.953) in the validation cohort (P < 0.001). Incorporating liver stiffness measurement into the model further improved its performance, with an AUROC of 0.977 (95%CI: 0.966-0.980) in the training cohort and 0.970 (95%CI: 0.950-0.990) in the validation cohort, significantly surpassing APRI and FIB-4 scores (P < 0.001). The XGBoost model also demonstrated superior clinical utility, as evidenced by DCA and calibration curve analysis in both cohorts.
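
An AUROC with a 95% confidence interval, as reported above, can be computed in several ways. The abstract does not state which method the authors used, so the sketch below is only one plausible approach: the AUROC as the probability that a positive case scores higher than a negative case, with a percentile bootstrap for the interval. All function names and the toy data are assumptions made for illustration.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method, see lead-in)."""
    auroc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, invented for illustration: 1 = advanced fibrosis, 0 = not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly: 0.75
print(bootstrap_ci(labels, scores, n_boot=200))
```

With only four cases the bootstrap interval is very wide; the narrow intervals in the abstract reflect cohorts of several hundred patients.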

Conclusion: The XGBoost model provides a highly accurate, non-invasive diagnosis of advanced liver fibrosis in NASH patients, outperforming traditional methods. An online tool based on this model has been developed to assist clinicians in evaluating the risk of advanced liver fibrosis.

Keywords: Advanced fibrosis; Extreme Gradient Boosting; Machine learning; Non-alcoholic steatohepatitis; Non-invasive.

Conflict of interest statement

All the authors report no relevant conflicts of interest for this article.

Figures

Figure 1
Outline of the recruitment and grouping of non-alcoholic steatohepatitis patients. HBV: Hepatitis B virus; HCV: Hepatitis C virus; HIV: Human immunodeficiency virus; HCC: Hepatocellular carcinoma; XGBoost: Extreme Gradient Boosting; LR: Logistic regression; RF: Random forest; SVM: Support Vector Machine; NB: Naive Bayes.
Figure 2
Shapley Additive Explanations plot: The weight of clinical features for diagnosing advanced liver fibrosis through a random forest algorithm. A: Beeswarm plot; B: Importance bar chart. The plots show that triglycerides, albumin, international normalized ratio and high-density lipoprotein were the top four indicators. TG: Triglycerides; ALB: Albumin; INR: International normalized ratio; HDL: High-density lipoprotein; GGT: Gamma-glutamyl transferase; AST: Aspartate aminotransferase; LSM: Liver stiffness measurement; WBC: White blood cell; CAP: Controlled attenuation parameter; SHAP: Shapley Additive Explanations.
Figure 3
Receiver operating characteristic curves of five machine learning models for diagnosing advanced liver fibrosis. A: Training cohort; B: Validation cohort. The Extreme Gradient Boosting model achieved a higher area under the curve than the other machine learning models. LR: Logistic regression; RF: Random forest; SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting; NB: Naive Bayes; AUC: Area under the curve.
Figure 4
Receiver operating characteristic curves comparing the Extreme Gradient Boosting models with other non-invasive diagnostic scores. A: Training cohort; B: Validation cohort. The Extreme Gradient Boosting (XGBoost) model and the XGBoost + liver stiffness measurement model achieved higher areas under the curve than the aspartate aminotransferase to platelet ratio index and Fibrosis-4 scores. APRI: Aspartate aminotransferase to platelet ratio index score; FIB-4: Fibrosis index based on the 4 factors; LSM: Liver stiffness measurement; AUC: Area under the curve; XGBoost: Extreme Gradient Boosting.
Figure 5
Decision curve analysis of the Extreme Gradient Boosting model, the Extreme Gradient Boosting + liver stiffness measurement model, the aspartate aminotransferase to platelet ratio index score and the Fibrosis index based on the 4 factors score. A: Training cohort; B: Validation cohort. The Extreme Gradient Boosting models showed a higher net benefit than the aspartate aminotransferase to platelet ratio index score and the Fibrosis index based on the 4 factors score. DCA: Decision curve analysis; APRI: Aspartate aminotransferase to platelet ratio index; FIB-4: Fibrosis index based on the 4 factors; LSM: Liver stiffness measurement; XGBoost: Extreme Gradient Boosting.
Figure 6
The calibration curve of the Extreme Gradient Boosting model for diagnosing advanced liver fibrosis. A: Training cohort; B: Validation cohort. The calibration curve shows good agreement between the predicted and observed probabilities in both the training and validation cohorts.
Figure 7
Shapley Additive Explanations values for the top four indicators. When albumin is below 35 g/L, the international normalized ratio is in the 1.75-2.00 range, high-density lipoprotein is in the 1.5-2.0 mmol/L range, and triglycerides are above 5.0 mmol/L, the Shapley Additive Explanations values tend to be higher. SHAP: Shapley Additive Explanations; ALB: Albumin; TG: Triglycerides; INR: International normalized ratio; HDL: High-density lipoprotein.
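
The net benefit plotted in a decision curve analysis such as Figure 5 follows a standard formula: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The sketch below illustrates that formula only. The function names and toy data are hypothetical, and nothing here reproduces the authors' analysis.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Toy data, invented for illustration: 1 = advanced fibrosis, 0 = not.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5, 0.8):
    print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all reference and zero, which is the net benefit of treating no one.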
