Ann Med. 2025 Dec;57(1):2477294.

doi: 10.1080/07853890.2025.2477294. Epub 2025 Mar 19.

Identifying liver cirrhosis in patients with chronic hepatitis B: an interpretable machine learning algorithm based on LSM

Xueting Bai¹, Chunwen Pu², Wenchong Zhen¹, Yushuang Huang², Qian Zhang¹, Zihan Li¹, Yixin Zhang¹, Rongxuan Xu¹, Zhihan Yao¹, Wei Wu¹, Mei Sun², Xiaofeng Li¹

Affiliations

¹ Department of Epidemiology and Health Statistics, Dalian Medical University, Dalian, China.
² Dalian Public Health Clinical Center, Dalian, Liaoning province, China.

PMID: 40104981
PMCID: PMC11924261
DOI: 10.1080/07853890.2025.2477294

Xueting Bai et al. Ann Med. 2025 Dec.

Abstract

Background: Chronic hepatitis B (CHB) is a common cause of liver cirrhosis (LC), a condition associated with an unfavourable prognosis. Therefore, timely diagnosis of LC in CHB patients is crucial.

Objective: This study aimed to enhance the diagnostic accuracy of LC in CHB patients by integrating liver stiffness measurement (LSM) with traditional indicators.

Methods: The study participants were randomly divided into training and internal validation sets. Using the least absolute shrinkage and selection operator (LASSO) and random forest-recursive feature elimination (RF-RFE) for feature selection, we developed a traditional logistic regression model and five machine learning models (k-nearest neighbors, random forest (RF), artificial neural network, support vector machine and eXtreme Gradient Boosting). Performance evaluation included receiver operating characteristic curves, calibration curves and decision curve analysis. Shapley additive explanations (SHAP) was employed to improve the interpretability of the optimal model.
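As a rough illustration of the discrimination metric behind the receiver operating characteristic curves mentioned in the Methods, the sketch below computes the area under the curve (AUC) as the probability that a randomly chosen cirrhosis case receives a higher predicted score than a randomly chosen non-cirrhosis case. The function name `roc_auc`, the labels and the scores are all hypothetical; this is not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of cirrhosis (1 = LC, 0 = no LC).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.90, 0.80, 0.35, 0.40, 0.20, 0.10, 0.30, 0.70]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of patients, which is fine for illustration; a rank-based implementation would be preferable for large cohorts.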

Results: We retrospectively included 1609 patients with CHB, among whom 470 were diagnosed with cirrhosis. Cirrhosis was diagnosed based on histological confirmation or clinical assessment, supported by characteristic findings on abdominal ultrasound and corroborative evidence such as thrombocytopenia, varices or imaging from CT/MRI. In the internal validation, the RF model achieved an accuracy above 0.80 and an AUC above 0.80, with outstanding calibration ability and clinical net benefit. Additionally, the model exhibited excellent predictive performance in an independent external validation set. The SHAP analysis indicated that LSM contributed the most to the model. The model still showed strong discriminative power when using only LSM or traditional indicators alone.

Conclusions: Machine learning models, especially the RF model, can effectively identify LC in CHB patients. Integrating LSM with traditional indicators can enhance diagnostic performance.

Keywords: Chronic hepatitis B; diagnostic model; liver cirrhosis; liver stiffness measurement; machine learning.

Plain language summary

Liver cirrhosis (LC) is a common complication of chronic hepatitis B (CHB). The random forest (RF) model showed the best overall performance in identifying LC in CHB patients in our study, which could assist clinical decision-making. Integrating LSM with traditional indicators can enhance the diagnostic performance for LC in CHB patients. In the absence of LSM, other traditional indicators can also diagnose LC effectively.

Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

**Figure 1.**
Flowchart. LR: logistic regression; XGBoost: eXtreme Gradient Boosting; ANN: artificial neural network; RF: random forest; KNN: k-nearest neighbors; SVM: support vector machine; ROC: receiver operating characteristic; DCA: decision curve analysis; SHAP: Shapley additive explanations.

**Figure 2.**
Screening of characteristic factors. (A) Feature variables screening based on RF-RFE. (B) Feature variables screening based on LASSO. (C) LASSO combined RF-RFE. LASSO: least absolute shrinkage and selection operator; RF: random forest; RFE: recursive feature elimination.

**Figure 3.**
ROC curves for the prediction models. (A) ROC curve in the training set. (B) ROC curve in the internal validation set. (C) ROC curve in the external validation set. ROC: receiver operating characteristic; AUC: area under the curve; ANN: artificial neural network; KNN: k-nearest neighbors; LR: logistic regression; RF: random forest; SVM: support vector machine; XGBoost: eXtreme Gradient Boosting.

**Figure 4.**
Confusion matrices of the six models in the training and validation sets. (A) Confusion matrices in the training set. (B) Confusion matrices in the internal validation set. (C) Confusion matrices in the external validation set. LR: logistic regression; ANN: artificial neural network; SVM: support vector machine; RF: random forest; KNN: k-nearest neighbors; XGBoost: eXtreme Gradient Boosting; LC: liver cirrhosis.

**Figure 5.**
Calibration curves for the prediction models. (A) Comprehensive summary figure of six models in the training set. (B) Comprehensive summary figure of six models in the internal validation set. (C) Comprehensive summary figure of six models in the external validation set. (D) Stratified plots of six models in the training set. (E) Stratified plots of six models in the internal validation set. (F) Stratified plots of six models in the external validation set. ANN: artificial neural network; KNN: k-nearest neighbors; LR: logistic regression; RF: random forest; SVM: support vector machine; XGBoost: eXtreme Gradient Boosting.

**Figure 6.**
DCA curves for the prediction models. (A) DCA curve in the training set. (B) DCA curve in the internal validation set. (C) DCA curve in the external validation set. The treat all curve represents the benefit rates for all cases with intervention, while the treat none curve represents the benefit rates for all cases without intervention. The remaining curves denote various models. The threshold probability represents the probability cut-off used to make a decision, while the net benefit indicates the clinical utility gained from using the model compared to alternative strategies. ANN: artificial neural network; KNN: k-nearest neighbors; LR: logistic regression; RF: random forest; SVM: support vector machine; XGBoost: eXtreme Gradient Boosting.
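The net benefit plotted in a decision curve can be sketched with the standard definition from decision curve analysis: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The caption does not give the formula, so the code below is an illustration under that standard definition, with hypothetical labels and predicted probabilities and hypothetical function names.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the 'treat all' strategy: every patient counts as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical cohort of ten patients (1 = LC, 0 = no LC).
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
nb_none = 0.0  # 'treat none' has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all and the treat-none values, which is the comparison the curves in the figure make across thresholds.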

**Figure 7.**
ROC curves for the RF model using LSM with traditional indicators, FIB-4, APRI and GPR. (A) ROC curve in the training set. (B) ROC curve in the internal validation set. (C) ROC curve in the external validation set. FIB-4: fibrosis-4 index; APRI: aspartate aminotransferase to platelet ratio index; GPR: the γ-glutamyl transferase-to-platelet ratio; LSM: liver stiffness measurement; Trad: 17 traditional indicators; ROC: receiver operating characteristic; AUC: area under the curve; RF: random forest.
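For readers unfamiliar with the three serum indices compared in this figure, the sketch below uses their commonly published definitions. The abstract does not state the exact formulas or upper limits of normal (ULN) the authors applied, so the ULN values and patient values here are assumptions for illustration only.

```python
from math import sqrt


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * sqrt(alt_u_l))


def apri(ast_u_l, ast_uln_u_l, platelets_1e9_l):
    """APRI = (AST / ULN of AST) / platelets x 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_1e9_l * 100.0


def gpr(ggt_u_l, ggt_uln_u_l, platelets_1e9_l):
    """GPR = (GGT / ULN of GGT) / platelets x 100."""
    return (ggt_u_l / ggt_uln_u_l) / platelets_1e9_l * 100.0


# Hypothetical patient; the ULN values are laboratory-specific assumptions.
patient = {"age": 50, "ast": 80.0, "alt": 64.0, "ggt": 90.0, "platelets": 100.0}
AST_ULN = 40.0  # assumed, U/L
GGT_ULN = 45.0  # assumed, U/L

fib4_value = fib4(patient["age"], patient["ast"], patient["alt"], patient["platelets"])
apri_value = apri(patient["ast"], AST_ULN, patient["platelets"])
gpr_value = gpr(patient["ggt"], GGT_ULN, patient["platelets"])
print(fib4_value, apri_value, gpr_value)
```

Units are AST, ALT and GGT in U/L and platelets in 10^9/L; each index can serve directly as a score for an ROC comparison against a model's predicted probabilities.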

**Figure 8.**
ROC curves for the RF model based on the traditional indicators, LSM and traditional indicators, and LSM-only. (A) ROC curve in the training set. (B) ROC curve in the internal validation set. (C) ROC curve in the external validation set. LSM: liver stiffness measurement; Trad: 17 traditional indicators; ROC: receiver operating characteristic; AUC: area under the curve; RF: random forest.

**Figure 9.**
ROC curves for the RF model based on FIB-4, LSM-only and 17 traditional indicators. (A) ROC curve in the training set. (B) ROC curve in the internal validation set. (C) ROC curve in the external validation set. FIB-4: fibrosis-4 index; LSM: liver stiffness measurement; Trad: 17 traditional indicators; ROC: receiver operating characteristic; AUC: area under the curve; RF: random forest.

**Figure 10.**
SHAP analysis based on the RF model. (A) Ranking of variable importance based on the mean SHAP value. (B) In the SHAP bee swarm plot, each row represents a feature, the x-axis represents the SHAP value, and each data point represents a sample. (C) SHAP analysis of liver cirrhosis in patients with chronic hepatitis B. (D) SHAP force plot of a chronic hepatitis B patient without cirrhosis.
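The idea behind the per-patient attributions in a force plot can be sketched with exact Shapley values on a toy scoring function. This is a simplified illustration, not the SHAP library and not the paper's RF model: it uses a single baseline patient rather than a background dataset, and `toy_risk`, its coefficients and the patient values are all hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders.

    Features not yet 'switched on' keep their baseline value.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            value = model(current)
            phi[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


def toy_risk(f):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    score = 0.04 * f["lsm_kpa"] - 0.002 * f["platelets"] + 0.01 * f["age"]
    if f["lsm_kpa"] > 12 and f["platelets"] < 100:
        score += 0.2
    return score


patient = {"lsm_kpa": 20.0, "platelets": 80.0, "age": 55.0}
baseline = {"lsm_kpa": 6.0, "platelets": 200.0, "age": 40.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a force plot stack feature effects from a base value up to the prediction. Enumerating all feature orders is only feasible for a handful of features; practical SHAP implementations rely on model-specific or sampling-based approximations.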

Cited by

Supervised Machine-Based Learning and Computational Analysis to Reveal Unique Molecular Signatures Associated with Wound Healing and Fibrotic Outcomes to Lens Injury.
Lalman C, Stabler KR, Yang Y, Walker JL. Int J Mol Sci. 2025 Aug 1;26(15):7422. doi: 10.3390/ijms26157422. PMID: 40806551.
