BMC Bioinformatics. 2023 Sep 6;24(1):333.
doi: 10.1186/s12859-023-05456-0.

IHCP: interpretable hepatitis C prediction system based on black-box machine learning models


Yongxian Fan et al. BMC Bioinformatics. 2023.

Abstract

Background: Hepatitis C is a prevalent disease that poses a high risk to the human liver. Early diagnosis of hepatitis C is crucial for treatment and prognosis, so an effective medical decision system is essential. In recent years, many computational methods have been proposed to identify hepatitis C patients. Although existing hepatitis prediction models achieve good accuracy, most of them are black-box models and cannot gain the trust of doctors and patients in clinical practice. This study therefore uses several Machine Learning (ML) models to predict whether a patient has hepatitis C and applies explanation methods to elucidate how those models reach their predictions, making the prediction process more transparent.

Result: We studied the prediction of hepatitis C based on serological testing and provided comprehensive explanations of the prediction process. We built models on the benchmark dataset and evaluated their performance with fivefold cross-validation and independent testing experiments. After evaluating three black-box machine learning models, Random Forest (RF), Support Vector Machine (SVM), and AdaBoost, we adopted a Bayesian-optimized RF as the classification algorithm. For model interpretation, we used the widely adopted SHapley Additive exPlanations (SHAP) to provide global explanations of the model and Local Interpretable Model-Agnostic Explanations with stability (LIME_stability) to provide local explanations.
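The SHAP explanations mentioned above are built on Shapley values. As a minimal sketch, not the IHCP implementation and not the `shap` library's API, the snippet below computes exact Shapley values for a single prediction by enumerating every feature coalition. It assumes that "absent" features are replaced by a fixed baseline vector (SHAP's explainers instead average over background data). `exact_shapley`, `mean_abs_shapley`, and the `toy_risk` scoring function are hypothetical names introduced for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`.

    Enumerates every coalition, so the cost grows as 2**n; this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep the patient's values; others use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shapley(f, X, baseline):
    """Global importance of each feature: the average |phi_i| over all rows of X."""
    rows = [exact_shapley(f, x, baseline) for x in X]
    return [sum(abs(r[i]) for r in rows) / len(rows) for i in range(len(baseline))]


def toy_risk(z):
    # Hypothetical scoring function with an interaction between features 0 and 1.
    return z[0] * z[1] + z[2]


phi = exact_shapley(toy_risk, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

The values satisfy the efficiency property: they sum to `toy_risk(x) - toy_risk(baseline)`. Exhaustive enumeration scales exponentially with the number of features, which is why SHAP relies on model-specific or sampling-based algorithms in practice. Averaging absolute values over patients, as `mean_abs_shapley` does, yields a global feature ranking.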

Conclusion: Both the fivefold cross-validation and the independent testing show that our proposed method significantly outperforms the state-of-the-art method. IHCP obtains excellent predictive performance while remaining interpretable. This helps uncover the model's underlying predictive patterns and enables clinicians to better understand its decision-making process.
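For readers who want to see how fivefold cross-validation forms its folds, here is a minimal plain-Python sketch. Stratifying the folds by class is an assumption of this sketch, not something stated above, and `stratified_kfold`, `cross_val_accuracy`, and the `majority_class` placeholder (standing in for RF, SVM, or AdaBoost) are hypothetical helpers rather than the authors' code.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin; the running offset keeps fold sizes even.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)

    splits = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, sorted(test)))
    return splits


def cross_val_accuracy(X, y, train_fn, k=5, seed=0):
    """Mean test-fold accuracy; `train_fn(X_train, y_train)` must return a predict function."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        predict = train_fn([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train):
    # Hypothetical placeholder for RF/SVM/AdaBoost: always predicts the most frequent label.
    most_common = max(set(y_train), key=y_train.count)
    return lambda x: most_common
```

Stratifying keeps the hepatitis/no-hepatitis ratio of every test fold close to that of the full dataset, which matters when one class is rare; each sample appears in exactly one test fold across the five rounds.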

Keywords: Hepatitis C; Interpretable artificial intelligence; LIME; Machine learning; SHAP.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Confusion matrix. a Confusion matrix of RF in the UCI dataset. b Confusion matrix of SVM in the UCI dataset. c Confusion matrix of AdaBoost in the UCI dataset. d Confusion matrix of RF in the independent testing set. e Confusion matrix of SVM in the independent testing set. f Confusion matrix of AdaBoost in the independent testing set
Fig. 2
SHAP summary diagram
Fig. 3
Summary of the average absolute SHAP values on the model targets
Fig. 4
LIME_stability interpretation chart for patients without hepatitis
Fig. 5
LIME_stability interpretation chart for patients with hepatitis
Fig. 6
Comparison of datasets before and after balancing. a Original imbalanced UCI dataset. b UCI dataset after oversampling. c Original imbalanced independent testing set. d Independent testing set after oversampling. (Sex: 0 = female, 1 = male; category: 0 = no hepatitis, 1 = hepatitis)
Fig. 7
Overview of predicting hepatitis C patients and interpreting model predictions. Collect the hepatitis dataset. Preprocess it by data cleaning, missing-value filling, and data balancing, then input it into the model. Divide the dataset into training and testing sets to train the models and evaluate the best one. Apply SHAP and LIME to analyze the experimental results
Fig. 8
Model processing flowchart
Fig. 9
Random forest algorithm workflow
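Fig. 6 and Fig. 7 refer to a data-balancing (oversampling) step, but the specific technique is not named in this section. The sketch below assumes simple random oversampling, which duplicates randomly chosen minority-class rows until the classes are the same size; SMOTE, which synthesizes new minority samples by interpolating between neighboring minority samples, is a common alternative. `random_oversample` is a hypothetical helper, not the authors' code.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement; all original rows are kept unchanged.
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out
```

A common precaution is to oversample only the training portion of each split, so that duplicated rows never appear in both the training and the evaluation data.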
