J Am Med Inform Assoc. 2021 Jun 12;28(6):1235-1241.
doi: 10.1093/jamia/ocab003.

Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis

Matt Docherty et al. J Am Med Inform Assoc.

Abstract

Objective: To develop a machine learning (ML) model that identifies patients with nonalcoholic steatohepatitis (NASH).

Materials and methods: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model.

Results: Models were trained and tested on NIDDK NAFLD data (704 patients), and the best-performing models were evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting (XGBoost) model consisting of 14 features exhibited high performance in predicting NASH, as measured by area under the curve (AUC; 0.82), sensitivity (81%), and precision (81%). Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) in predicting NASH in Optum data.
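For readers less familiar with the three metrics reported above, the following is a minimal Python sketch of how AUC, sensitivity, and precision are computed from true labels and model scores. It is not the authors' code: the function names, the 0.5 decision threshold, and the six-patient toy data are all illustrative assumptions and have no connection to the NIDDK or Optum datasets.

```python
from typing import Sequence, Tuple


def sensitivity_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels (1 = NASH, 0 = non-NASH)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of true cases that were found
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted cases that are true
    return sensitivity, precision


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, prec = sensitivity_precision(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} precision={prec:.2f} auc={auc:.2f}")
```

Sensitivity and precision depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports them side by side.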

Discussion: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14- and 5-feature versions exhibit high performance.

Conclusion: The NASHmap model is a convenient and high performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources.

Keywords: NAFLD; NASH; artificial intelligence; machine learning; non-alcoholic fatty liver disease.

Figures

Figure 1. Model performance in NASH prediction using NIDDK data. Area under the curve (AUC), false positive rate (FPR).
Figure 2. Model performance in NASH prediction using Optum data. Area under the curve (AUC), false positive rate (FPR).
