Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis
- PMID: 33684933
- PMCID: PMC8200272
- DOI: 10.1093/jamia/ocab003
Abstract
Objective: To develop a machine learning (ML) model that identifies patients with nonalcoholic steatohepatitis (NASH).
Materials and methods: This retrospective study used two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH and trained it on NIDDK cases confirmed as NASH or non-NASH by liver histology.
Results: Models were trained and tested on NIDDK NAFLD data (704 patients), and the best-performing models were evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting (XGBoost) model using 14 features performed well in predicting NASH, with an area under the curve (AUC) of 0.82, sensitivity of 81%, and precision of 81%. An abbreviated set of 5 features gave slightly lower performance (AUC 0.79, sensitivity 80%, precision 80%). The full model achieved an AUC of 0.76 in predicting NASH in the Optum data.
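For readers less familiar with the three measures reported in the Results, the following is a minimal sketch of how area under the curve, sensitivity, and precision are computed from true labels and predicted scores. It is not the authors' code. The function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made for illustration only, and are not data from the NIDDK or Optum datasets.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) after thresholding the scores.

    The 0.5 threshold is an illustrative assumption, not a value from the study.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> float:
    """Share of true NASH cases that the model flags (recall)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of flagged cases that are true NASH cases."""
    return tp / (tp + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = biopsy-confirmed NASH, 0 = non-NASH.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

tp, fp, fn, tn = confusion_counts(toy_labels, toy_scores)
print(f"sensitivity={sensitivity(tp, fn):.2f}")
print(f"precision={precision(tp, fp):.2f}")
print(f"auc={roc_auc(toy_labels, toy_scores):.4f}")
```

Sensitivity and precision depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.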
Discussion: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14-feature and 5-feature versions exhibit high performance.
Conclusion: The NASHmap model is a convenient, high-performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources.
Keywords: NAFLD; NASH; artificial intelligence; machine learning; non-alcoholic fatty liver disease.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.