Rapid identification and phenotyping of nonalcoholic fatty liver disease patients using a machine-based approach in diverse healthcare systems
- PMID: 39739635
- PMCID: PMC11686338
- DOI: 10.1111/cts.70105
Abstract
Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide and remains under-recognized within healthcare systems. Therapeutic interventions are rapidly advancing for its inflammatory phenotype, nonalcoholic steatohepatitis (NASH), at all stages of disease. Diagnosis codes alone fail to recognize and stratify at-risk patients accurately. Our work aims to rapidly identify NAFLD patients within large electronic health record (EHR) databases for automated stratification and targeted intervention based on clinically relevant phenotypes. We present a rule-based phenotyping algorithm for efficient identification of NAFLD patients, developed using EHRs from 6.4 million patients at Columbia University Irving Medical Center (CUIMC) and validated at two independent healthcare centers. The algorithm uses the Observational Medical Outcomes Partnership (OMOP) Common Data Model and queries structured and unstructured data elements, including diagnosis codes, laboratory measurements, and radiology and pathology modalities. Our approach identified 16,006 CUIMC NAFLD patients, of whom 10,753 (67%) were previously unidentifiable by NAFLD diagnosis codes. Fibrosis scoring (FIB-4, APRI, NAFLD-FS) on patients without histology identified 943 subjects with scores indicative of advanced fibrosis. The algorithm was validated at two independent healthcare systems, University of Pennsylvania Health System (UPHS) and Vanderbilt University Medical Center (VUMC), where 20,779 and 19,575 NAFLD patients were identified, respectively. Clinical chart review showed a high positive predictive value (PPV) across all healthcare systems: 91% at CUIMC, 75% at UPHS, and 85% at VUMC, and a sensitivity of 79.6%. Our rule-based algorithm provides an accurate, automated approach for rapidly identifying, stratifying, and sub-phenotyping NAFLD patients within a large EHR system.
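The abstract names FIB-4 and APRI among the noninvasive fibrosis scores but does not give their formulas or the cutoffs used. As a minimal sketch, the standard published definitions of the two scores can be computed as below. The function names, the default cutoffs (2.67 for FIB-4, 1.0 for APRI), and the "either score exceeds its cutoff" rule are illustrative assumptions, not the authors' implementation.

```python
import math


def _require_positive(**values):
    """Reject missing or non-physiological inputs before scoring."""
    for name, value in values.items():
        if value is None or value <= 0:
            raise ValueError(f"{name} must be a positive number, got {value!r}")


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)).

    AST and ALT in U/L, platelets in 10^9/L, age in years.
    """
    _require_positive(age_years=age_years, ast_u_l=ast_u_l,
                      alt_u_l=alt_u_l, platelets_10e9_l=platelets_10e9_l)
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) x 100 / platelets.

    AST and its upper limit of normal in U/L, platelets in 10^9/L.
    """
    _require_positive(ast_u_l=ast_u_l, ast_uln_u_l=ast_uln_u_l,
                      platelets_10e9_l=platelets_10e9_l)
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l


def flags_advanced_fibrosis(fib4_score, apri_score,
                            fib4_cutoff=2.67, apri_cutoff=1.0):
    """Hypothetical rule: flag a patient if either score exceeds its cutoff.

    The cutoffs are commonly cited values, assumed here for illustration;
    the source does not state which thresholds or combination rule it used.
    """
    return fib4_score > fib4_cutoff or apri_score > apri_cutoff


if __name__ == "__main__":
    # Made-up laboratory values for a single hypothetical patient.
    f = fib4(age_years=60, ast_u_l=80, alt_u_l=64, platelets_10e9_l=150)
    a = apri(ast_u_l=80, ast_uln_u_l=40, platelets_10e9_l=150)
    print(f"FIB-4={f:.2f} APRI={a:.2f} flagged={flags_advanced_fibrosis(f, a)}")
```

In an EHR setting the inputs would come from laboratory measurements closest to an index date; how that pairing is done, and how the upper limit of normal for AST is chosen per laboratory, are design decisions the abstract does not describe.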
© 2024 The Author(s). Clinical and Translational Science published by Wiley Periodicals LLC on behalf of American Society for Clinical Pharmacology and Therapeutics.
Conflict of interest statement
Patent for algorithm to Columbia University Trustees; © 2021 The Trustees of Columbia University in the City of New York. The owner has no objection to reproduction of the work for academic non‐commercial purposes, but otherwise reserves all copyright rights whatsoever. J.W., N.P.T. and A.O.B. are co‐inventors. J.W. has received research support from Janssen, Galectin, Intercept, Genfit, Shire, Conatus, Zydus, AMRA, and is on advisory boards for GlaxoSmithKline and AstraZeneca. R.M.C. has received research support from Intercept Pharmaceuticals and Merck, Inc., and consulting fees from Intercept and AstraZeneca. M.S. is funded by NIDDK R01DK132138, R01DK131547 and has an unrestricted grant from Grifols, SA. A.K. is a speaker and proctor for Intuitive, a reviewer for surgical videos for Crowd Sourced Assessment of Technical Skills (CSATs), and a consultant for Johnson and Johnson and Surgical Specialties Corporation. G.R. retired from Janssen Pharma R&D as Scientific Fellow and Head of Computational Sciences and is currently a Venture Partner in Samsara BioCapital, Palo Alto, CA. All other authors declared no competing interests for this work.