EClinicalMedicine. 2023 Aug 9:62:102149.
doi: 10.1016/j.eclinm.2023.102149. eCollection 2023 Aug.

Large-scale identification of undiagnosed hepatic steatosis using natural language processing

Carolin V Schneider et al. EClinicalMedicine.

Abstract

Background: Nonalcoholic fatty liver disease (NAFLD) is a major cause of liver-related morbidity in people with and without diabetes, but it is underdiagnosed, posing challenges for research and clinical management. Here, we determine if natural language processing (NLP) of data in the electronic health record (EHR) could identify undiagnosed patients with hepatic steatosis based on pathology and radiology reports.

Methods: A rule-based NLP algorithm was built using a Linguamatics literature text mining tool to search 2.15 million pathology reports and 2.7 million imaging reports in the Penn Medicine EHR from November 2014 through December 2020 for evidence of hepatic steatosis. For quality control, two independent physicians manually reviewed randomly chosen biopsy and imaging reports (n = 353, PPV 99.7%).
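
To illustrate the general idea of rule-based report screening and the PPV check, here is a minimal Python sketch. It is not the study's Linguamatics rule set, which the abstract does not describe: the keyword patterns, the negation cues, and the function names `mentions_steatosis` and `ppv` are all hypothetical. The count of 352 confirmed reports is likewise an assumption, chosen because it is the whole number that matches the reported 99.7% if all 353 reviewed reports had been flagged by the algorithm.

```python
import re

# Hypothetical keyword rules; the study's actual rules are not given in the abstract.
STEATOSIS = re.compile(
    r"\b(hepatic steatosis|steatosis|fatty liver|fatty infiltration)\b",
    re.IGNORECASE,
)

# Hypothetical negation cues: a cue earlier in the same sentence negates the mention.
NEGATION = re.compile(
    r"\b(no|without|negative for|absence of)\b[^.;]*$",
    re.IGNORECASE,
)


def mentions_steatosis(report: str) -> bool:
    """Return True if at least one sentence contains an un-negated steatosis term."""
    for sentence in re.split(r"(?<=[.;])\s+", report):
        for match in STEATOSIS.finditer(sentence):
            preceding_text = sentence[: match.start()]
            if not NEGATION.search(preceding_text):
                return True
    return False


def ppv(confirmed: int, flagged: int) -> float:
    """Positive predictive value: confirmed positives / all flagged reports."""
    return confirmed / flagged


positive_report = "Liver shows diffuse hepatic steatosis. No focal lesion."
negative_report = "No evidence of hepatic steatosis. Gallbladder is normal."

print(mentions_steatosis(positive_report))
print(mentions_steatosis(negative_report))

# Assumed count: 352 of 353 reviewed reports confirmed by the physicians.
print(f"{ppv(352, 353):.1%}")
```

A production system would need far richer rules (section detection, uncertainty phrases such as "cannot exclude", historical mentions), which is why manual physician review of a random sample, as described in the Methods, is the key validation step.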

Findings: After exclusion of individuals with other causes of hepatic steatosis, 3007 patients with biopsy-proven NAFLD and 42,083 patients with imaging-proven NAFLD were identified. Interestingly, elevated ALT was not a sensitive predictor of the presence of steatosis, and only half of the biopsied patients with steatosis ever received an ICD diagnosis code for the presence of NAFLD/NASH. PNPLA3 and TM6SF2 risk alleles were robustly associated with steatosis identified by NLP. We identified 234 disorders that were significantly over- or underrepresented in all subjects with steatosis and identified changes in serum markers (e.g., GGT) associated with the presence of steatosis.

Interpretation: This study demonstrates clear feasibility of NLP-based approaches to identify patients whose steatosis was indicated in imaging and pathology reports within a large healthcare system and uncovers undercoding of NAFLD in the general population. Identification of patients at risk could link them to improved care and outcomes.

Funding: The study was funded by US and German funding sources that provided financial support only and had no influence or control over the research process.

Keywords: Biopsy; EHR; Liver disease; NAFLD; Natural language processing.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
(A) Flowchart of the selection process for individuals with steatosis identified by NLP and for controls. We applied an NLP algorithm to 2.67 million imaging reports and, after applying several exclusions, identified 42,083 discrete patients in whom the presence of hepatic steatosis was specifically reported by the radiologist. We also searched 2.15 million pathology reports to find 34,437 liver biopsy reports. After applying exclusions for other known causes of steatosis and liver disease, we identified a total of 3007 discrete patients with biopsy-proven NAFLD. Among these patients, 1210 met criteria for NASH, 456 had borderline NASH, and 1341 had steatosis. (B) Genetic analysis of patients with steatosis on imaging compared to controls. Manhattan plot of genome-wide markers for imaging-identified steatosis (2840 cases and 21,195 controls). Ancestry-specific EWASs were performed by logistic regression in the PMBB using WES data that had been filtered using a series of quality control filters known as the Goldilocks filter, assuming an additive genetic model and adjusting for age, sex, BMI, and genetic ancestry. Results are plotted as −log10 p values on the y-axis by chromosomal position on the x-axis (NCBI build 37).
Fig. 2
Comorbidity PheWAS analysis for patients with (A) steatosis on imaging and (B) biopsy-proven steatosis compared to controls. This analysis includes only diagnoses made prior to the imaging. Manhattan plot of adjusted −log10 (p-values) for all PheCodes comparing their occurrence. Association results with p-values < 3 × 10^−4 are highlighted. Upward- and downward-pointing triangles refer to PheCodes that are over- and underrepresented, respectively. (C) Venn diagram showing the overlap of PheCodes diagnosed prior to the imaging/biopsy for biopsy- and imaging-identified steatosis.
