J Clin Gastroenterol. 2023 Jan 1;57(1):82-88.

doi: 10.1097/MCG.0000000000001586.

An AI Approach for Identifying Patients With Cirrhosis

Jihad S Obeid¹, Ali Khalifa², Brandon Xavier², Halim Bou-Daher², Don C Rockey²,³

Affiliations

¹ Department of Public Health Sciences.
² Division of Gastroenterology and Hepatology.
³ Medical University of South Carolina Digestive Disease Research Center, Medical University of South Carolina, Charleston, SC.

PMID: 34238846
PMCID: PMC8741865
DOI: 10.1097/MCG.0000000000001586

Jihad S Obeid et al. J Clin Gastroenterol. 2023.

Abstract

Goal: The goal of this study was to evaluate an artificial intelligence approach, namely deep learning, on clinical text in electronic health records (EHRs) to identify patients with cirrhosis.

Background and aims: Accurate identification of cirrhosis in EHR is important for epidemiological, health services, and outcomes research. Currently, such efforts depend on International Classification of Diseases (ICD) codes, with limited success.

Materials and methods: We trained several machine learning models using discharge summaries from patients with known cirrhosis from a patient registry and random controls without cirrhosis or its complications based on ICD codes. Models were validated on patients for whom discharge summaries were manually reviewed and used as the gold standard test set. We tested Naive Bayes and Random Forest as baseline models and a deep learning model using word embedding and a convolutional neural network (CNN).
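As an illustration of the kind of baseline text classifier mentioned above, the following is a minimal sketch of a multinomial Naive Bayes model with add-one smoothing over whitespace tokens. It is not the study's pipeline: the function names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical, and the four training snippets are invented toy text rather than real discharge summaries.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        # Log prior for the class
        score = math.log(class_counts[c] / total_docs)
        # Add-one (Laplace) smoothing denominator
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)


# Invented toy snippets for illustration only, NOT real clinical text.
train_docs = [
    "decompensated cirrhosis with ascites and varices",
    "alcoholic cirrhosis with hepatic encephalopathy",
    "community acquired pneumonia treated with antibiotics",
    "elective knee replacement without complications",
]
train_labels = ["cirrhosis", "cirrhosis", "control", "control"]
model = train_naive_bayes(train_docs, train_labels)

print(predict(model, "cirrhosis with ascites"))
print(predict(model, "pneumonia treated with antibiotics"))
```

A real system would need proper tokenization, a far larger training set, and evaluation against a manually reviewed test set, as described above.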

Results: The training set included 446 cirrhosis patients and 689 controls, while the gold standard test set included 139 cirrhosis patients and 152 controls. Among the machine learning models, the CNN achieved the highest area under the receiver operating characteristic curve (AUC; 0.993), with a precision of 0.965 and a recall of 0.978. By comparison, Naive Bayes and Random Forest achieved AUCs of 0.879 and 0.981, precisions of 0.787 and 0.958, and recalls of 0.878 and 0.827, respectively. Identification by ICD codes for cirrhosis had a precision of 0.883 and a recall of 0.978.
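For readers less familiar with the metrics reported above, the sketch below shows how precision and recall follow from confusion-matrix counts. The counts are hypothetical: they were chosen only because they are consistent with a 139-patient positive class and the rounded CNN values, and they are not figures reported by the authors.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that the classifier found."""
    return tp / (tp + fn)


# Hypothetical counts (an assumption, not data from the paper):
# tp + fn = 139 matches the size of the gold standard cirrhosis group.
tp, fp, fn = 136, 5, 3

print(round(precision(tp, fp), 3))  # 0.965
print(round(recall(tp, fn), 3))     # 0.978
```

With the same number of true positives, a method that produces more false positives keeps the same recall but has lower precision, which is the pattern seen when comparing ICD codes with the CNN.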

Conclusions: A CNN model trained on discharge summaries identified cirrhosis patients with high precision and recall. This approach for phenotyping cirrhosis in the EHR may provide a more accurate assessment of disease burden in a variety of studies.

Conflict of interest statement

The authors have no competing interests to disclose.

Figures

**Figure 1. Consort diagram of patients.**
The diagram depicts selection of patients from all patients of MUSC research data warehouse discharge summaries, and the two cohort data sets (training and cross-validation set and the test set).

**Figure 2. Flowchart of the machine learning workflow.**
A flowchart depicting an overview of the steps used for training and evaluating the machine learning models and the order in which they were executed.

**Figure 3. Stochastic Neighbor Embedding (t-SNE).**
Shown is a Word2vec t-distributed Stochastic Neighbor Embedding (t-SNE) plot in a 2-dimensional space (reduced from 200 word2vec dimensions to two variables V1 and V2) for a select set of relevant key words. Clustering of contextually, semantically, and syntactically related words is depicted. For example, typographical errors such as “ascites” and “ascities”, words of similar derivation such as “varices” and “variceal”, and contextually related words such as “cirrhosis”, “alcoholic”, “NASH”, and “steatohepatitis” are close to each other on the plot.

**Figure 4. Area under the receiver operating characteristic curve (AUC).**
AUCs of the classifiers on discharge summaries to identify cirrhosis patients are shown. The figure demonstrates the results on the gold standard test set. Abbreviations: AUC: area under the receiver operating characteristic curve; NB: Naïve Bayes; RF: Random Forest; CNN: convolutional neural network.

**Figure 5. Variable importance analysis for cirrhosis.**
The mean decrease in accuracy is shown for the top 20 stemmed words (i.e. only the roots of the words) based on the random forest classifier. Words with a higher mean decrease in accuracy are the more important ones for the classifier. For example, “cirrhosi” and “varic” are the stemmed versions of “cirrhosis” and “varices” respectively.
