Development of an automated phenotyping algorithm for hepatorenal syndrome

Jejo D Koola¹, Sharon E Davis², Omar Al-Nimri³, Sharidan K Parr⁴, Daniel Fabbri⁵, Bradley A Malin⁶, Samuel B Ho⁷, Michael E Matheny⁸

Affiliations

¹ Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, CA, USA; Division of Hospital Medicine, Department of Medicine, University of California, San Diego, CA, USA. Electronic address: jkoola@ucsd.edu.
² Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
³ Northwest Renal Clinic, Portland, OR, USA.
⁴ Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁵ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA.
⁶ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA.
⁷ VA San Diego Healthcare System, San Diego, CA, USA; Division of Gastroenterology, Department of Medicine, University of California, San Diego, CA, USA.
⁸ Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.

PMID: 29530803
PMCID: PMC5920557
DOI: 10.1016/j.jbi.2018.03.001

Jejo D Koola et al. J Biomed Inform. 2018 Apr;80:87-95. doi: 10.1016/j.jbi.2018.03.001. Epub 2018 Mar 9.

Abstract

Objective: Hepatorenal syndrome (HRS) is a devastating form of acute kidney injury (AKI) that occurs in patients with advanced liver disease and carries high morbidity and mortality, but phenotyping algorithms for it have not yet been developed using large electronic health record (EHR) databases. We evaluated and compared multiple phenotyping methods to achieve an accurate algorithm for HRS identification.

Materials and methods: A national retrospective cohort of patients with cirrhosis and AKI admitted to 124 Veterans Affairs hospitals was assembled from EHR data collected from 2005 to 2013. AKI was defined by the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. Five hundred and four hospitalizations were selected for manual chart review and served as the gold standard. EHR-based predictors were identified from structured data and from free-text clinical data processed with natural language processing (NLP) using the clinical Text Analysis and Knowledge Extraction System (cTAKES). We explored several dimension reduction techniques for the NLP data, including newer high-throughput phenotyping and word embedding methods, and ascertained their effectiveness in identifying the phenotype without structured predictor variables. With the combined structured and NLP variables, we analyzed five phenotyping algorithms: penalized logistic regression, naïve Bayes, support vector machines, random forest, and gradient boosting. Calibration and discrimination metrics were calculated using 100 bootstrap iterations. For the final model, we report odds ratios and 95% confidence intervals.
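To make the bootstrap step concrete, the following is a minimal sketch of estimating discrimination (AUC) over 100 bootstrap resamples. It is not the authors' code: the function names, the percentile interval, and the toy data are assumptions for illustration, and the sketch resamples a fixed set of predictions rather than refitting each model within every resample, as a full internal validation might.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_iter=100, seed=0):
    """Mean AUC and a percentile interval over n_iter bootstrap resamples.

    Hypothetical helper: resamples (label, score) pairs with replacement and
    skips resamples that contain only one class.
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes present
            aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(0.025 * (n_iter - 1))]
    upper = aucs[int(0.975 * (n_iter - 1))]
    return sum(aucs) / n_iter, lower, upper


# Toy data (made up): 1 = chart-confirmed HRS, 0 = other AKI etiology.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(toy_labels, toy_scores)
mean_auc, lower, upper = bootstrap_auc(toy_labels, toy_scores)
```

The pairwise AUC here is quadratic in the number of hospitalizations, which is acceptable for a chart-reviewed sample of a few hundred but would be replaced by a sort-based computation on larger data.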

Results: The area under the receiver operating characteristic curve (AUC) for the different models ranged from 0.73 to 0.93, with penalized logistic regression having the best discriminatory performance. Calibration for logistic regression was modest; gradient boosting and support vector machines were better calibrated. NLP identified 6985 variables; a priori variable selection performed similarly to dimensionality reduction using high-throughput phenotyping and semantic similarity informed clustering (AUC of 0.81 to 0.82).

Conclusion: This study demonstrated improved phenotyping of a challenging AKI etiology, HRS, over ICD-9 coding. We also compared performance among multiple approaches to EHR-derived phenotyping, and found similar results between methods. Lastly, we showed that automated NLP dimension reduction is viable for acute illness.

Keywords: Acute kidney injury; Cirrhosis; Dimension reduction; Hepatorenal syndrome; Natural language processing; Phenotyping.

Figures

**Figure 1. Workflow describing Natural Language Processing pipeline**
(Note: cTAKES: clinical Text Analysis Knowledge Extraction System; CUI: Concept Unique Identifier; AFEP: Automated Feature Extraction for Phenotyping; SAFE: Surrogate-Assisted Feature Extraction; PCA: Principal Component Analysis)

**Figure 2. Receiver Operating Characteristic curves for the five Hepatorenal Syndrome phenotyping models**
(Note: The grey square represents performance for a Hepatorenal Syndrome ICD-9 code anytime during the admission. The grey circle represents a Hepatorenal Syndrome ICD-9 code as a discharge diagnosis. LR: Logistic Regression; SVM: Support Vector Machine; GBM: Gradient Boosting Machine; NB: Naïve Bayes; RF: Random Forest)

**Figure 3. Smoothed calibration curves for the observed-to-expected predicted probability plots for the five different methods**
(Note: LR: Logistic Regression; SVM: Support Vector Machine; GBM: Gradient Boosting Machine; NB: Naïve Bayes; RF: Random Forest)
