J Biomed Inform. 2018 Apr;80:87-95.
doi: 10.1016/j.jbi.2018.03.001. Epub 2018 Mar 9.

Development of an automated phenotyping algorithm for hepatorenal syndrome

Jejo D Koola et al. J Biomed Inform. 2018 Apr.

Abstract

Objective: Hepatorenal syndrome (HRS) is a devastating form of acute kidney injury (AKI) in patients with advanced liver disease and carries high morbidity and mortality, but phenotyping algorithms for it have not yet been developed using large electronic health record (EHR) databases. We evaluated and compared multiple phenotyping methods to achieve an accurate algorithm for HRS identification.

Materials and methods: A national retrospective cohort of patients with cirrhosis and AKI admitted to 124 Veterans Affairs hospitals was assembled from electronic health record data collected from 2005 to 2013. AKI was defined by the Kidney Disease: Improving Global Outcomes criteria. Five hundred and four hospitalizations were selected for manual chart review and served as the gold standard. EHR-based predictors were identified from structured data and from free-text clinical data, with the free text processed by natural language processing (NLP) using the clinical Text Analysis and Knowledge Extraction System (cTAKES). We explored several dimension reduction techniques for the NLP data, including newer high-throughput phenotyping and word embedding methods, and ascertained their effectiveness in identifying the phenotype without structured predictor variables. With the combined structured and NLP variables, we analyzed five phenotyping algorithms: penalized logistic regression, naïve Bayes, support vector machines, random forest, and gradient boosting. Calibration and discrimination metrics were calculated using 100 bootstrap iterations. For the final model, we report odds ratios and 95% confidence intervals.
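
As an illustration of the bootstrap evaluation of discrimination described above, the following is a minimal sketch, not the study's code. It assumes that predicted scores are already available and resamples only those fixed predictions; the study's procedure may instead have refit each model within every iteration. The function names `auc` and `bootstrap_auc`, the percentile interval, and the seed are assumptions made for this example.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_iter=100, seed=0):
    """Resample cases with replacement n_iter times and summarize the AUC.

    Returns (mean AUC, approximate 2.5th percentile, approximate 97.5th
    percentile) over the bootstrap replicates.
    """
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; draw again
        estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(0.025 * (n_iter - 1))]
    upper = estimates[int(0.975 * (n_iter - 1))]
    return sum(estimates) / n_iter, lower, upper


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the study cohort.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
    print(bootstrap_auc(y, s, n_iter=100))
```

The default of 100 iterations mirrors the number reported in the methods; everything else here is a simplification.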

Results: The area under the receiver operating characteristic curve (AUC) for the different models ranged from 0.73 to 0.93, with penalized logistic regression having the best discriminatory performance. Calibration for logistic regression was modest, whereas gradient boosting and support vector machines were better calibrated. NLP identified 6985 variables; a priori variable selection performed similarly to dimensionality reduction using high-throughput phenotyping and semantic similarity-informed clustering (AUC of 0.81 to 0.82).
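
Calibration, as distinct from discrimination, asks whether predicted probabilities match observed event rates. The sketch below shows one simple way to check this by grouping predictions into equal-width probability bins. This binned comparison is a stand-in chosen for brevity, not the smoothed calibration curves the study reports, and the function name `calibration_table` and the bin count are assumptions.

```python
def calibration_table(labels, probs, n_bins=5):
    """Compare mean predicted probability with observed event rate per bin.

    Returns a list of (bin index, count, mean predicted, observed rate)
    tuples, skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k].append((y, p))
    rows = []
    for k, members in enumerate(bins):
        if not members:
            continue
        expected = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((k, len(members), expected, observed))
    return rows


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the study cohort.
    y = [0, 0, 1, 0, 1, 1, 0, 1]
    p = [0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95]
    for row in calibration_table(y, p, n_bins=4):
        print(row)
```

In a well-calibrated model the mean predicted probability and the observed rate are close in every bin; a model can rank patients well (high AUC) and still fail this check.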

Conclusion: This study demonstrated improved phenotyping of a challenging AKI etiology, HRS, over ICD-9 coding. We also compared performance among multiple approaches to EHR-derived phenotyping, and found similar results between methods. Lastly, we showed that automated NLP dimension reduction is viable for acute illness.

Keywords: Acute kidney injury; Cirrhosis; Dimension reduction; Hepatorenal syndrome; Natural language processing; Phenotyping.

Figures

Figure 1. Workflow describing the natural language processing pipeline
(Note: cTAKES: clinical Text Analysis and Knowledge Extraction System; CUI: Concept Unique Identifier; AFEP: Automated Feature Extraction for Phenotyping; SAFE: Surrogate-Assisted Feature Extraction; PCA: Principal Component Analysis)
Figure 2. Receiver operating characteristic curves for the five Hepatorenal Syndrome phenotyping models
(Note: The grey square represents performance for a Hepatorenal Syndrome ICD-9 code anytime during the admission. The grey circle represents a Hepatorenal Syndrome ICD-9 code as a discharge diagnosis. LR: Logistic Regression; SVM: Support Vector Machine; GBM: Gradient Boosting Machine; NB: Naïve Bayes; RF: Random Forest)
Figure 3. Smoothed calibration curves (observed versus expected predicted probability) for the five methods
(Note: LR: Logistic Regression; SVM: Support Vector Machine; GBM: Gradient Boosting Machine; NB: Naïve Bayes; RF: Random Forest)
