Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment

Daniel J Feller¹, Jason Zucker², Michael T Yin², Peter Gordon², Noémie Elhadad¹

Affiliations

¹ Department of Biomedical Informatics, Columbia University, New York, NY.
² Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY.

PMID: 29084046
PMCID: PMC5762388
DOI: 10.1097/QAI.0000000000001580

Daniel J Feller et al. J Acquir Immune Defic Syndr. 2018 Feb 1;77(2):160-166.

Abstract

Objective: Universal HIV screening programs are costly, labor intensive, and often fail to identify high-risk individuals. Automated risk assessment methods that leverage longitudinal electronic health records (EHRs) could catalyze targeted screening programs. Although social and behavioral determinants of health are typically captured in narrative documentation, previous analyses have considered only structured EHR fields. We examined whether natural language processing (NLP) would improve predictive models of HIV diagnosis.

Methods: The study cohort comprised 181 HIV-positive individuals who received care at New York Presbyterian Hospital before a confirmatory HIV diagnosis and 543 HIV-negative controls selected using propensity score matching. EHR data including demographics, laboratory tests, diagnosis codes, and unstructured notes before HIV diagnosis were extracted for modeling. Three predictive models were developed using machine-learning algorithms: (1) a baseline model with only structured EHR data, (2) baseline plus NLP topics, and (3) baseline plus NLP clinical keywords.
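
To make the third approach (baseline plus NLP clinical keywords) concrete, here is a minimal sketch of how keyword indicators drawn from note text might be appended to structured features. It is an illustration under stated assumptions, not the study's implementation: the function names, the whole-word matching rule, the pooling across notes, and the toy inputs are all hypothetical. The four keywords are those quoted in the Results.

```python
import re

# Hypothetical keyword list; the four terms are the ones quoted in the Results.
KEYWORDS = ["msm", "unprotected", "hiv", "methamphetamine"]

def keyword_features(note_text, keywords=KEYWORDS):
    """Return one binary indicator per keyword (1 if present as a whole word)."""
    tokens = set(re.findall(r"[a-z0-9]+", note_text.lower()))
    return [1 if kw in tokens else 0 for kw in keywords]

def build_feature_vector(structured, notes, keywords=KEYWORDS):
    """Concatenate structured features with keyword indicators pooled over all notes."""
    pooled = [0] * len(keywords)
    for note in notes:
        per_note = keyword_features(note, keywords)
        # A keyword counts as present if it appears in any note for the patient.
        pooled = [max(a, b) for a, b in zip(pooled, per_note)]
    return list(structured) + pooled

# Toy inputs, invented for illustration only (not study data).
structured = [34, 1, 0]  # hypothetical: age, a lab flag, a diagnosis-code flag
notes = [
    "Pt reports unprotected sex.",
    "Denies methamphetamine use; HIV test offered.",
]
vector = build_feature_vector(structured, notes)
print(vector)
```

Note that this sketch ignores negation ("Denies methamphetamine use" still sets the indicator), which is one reason more advanced extraction techniques may be needed in practice.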

Results: Predictive performance varied across models, with F measures of 0.59 for the baseline model, 0.63 for the baseline + NLP topic model, and 0.74 for the baseline + NLP keyword model. The baseline + NLP keyword model yielded the highest precision by incorporating keywords such as "msm," "unprotected," "hiv," and "methamphetamine," together with structured EHR data indicative of additional HIV risk factors.
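
For readers unfamiliar with the metric, the F measure reported above combines precision and recall into a single score. The sketch below assumes the balanced form (F1, the harmonic mean of precision and recall); the abstract does not state which weighting was used. The function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Invented counts for illustration only (not study data).
p, r = precision_recall(tp=30, fp=10, fn=20)
score = f_measure(p, r)
print(round(p, 3), round(r, 3), round(score, 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F measure by excelling at precision alone or recall alone.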

Conclusions: NLP improved the predictive performance of automated HIV risk assessment by extracting terms in clinical text indicative of high-risk behavior. Future studies should explore more advanced techniques for extracting social and behavioral determinants from clinical text.

Figures

**Figure 1**
Overview of Feature Engineering Process

**Figure 2**
Precision and recall for 3 modeling approaches (area = AUC)
