J Am Med Inform Assoc. 2021 Jan 15;28(1):104-112.
doi: 10.1093/jamia/ocaa220.

Development of a predictive model for retention in HIV care using natural language processing of clinical notes

Tomasz Oliwa et al. J Am Med Inform Assoc.

Abstract

Objective: Adherence to a treatment plan by HIV-positive patients is necessary to decrease their mortality and improve their quality of life; however, some patients display poor appointment adherence and become lost to follow-up (LTFU). We applied natural language processing (NLP) to analyze indications for or against LTFU in HIV-positive patients' notes.

Materials and methods: Unstructured, lemmatized notes were labeled with an LTFU or Retained status using a 183-day threshold. An NLP and supervised machine learning system using a linear model with elastic net regularization was trained to predict this status. The prevalence of characteristic domains in the learned model weights was evaluated.
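
To make the 183-day threshold concrete, here is a minimal Python sketch of one possible labeling rule. The abstract does not spell out the rule (the actual logic is summarized in Figure 1), so the rule below is an assumption: a note counts as Retained if the patient has any later visit within 183 days of the note date, and as LTFU otherwise. The function name `label_note`, the inclusive boundary, and the dates are hypothetical.

```python
from datetime import date, timedelta

# Threshold taken from the abstract; the rule built around it is an assumption.
LTFU_THRESHOLD = timedelta(days=183)


def label_note(note_date, later_visits, threshold=LTFU_THRESHOLD):
    """Return "Retained" if any visit follows the note within the threshold,
    otherwise "LTFU". Hypothetical rule, not the authors' Figure 1 logic."""
    for visit in later_visits:
        # Only visits strictly after the note and within the window count.
        if timedelta(0) < visit - note_date <= threshold:
            return "Retained"
    return "LTFU"


# Synthetic dates for illustration only.
note = date(2019, 1, 10)
print(label_note(note, [date(2019, 5, 2)]))   # next visit 112 days later
print(label_note(note, [date(2019, 9, 30)]))  # next visit 263 days later
```

Whether a visit on exactly day 183 counts as retained is a design choice the abstract leaves open; the sketch treats the boundary as inclusive.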

Results: We analyzed 838 LTFU vs 2964 Retained notes and obtained a mean weighted F1 score of 0.912 via nested cross-validation; another experiment with notes from the same patients in both classes showed substantially lower metrics. "Comorbidities" were associated with LTFU through, for instance, "HCV" (hepatitis C virus), and "Good adherence" was likewise associated with Retained, represented by "Well on ART" (antiretroviral therapy).
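
For readers unfamiliar with the metric, the following Python sketch shows how a weighted F1 score is commonly computed: the F1 of each class is averaged with weights proportional to that class's share of the true labels. This is the standard definition, not code from the study; the function name `weighted_f1` and the labels are synthetic, and the nested cross-validation that produces the reported mean is not reproduced here.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n / total) * f1
    return score


# Synthetic labels, not the study data.
y_true = ["LTFU", "LTFU", "Retained", "Retained", "Retained", "Retained"]
y_pred = ["LTFU", "Retained", "Retained", "Retained", "Retained", "LTFU"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.667
```

With imbalanced classes such as 838 LTFU vs 2964 Retained notes, the weighting means the majority class contributes more to the final score than the minority class.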

Discussion: Mentions of mental health disorders and substance use were associated with disparate retention outcomes; however, history vs active use was not investigated. There remains a need to model transitions between LTFU and being retained in care over time.

Conclusion: We provided an important step toward the future development of a model that could eventually help to identify patients who are at risk of falling out of care and to analyze which characteristics could be contributing factors. Further research is needed to enhance this method with structured electronic medical record fields.

Keywords: HIV; lost to follow-up; machine learning; natural language processing; retention in care.

Figures

Figure 1. Overview of the automated labeling rules to label a note as LTFU or Retained.
Figure 2. An overview of the natural language processing (NLP) and machine learning approach. As an example with synthetic data, “adherence is bad” is shown to be matched correctly with the characteristic “bad adherence.”
