Development of a predictive model for retention in HIV care using natural language processing of clinical notes
- PMID: 33150369
- PMCID: PMC7810456
- DOI: 10.1093/jamia/ocaa220
Abstract
Objective: Adherence to a treatment plan by HIV-positive patients is necessary to decrease their mortality and improve their quality of life; however, some patients display poor appointment adherence and become lost to follow-up (LTFU). We applied natural language processing (NLP) to analyze indications towards or against LTFU in HIV-positive patients' notes.
Materials and methods: Unstructured lemmatized notes were labeled with an LTFU or Retained status using a 183-day threshold. An NLP and supervised machine learning system with a linear model and elastic net regularization was trained to predict this status. The prevalence of characteristic domains among the learned model weights was evaluated.
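The labeling step can be illustrated with a minimal sketch. The abstract gives only the 183-day threshold; the rule used here (the gap between a note's date and the patient's next visit) and the names `label_note`, `note_date`, and `next_visit` are assumptions made for illustration, not the authors' stated procedure.

```python
from datetime import date, timedelta
from typing import Optional

# Threshold taken from the abstract; everything else below is hypothetical.
LTFU_THRESHOLD_DAYS = 183


def label_note(note_date: date, next_visit: Optional[date]) -> str:
    """Label a note as "LTFU" or "Retained".

    Assumed rule: a note is "LTFU" when the patient has no recorded
    next visit, or when the next visit falls more than 183 days after
    the note date. Otherwise the note is "Retained".
    """
    if next_visit is None:
        return "LTFU"
    gap_days = (next_visit - note_date).days
    return "LTFU" if gap_days > LTFU_THRESHOLD_DAYS else "Retained"


if __name__ == "__main__":
    written = date(2019, 1, 1)
    print(label_note(written, written + timedelta(days=90)))
    print(label_note(written, written + timedelta(days=200)))
    print(label_note(written, None))
```

Whether a gap of exactly 183 days counts as retained is a design choice the abstract does not settle; this sketch treats it as Retained.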
Results: We analyzed 838 LTFU vs 2964 Retained notes and obtained a weighted F1 mean of 0.912 via nested cross-validation; another experiment with notes from the same patients in both classes showed substantially lower metrics. "Comorbidities" were associated with LTFU through, for instance, "HCV" (hepatitis C virus), and "Good adherence" was likewise associated with Retained, represented by "Well on ART" (antiretroviral therapy).
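For readers unfamiliar with the metric, a weighted F1 score averages the per-class F1 scores, weighting each class by its share of the true labels, which matters here because the two classes are imbalanced (838 vs 2964 notes). The sketch below computes it for one set of predictions using only the standard library. The function name `weighted_f1` and the toy labels are illustrative assumptions; the abstract does not describe how the authors computed or averaged the score across cross-validation folds.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = (2 * tp / denom) if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    truth = ["LTFU", "LTFU", "Retained", "Retained", "Retained", "Retained"]
    preds = ["LTFU", "Retained", "Retained", "Retained", "Retained", "Retained"]
    print(round(weighted_f1(truth, preds), 3))
```

In a nested cross-validation setting, a score like this would typically be computed on each outer test fold and then averaged, which is one plausible reading of "weighted F1 mean".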
Discussion: Mentions of mental health disorders and substance use were associated with disparate retention outcomes; however, history vs active use was not investigated. There remains a further need to model transitions between LTFU and being retained in care over time.
Conclusion: We provided an important step for the future development of a model that could eventually help to identify patients who are at risk for falling out of care and to analyze which characteristics could be factors for this. Further research is needed to enhance this method with structured electronic medical record fields.
Keywords: HIV; lost to follow-up; machine learning; natural language processing; retention in care.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.