Front Reprod Health. 2021 Dec 2;3:756405.
doi: 10.3389/frph.2021.756405. eCollection 2021.

Machine Learning-Based HIV Risk Estimation Using Incidence Rate Ratios

Oliver Haas et al. Front Reprod Health. 2021.

Abstract

HIV/AIDS is an ongoing global pandemic, with an estimated 39 million people infected worldwide. Early detection is expected to help improve outcomes and prevent further infections. Point-of-care diagnostics make HIV/AIDS diagnoses available both earlier and to a broader population. Widespread, automated HIV risk estimation can offer objective guidance, supporting providers in making an informed decision when considering patients at high risk of HIV for HIV testing or pre-exposure prophylaxis (PrEP). We propose a novel machine learning method that allows providers to use the data from a patient's previous stays at the clinic to estimate their HIV risk. All features available in the clinical data are considered, which makes the feature set objective and independent of expert opinion. The proposed method builds on association rules derived from the data, and the incidence rate ratio (IRR) is determined for each rule. Given a new patient, the mean IRR of all applicable rules is used to estimate their HIV risk. The method was tested and validated on the publicly available clinical database MIMIC-IV, which consists of around 525,000 hospital stays that included a stay in the intensive care unit or emergency department. We evaluated the method using the area under the receiver operating characteristic curve (AUC). The best performance, an AUC of 0.88, was achieved with a model consisting of 53 rules. A threshold value of 0.66 yields a sensitivity of 98% and a specificity of 53%. The rules were grouped into drug abuse, psychological illnesses (e.g., PTSD), previously known associations (e.g., pulmonary diseases), and new associations (e.g., certain diagnostic procedures). In conclusion, we propose a novel HIV risk estimation method that builds on existing clinical data. It incorporates a wide range of features, leading to a model that is independent of expert opinion, and it supports providers in making informed decisions in the point-of-care diagnostics process by estimating a patient's HIV risk.
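The scoring step described in the abstract (average the IRRs of all rules that apply to a patient, then compare the result with a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Rule` type, `risk_score`, `is_high_risk`, the representation of a rule as a set of required features, the feature names, the IRR values, and the score returned when no rule applies are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a set of clinical features (antecedent) and the rule's IRR."""
    antecedent: frozenset[str]
    irr: float


def risk_score(patient_features: set[str], rules: list[Rule], default: float = 0.0) -> float:
    """Mean IRR of all rules whose antecedent is fully contained in the patient's features.

    `default` is returned when no rule applies; this fallback is an assumption,
    not something the abstract specifies.
    """
    applicable = [rule.irr for rule in rules if rule.antecedent <= patient_features]
    return mean(applicable) if applicable else default


def is_high_risk(score: float, threshold: float) -> bool:
    """Flag the patient (e.g., for HIV testing or PrEP) when the score reaches the threshold."""
    return score >= threshold


# Hypothetical rules with made-up feature names and IRR values, for illustration only.
rules = [
    Rule(frozenset({"opioid_use"}), 4.0),
    Rule(frozenset({"ptsd", "pneumonia"}), 2.0),
    Rule(frozenset({"hip_fracture"}), 0.5),
]

# The first two rules apply, so the score is the mean of 4.0 and 2.0.
score = risk_score({"opioid_use", "ptsd", "pneumonia", "asthma"}, rules)
```

Here a rule "applies" when its antecedent is a subset of the patient's recorded features; the abstract does not specify the matching criterion, and the threshold is left as a parameter because the reported value of 0.66 refers to the authors' own score scale.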

Keywords: HIV; artificial intelligence; association rules; bias; clinical data; incidence rate ratio; machine learning; risk estimation.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Examples of all possible ways in which a patient can be categorized into the groups x occurred, x did not occur, positive, and negative. The thick lines denote the time during which x occurred or did not occur, respectively, while the dashed lines denote time that does not contribute to this patient's time span.
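For a single rule x, the groups in Figure 1 reduce to two incidence rates: HIV-positive cases per unit of observed time while x occurred, and the same quantity while x did not occur. The IRR is their ratio. The sketch below computes it from already-aggregated counts and observed time; the `PersonTime` type, the function names, and the example numbers are hypothetical, and the per-patient splitting of time between the groups (the thick and dashed segments) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonTime:
    """Hypothetical aggregate for one group: HIV-positive cases and total observed time."""
    cases: int
    time: float  # any unit (e.g., days), as long as both groups use the same one


def incidence_rate(group: PersonTime) -> float:
    """Cases per unit of observed time."""
    if group.time <= 0:
        raise ValueError("observed time must be positive")
    return group.cases / group.time


def incidence_rate_ratio(x_occurred: PersonTime, x_not_occurred: PersonTime) -> float:
    """IRR = rate while x occurred / rate while x did not occur."""
    reference = incidence_rate(x_not_occurred)
    if reference == 0:
        raise ValueError("IRR is undefined when the reference group has no cases")
    return incidence_rate(x_occurred) / reference


# Hypothetical numbers: 12 cases in 4,000 days with x, 30 cases in 60,000 days without x.
irr = incidence_rate_ratio(PersonTime(12, 4000.0), PersonTime(30, 60000.0))  # approximately 6.0
```

An IRR above 1 means that HIV diagnoses occurred at a higher rate during the time in which x was present than during the time in which it was absent.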
Figure 2
The mean area under the receiver operating characteristic curve (AUC) for each experiment configuration. The x-axis shows the minimum number of HIV-positive and HIV-negative patients, the facets show different p-value thresholds, the color indicates whether Bonferroni correction was used, and the y-axis shows the mean AUC achieved by the configuration; the error bars indicate plus/minus one standard deviation.
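The experiment configurations differ in how candidate rules are filtered: a minimum number of HIV-positive and HIV-negative patients, a p-value threshold, and optionally a Bonferroni correction. A minimal sketch of such a filter follows. The abstract does not name the statistical test, so a two-sided Wald test on log(IRR), with standard error sqrt(1/a + 1/b) where a and b are the case counts of the two groups, is used here as an assumption. `RuleStats`, `wald_p_value`, `select_rules`, the rule names, and all counts are hypothetical, and the Bonferroni correction is assumed to divide the threshold by the number of candidate rules.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStats:
    """Hypothetical per-rule summary used for filtering."""
    name: str
    n_positive: int    # HIV-positive patients to whom the rule applies
    n_negative: int    # HIV-negative patients to whom the rule applies
    cases_x: int       # HIV-positive cases while x occurred
    time_x: float      # observed time while x occurred
    cases_not_x: int   # HIV-positive cases while x did not occur
    time_not_x: float  # observed time while x did not occur


def wald_p_value(s: RuleStats) -> float:
    """Two-sided p-value for H0: IRR = 1, using SE(log IRR) = sqrt(1/a + 1/b).

    Returns 1.0 (never significant) when either group has no cases,
    because log(IRR) is then undefined.
    """
    if s.cases_x == 0 or s.cases_not_x == 0:
        return 1.0
    irr = (s.cases_x / s.time_x) / (s.cases_not_x / s.time_not_x)
    se = math.sqrt(1 / s.cases_x + 1 / s.cases_not_x)
    z = abs(math.log(irr)) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_rules(stats: list[RuleStats], min_patients: int, alpha: float,
                 bonferroni: bool) -> list[str]:
    """Keep rules with enough patients in both groups and a p-value below the threshold."""
    threshold = alpha / len(stats) if bonferroni and stats else alpha
    return [
        s.name
        for s in stats
        if s.n_positive >= min_patients
        and s.n_negative >= min_patients
        and wald_p_value(s) < threshold
    ]


# Hypothetical candidate rules.
candidates = [
    RuleStats("A", 12, 500, 12, 4000.0, 30, 60000.0),   # IRR 6, p far below 0.01
    RuleStats("B", 3, 200, 3, 1000.0, 30, 60000.0),     # too few HIV-positive patients
    RuleStats("C", 20, 800, 20, 30000.0, 30, 60000.0),  # IRR about 1.33, p about 0.32
    RuleStats("D", 12, 400, 12, 12000.0, 30, 60000.0),  # IRR 2, p about 0.042
]
```

With `min_patients=5` and `alpha=0.05`, rules A and D are kept without correction; with Bonferroni correction the threshold drops to 0.0125 and only A remains, which shows how stricter filtering can shrink the model.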
Figure 3
The mean number of rules in the model for each experiment configuration. The x-axis shows the minimum number of HIV-positive and HIV-negative patients, the facets show different p-value thresholds, the color indicates whether Bonferroni correction was used, and the y-axis shows the mean number of rules returned by the configuration; the error bars indicate plus/minus one standard deviation.
Figure 4
The receiver operating characteristic curve for the reference model. The x-axis shows one minus the specificity, and the y-axis shows the sensitivity. Each dot corresponds to one possible threshold value. The red line denotes the curve of a random classifier (i.e., a coin toss).
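Each dot on the curve corresponds to one threshold applied to the risk scores: the sensitivity and one minus the specificity at that threshold give the dot's coordinates, and the AUC is the area under the resulting curve. The following is a minimal pure-Python sketch of that computation using the trapezoidal rule; the function names and the toy scores and labels are hypothetical and do not come from the study.

```python
from __future__ import annotations


def sensitivity_specificity(scores: list[float], labels: list[int],
                            threshold: float) -> tuple[float, float]:
    """Treat score >= threshold as positive; labels are 1 (HIV-positive) or 0 (negative).

    Requires at least one positive and one negative label.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) for every distinct score used as threshold, strictest first."""
    points = [(0.0, 0.0)]  # threshold above every score: nobody is flagged
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1 - spec, sens))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
# 8/9: eight of the nine positive-negative pairs are ranked correctly.
auc = auc_trapezoid(roc_points(scores, labels))
```

The repeated passes over the data keep the sketch short; for hundreds of thousands of hospital stays, a single pass over sorted scores or an established implementation such as scikit-learn's `roc_curve` and `roc_auc_score` would be the practical choice.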
