Clin Infect Dis. 2025 Oct 6;81(3):521-530.
doi: 10.1093/cid/ciaf149.

Machine Learning-based Prediction of Active Tuberculosis in People With HIV Using Clinical Data

Lena Bartl et al. Clin Infect Dis.

Abstract

Background: Coinfections of Mycobacterium tuberculosis (MTB) and human immunodeficiency virus (HIV) impose a substantial global health burden. Patients with MTB infection face a heightened risk of progression to incident active TB, which preventive therapy can mitigate. Current testing methods often fail to identify individuals who subsequently develop incident active TB.

Methods: We developed random forest models to predict incident active TB using patients' medical data at HIV-1 diagnosis. We trained the model on clinical data routinely collected at enrollment in the Swiss HIV Cohort Study (SHCS). This dataset comprised 55 people with HIV (PWH) who developed incident active TB at least 6 months after enrollment and 1432 matched PWH without TB, all enrolled between 2000 and 2023. External validation used data from the Austrian HIV Cohort Study, comprising 43 people with incident active TB and 1005 people without TB.

Results: We predicted incident active TB with an area under the receiver operating characteristic curve of 0.83 (95% CI: 0.80-0.86) in the SHCS. After adjusting for ethnicity and the region of origin and refitting the model with fewer parameters, we obtained comparable area under the receiver operating characteristic curve values of 0.72 (SHCS) and 0.67 (Austrian HIV Cohort Study). Our model outperformed the standard of care (tuberculin skin test and interferon-gamma release assay) in identifying high-risk patients, as demonstrated by a lower number needed to diagnose (1.96 vs 4).
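
The number needed to diagnose (NND) is commonly defined as the reciprocal of the Youden Index, that is 1 / (sensitivity + specificity − 1). The abstract does not state which formula the authors used, so the following is a minimal sketch under that assumption. With it, an NND of 1.96 would correspond to a Youden Index of about 0.51, and an NND of 4 to a Youden Index of 0.25. The function names and the example operating point are hypothetical and not taken from the paper.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def number_needed_to_diagnose(sensitivity: float, specificity: float) -> float:
    """NND = 1 / J (assumed definition); only meaningful when J > 0."""
    j = youden_index(sensitivity, specificity)
    if j <= 0:
        raise ValueError("NND is undefined when sensitivity + specificity <= 1")
    return 1.0 / j


# Hypothetical operating point, chosen only to illustrate the arithmetic.
print(number_needed_to_diagnose(0.75, 0.50))  # J = 0.25, so NND = 4.0
```

A lower NND means fewer people need to be tested to obtain one additional correct classification, which is why the smaller value favors the model.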

Conclusions: Models based on machine learning offer considerable promise for improving care for PWH, requiring no additional data collection and incurring minimal additional costs while enhancing the identification of PWH that could benefit from preventive TB treatment.

Keywords: HIV; clinical risk score; machine learning; prediction; tuberculosis.

Conflict of interest statement

Potential conflicts of interest. A. C. received grants from Merck Sharp & Dohme (MSD), ViiV Healthcare, and Gilead Sciences for unrelated research. R. D. K. received grants from Gilead Sciences and the National Institutes of Health (NIH) for unrelated research. L. B. received honoraria for working on the advisory board of Gilead Sciences, Merck, ViiV, Pfizer, and AstraZeneca. L. B. received honoraria for presentations from Gilead Sciences and Merck. E. B. received grants from MSD for unrelated research. E. B. received payments for travel reimbursement from ViiV, MSD, Gilead Sciences, Pfizer, and Abbvie. E. B. received honoraria for working on the advisory board of ViiV, MSD, Pfizer, Gilead Sciences, AstraZeneca, and Eli Lilly. H. H. H. received honoraria for working on the advisory board of AiCuris, Merck, Vera Dx, and Molecular Partners. H. H. H. received honoraria for presentations from Merck, Gilead Sciences, Biotest, and Vera Dx. J. N. received honoraria for presentations from Oxford Immunotec and ViiV. H. F. G. received honoraria for working on the advisory board of Gilead Sciences, Merck, ViiV, Janssen, Johnson and Johnson, Novartis, and GlaxoSmithKline (GSK). H. F. G. received payments for travel reimbursements from Gilead Sciences. H. F. G. received grants from NIH, Yvonne Jacob Foundation, the Bill and Melinda Gates Foundation, Gilead Sciences, and ViiV Healthcare. All other authors report no potential conflicts. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

Figures

Figure 1.
Selection of the population for the SHCS and the AHIVCOS. Flowchart of the study populations: A, SHCS and B, AHIVCOS. Incident active TB is defined as a TB outbreak ≥6 months after registration. Abbreviations: AHIVCOS, Austrian HIV Cohort Study; HIV, human immunodeficiency virus; SHCS, Swiss HIV Cohort Study; TB, tuberculosis.
Figure 2.
A, Receiver operating characteristic (ROC) curve for incident active TB outbreak derived from the random forest model prediction and true TB status, built on all parameters of the SHCS data. The Youden Index is calculated as sensitivity + specificity − 1 and was used to determine the optimum point of the ROC curve. B, Variable importance sorted by mean decrease in accuracy, measured by removing the association between a predictor variable and the outcome variable and determining the resulting increase in error. C, ROC curve for the incident active TB outbreak of the random forest model. The model was built using all parameters of the SHCS data and validated with SHCS data from patients who had their incident active TB outbreak less than 4 years after registration. Abbreviations: BMI, body mass index; CD, cluster of differentiation; HDL, high-density lipoproteins; SHCS, Swiss HIV Cohort Study; TB, tuberculosis.
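
The variable importance in panel B breaks the link between one predictor and the outcome and measures how much the error grows. A minimal sketch of that idea is permutation importance: shuffle one column, re-score the model, and record the drop in accuracy. This is a simplified illustration and not the authors' implementation. Random forest software typically computes the measure on out-of-bag samples per tree, whereas this sketch scores a single fixed predictor on the full data. The toy data, the `predict` stand-in, and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Toy data (hypothetical): column 0 determines the label, column 1 is noise.
rows = [[i, (i * 7) % 5] for i in range(40)]
labels = [1 if row[0] >= 20 else 0 for row in rows]


def predict(row):
    """Stand-in for a fitted model; it only looks at column 0."""
    return 1 if row[0] >= 20 else 0


print(permutation_importance(predict, rows, labels, column=0))  # clearly above 0
print(permutation_importance(predict, rows, labels, column=1))  # 0.0, column is unused
```

Predictors with a larger mean drop are ranked as more important, which matches the ordering described for panel B.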
Figure 3.
A, Receiver operating characteristic (ROC) curve for incident active TB outbreak of the random forest model, built on the top 20 parameters of the SHCS data, with ethnicity and region of origin excluded and recoded into high- and low-incidence TB countries, and validated on SHCS data. The Youden Index was used to determine the optimum point of the ROC curve. B, Smoothed ROC curve for the model validated on AHIVCOS data. C, Variable distribution in the SHCS and AHIVCOS data, stratified by people with incident active TB and people without TB. For each group (SHCS people with incident active TB/people without TB, AHIVCOS people with incident active TB/people without TB), we examined the distribution of leukocytes, CD4 cell count, cholesterol, BMI, HIV RNA, and creatinine from left to right and top to bottom. The lines represent the mean of each group for each laboratory value. Abbreviations: AHIVCOS, Austrian HIV Cohort Study; AUC, area under the curve; BMI, body mass index; SHCS, Swiss HIV Cohort Study.
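
Choosing the optimum point of an ROC curve with the Youden Index amounts to scanning candidate thresholds and keeping the one that maximizes sensitivity + specificity − 1. The sketch below illustrates this on made-up predicted risks. The scores, labels, and function names are hypothetical, and the rule that a score at or above the threshold counts as positive is an assumption.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A case is predicted positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def youden_optimal(scores, labels):
    """Pick the point that maximizes sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


# Hypothetical predicted risks and true outcomes (1 = incident active TB).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

threshold, sensitivity, specificity = youden_optimal(scores, labels)
print(threshold, sensitivity, specificity)  # 0.3 1.0 0.6
```

In this toy example the threshold 0.30 captures all three positive cases while correctly excluding three of the five negatives, giving a Youden Index of 0.6.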
