BMC Infect Dis. 2022 Dec 29;22(1):965.
doi: 10.1186/s12879-022-07954-7.

Development of diagnostic algorithm using machine learning for distinguishing between active tuberculosis and latent tuberculosis infection

Ying Luo et al. BMC Infect Dis.

Abstract

Background: Discriminating between active tuberculosis (ATB) and latent tuberculosis infection (LTBI) remains challenging. This study investigates the value of diagnostic models, established by machine learning from multiple laboratory data, for distinguishing Mycobacterium tuberculosis (Mtb) infection status.

Methods: T-SPOT, lymphocyte characteristic detection, and routine laboratory tests were performed on participants. Diagnostic models were built using various machine learning algorithms.

Results: A total of 892 participants (468 ATB and 424 LTBI) and another 263 participants (125 ATB and 138 LTBI) were enrolled at Tongji Hospital (discovery cohort) and Sino-French New City Hospital (validation cohort), respectively. Receiver operating characteristic (ROC) curve analysis showed that the value of individual indicators for differentiating ATB from LTBI was limited (area under the ROC curve (AUC) < 0.8). A total of 28 models were successfully established using machine learning. Among them, 25 models had AUCs above 0.9 in the test set. The conditional random forests (cforest) model, an implementation of the random forest and bagging ensemble algorithms that uses conditional inference trees as base learners, showed the best discriminative power in segregating ATB from LTBI. Specifically, the cforest model achieved an AUC of 0.978, with a sensitivity of 93.39% and a specificity of 91.18%. Mtb-specific responses, represented by early secreted antigenic target 6 (ESAT-6) and culture filtrate protein 10 (CFP-10) spot-forming cells (SFC) in the T-SPOT assay, as well as global adaptive immunity, assessed by CD4 cell IFN-γ secretion, CD8 cell IFN-γ secretion, and CD4 cell number, contributed greatly to the cforest model. The superior performance obtained in the discovery cohort was confirmed in the validation cohort, where the sensitivity and specificity of the cforest model were 92.80% and 89.86%, respectively.
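
The validation-cohort percentages can be checked against the reported group sizes (125 ATB and 138 LTBI). The Python sketch below back-calculates the integer counts that would reproduce the stated sensitivity and specificity. The abstract does not report these counts, so they are an inference from rounding, not published data, and the helper name `implied_count` is hypothetical.

```python
def implied_count(rate_percent: float, group_size: int) -> int:
    """Return the integer count whose share of group_size rounds to rate_percent.

    Hypothetical helper for illustration; not part of the study's code.
    """
    count = round(rate_percent / 100 * group_size)
    # Round-trip check: the recovered count must reproduce the reported
    # percentage when rounded to two decimals.
    if abs(round(100 * count / group_size, 2) - rate_percent) > 1e-9:
        raise ValueError("no integer count reproduces the reported rate")
    return count


# Group sizes of the validation cohort, as reported in the abstract.
VALIDATION_ATB = 125
VALIDATION_LTBI = 138

# Inferred (not reported) confusion-matrix cells for the cforest model.
tp = implied_count(92.80, VALIDATION_ATB)   # ATB correctly classified
tn = implied_count(89.86, VALIDATION_LTBI)  # LTBI correctly classified
fn = VALIDATION_ATB - tp                    # ATB missed
fp = VALIDATION_LTBI - tn                   # LTBI misclassified as ATB

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
```

Under this assumption, 116 of 125 ATB patients and 124 of 138 LTBI individuals would have been classified correctly, which matches 92.80% and 89.86% after rounding.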

Conclusions: The cforest model developed by machine learning could serve as a valuable prospective tool for identifying Mtb infection status. The present study offers a novel and viable approach to applying the combination of machine learning and laboratory findings in clinical diagnosis.

Keywords: Active tuberculosis; Diagnostic algorithm; Discrimination; Latent tuberculosis infection; Machine learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The performance of various indicators in distinguishing between ATB patients and LTBI individuals. A Pyramid plot comparing various indicators between ATB patients and LTBI individuals. The values represent the median after normalization to the range 0 to 1. B ROC curves showing the performance of individual indicators in segregating ATB patients from LTBI individuals. C Cleveland dot plot showing the AUCs of various indicators in discriminating ATB patients from LTBI individuals. ATB: active tuberculosis; LTBI: latent tuberculosis infection; ROC: receiver operating characteristic; AUC: area under the ROC curve
Fig. 2
Clustering and dimension reduction analysis based on laboratory data of ATB patients and LTBI individuals. A Tree and leaf plots showing the clustering on the basis of laboratory data. B Plot showing PCA dimension reduction based on laboratory data. The size of the circle represents the cos2. C Plot showing UMAP dimension reduction based on laboratory data. D Plot showing tSNE dimension reduction based on laboratory data. ATB: active tuberculosis; LTBI: latent tuberculosis infection; PCA: principal components analysis; tSNE: t-distributed stochastic neighbor embedding; UMAP: uniform manifold approximation and projection
Fig. 3
The performance of different diagnostic models established by machine learning for discriminating ATB patients from LTBI individuals in the discovery cohort. Scatter plots showing predictive values of diagnostic models (A cforest; B bart; C gamboost; D gbm; E glmnet; F lda; G log_reg; H svm) in ATB patients and LTBI individuals. Horizontal lines indicate the median. ***P < 0.001 (Mann–Whitney U test). Blue dotted lines indicate the cutoff value (0.5) for segregating these two groups. ROC curves showing the performance of diagnostic models (A cforest; B bart; C gamboost; D gbm; E glmnet; F lda; G log_reg; H svm) in segregating ATB patients from LTBI individuals. Tree and leaf plots showing the predictive value of each participant when displayed as a cluster distribution. The size of the circle represents the predictive value. Cleveland dot plot showing the importance of various indicators in contributing to the model. ATB: active tuberculosis; LTBI: latent tuberculosis infection; ROC: receiver operating characteristic; AUC: area under the ROC curve
Fig. 4
The validation of diagnostic models established for discriminating ATB patients from LTBI individuals. Scatter plots showing predictive values of diagnostic models (A cforest; B bart; C gamboost; D gbm; E glmnet; F lda; G log_reg; H svm) in ATB patients and LTBI individuals. Horizontal lines indicate the median. ***P < 0.001 (Mann–Whitney U test). Blue dotted lines indicate the cutoff value (0.5) for segregating these two groups. ROC curves showing the performance of diagnostic models (A cforest; B bart; C gamboost; D gbm; E glmnet; F lda; G log_reg; H svm) in segregating ATB patients from LTBI individuals. Tree and leaf plots showing the predictive value of each participant when displayed as a cluster distribution. The size of the circle represents the predictive value. ATB: active tuberculosis; LTBI: latent tuberculosis infection; ROC: receiver operating characteristic; AUC: area under the ROC curve
Fig. 5
The diagnostic performance of the 28 established models for differentiating ATB patients from LTBI individuals in the A training set, B test set, and C validation set. The height and color of each column represent the value of the performance parameter after normalization to the range 0 to 1. acc: accuracy; auc: area under the ROC curve; bacc: balanced accuracy; bbrier: binary Brier score; ce: classification error; dor: diagnostic odds ratio; fbeta: F-beta score; fdr: false discovery rate; fn: false negatives; fnr: false negative rate; fomr: false omission rate; fp: false positives; fpr: false positive rate; mbrier: multiclass Brier score; mcc: Matthews correlation coefficient; npv: negative predictive value; ppv: positive predictive value; prauc: area under the precision-recall curve; tn: true negatives; tnr: true negative rate; tp: true positives; tpr: true positive rate
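
Most of the abbreviations in the Fig. 5 legend are standard functions of the four confusion-matrix cells. The Python sketch below shows those textbook definitions. It is not the authors' code, the function name `binary_metrics` is hypothetical, and the counts in the usage example are invented for illustration rather than taken from the study. Measures that need predicted probabilities (auc, prauc, bbrier, mbrier) are omitted.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int, beta: float = 1.0) -> dict:
    """Compute count-based binary classification metrics.

    Hypothetical helper using textbook definitions; assumes every
    denominator is non-zero (no guard against empty classes).
    """
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn)   # true positive rate (sensitivity)
    tnr = tn / (tn + fp)   # true negative rate (specificity)
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    acc = (tp + tn) / total
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "acc": acc,
        "ce": 1 - acc,                # classification error
        "bacc": (tpr + tnr) / 2,      # balanced accuracy
        "tpr": tpr,
        "tnr": tnr,
        "fpr": 1 - tnr,               # false positive rate
        "fnr": 1 - tpr,               # false negative rate
        "ppv": ppv,
        "npv": npv,
        "fdr": 1 - ppv,               # false discovery rate
        "fomr": 1 - npv,              # false omission rate
        "dor": (tp * tn) / (fp * fn), # diagnostic odds ratio
        "fbeta": (1 + beta**2) * ppv * tpr / (beta**2 * ppv + tpr),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Invented counts for illustration only (not study data).
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, accuracy and balanced accuracy are both 0.85 and the diagnostic odds ratio is 36.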
