[Preprint]. 2025 Mar 20:2024.12.05.24318340.
doi: 10.1101/2024.12.05.24318340.

Plasma proteomics for novel biomarker discovery in childhood tuberculosis

Andrea Fossati et al. medRxiv.

Update in

  • Plasma proteomics for biomarker discovery in childhood tuberculosis.
    Fossati A, Wambi P, Jaganath D, Calderon R, Castro R, Mohapatra A, McKetney J, Luiz J, Nerurkar R, Nkereuwem E, Franke MF, Mousavian Z, Collins JM, Sigal GB, Segal MR, Kampman B, Wobudeya E, Cattamanchi A, Ernst JD, Zar HJ, Swaney DL; COMBO Study Consortium. Nat Commun. 2025 Jul 19;16(1):6657. doi: 10.1038/s41467-025-61515-5. PMID: 40683862. Free PMC article.

Abstract

Failure to rapidly diagnose tuberculosis disease (TB) and initiate treatment is a major reason TB remains a leading cause of death in children. Current TB diagnostic assays perform poorly in children, and identifying novel non-sputum-based TB biomarkers to improve pediatric TB diagnosis is a global priority. We sought to develop a plasma biosignature for TB by probing the plasma proteome of 511 children, stratified by TB diagnostic classification and HIV status, from sites in four low- and middle-income countries, using high-throughput data-independent acquisition mass spectrometry (DIA-PASEF-MS). We identified 47 proteins differentially regulated (Benjamini-Hochberg (BH) adjusted p < 0.01) between children with microbiologically confirmed TB and children with non-TB respiratory diseases (Unlikely TB). We further employed machine learning to derive three parsimonious biosignatures encompassing 4, 5, or 6 proteins that achieved AUCs of 0.86-0.88, all of which exceeded the minimum WHO target product profile accuracy thresholds for a TB screening test (70% specificity at 90% sensitivity, PPV 0.65-0.74, NPV 0.92-0.95). This work provides insights into the unique host response in pediatric TB disease, as well as a non-sputum biosignature that could reduce delays in TB diagnosis and improve detection and management of TB in children worldwide.
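To make the accuracy criterion in the abstract concrete, the following is a minimal sketch of how specificity at a fixed sensitivity, and the resulting PPV and NPV, could be computed from classifier scores. It is not the authors' code. The function names, the toy scores, and the prevalence value are assumptions made for illustration, and PPV and NPV depend on whatever prevalence is supplied.

```python
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    scores: Sequence[float],
    labels: Sequence[int],
    min_sensitivity: float = 0.90,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest
    threshold whose sensitivity is at least `min_sensitivity`.

    A sample is called positive when its score is >= threshold.
    Labels are 1 (disease) or 0 (no disease).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Walk thresholds from strict to lenient; sensitivity can only rise,
    # so the first threshold meeting the target has the best specificity.
    for thr in sorted(set(scores), reverse=True):
        sens = sum(s >= thr for s in pos) / len(pos)
        if sens >= min_sensitivity:
            spec = sum(s < thr for s in neg) / len(neg)
            return thr, sens, spec
    raise ValueError("no threshold reaches the requested sensitivity")


def ppv_npv(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """Predictive values from sensitivity, specificity, and an ASSUMED prevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Toy data (hypothetical, not from the study).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
threshold, sensitivity, specificity = specificity_at_sensitivity(toy_scores, toy_labels)
meets_tpp = sensitivity >= 0.90 and specificity >= 0.70
```

A model meets the screening threshold quoted above when the specificity returned at 90% sensitivity is at least 70%.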

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. A high-throughput workflow for plasma proteomics.
A. Plasma proteomics workflow and experimental design. B. Barplot showing the total number of unique peptide sequences and protein groups identified. C-D. Number of peptides (C) and proteins (D) identified per MS injection. E. Percentage of identifications (Y axis) versus the number of identified proteins (X axis). F. Density of the concentration range covered. The X axis represents the log-transformed concentration (ng/L) from the Human Protein Atlas. Identified proteins are shown by the yellow density, and the remaining proteins by the purple density.
Figure 2. Quality control and reproducibility of plasma proteomics across multiple clinical sites.
A. Pie chart illustrating the number of samples originating from each clinical site. B. Empirical cumulative distribution function plot for the raw MS intensity of the samples (X axis) from the various clinical sites. C. Upset plot showing the overlap in protein identifications between the different clinical sites. D. Principal Component Analysis (PCA) of the DIA-PASEF dataset following COMBAT batch correction. X axis shows the first component (10% variance) and Y axis the second component (6% variance). Each point represents a sample, while the color code indicates the clinical site. E. Protein level coefficient of variation within each clinical site and across all samples.
Figure 3. Abundance proteomics analysis of pediatric TB cohorts.
A. Benchmark of data between patients with respiratory burden and healthy controls (excluding Latent TB Infection). The X axis shows the TB classification status and the Y axis the protein-level intensity. Boxes show the IQR; Kruskal-Wallis test significance is denoted as * for p < 0.01, *** for p < 0.0001, and **** for p < 0.000001. B. Gene set enrichment analysis for identification of dysregulated pathways between Confirmed TB and Unlikely TB. Dot size represents the Benjamini-Hochberg (BH) adjusted p-value from a mean difference (MD) test. Colors indicate the overlap between each signaling pathway and the protein dataset. C. Volcano plot comparing Confirmed and Unlikely TB. The X axis shows the log2 fold change at the protein level, and the Y axis shows the significance as −log10 of the BH-corrected p-values. Significant proteins (BH-adjusted p < 5%) are shown in red and blue. The barplot shows the number of differentially expressed proteins (DEPs), divided into upregulated (red) and downregulated (blue). D. Density plot showing the z-scored intensity for the most significantly regulated protein (IGHV3-30), divided by TB status into Confirmed TB (red), Unconfirmed TB (green), and Unlikely TB (blue).
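The BH-adjusted p-values referred to in the caption come from the Benjamini-Hochberg step-up procedure. A minimal pure-Python sketch of that adjustment is given below for orientation. It is not the authors' pipeline, and the function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then
    monotonicity is enforced by taking a running minimum from the largest
    rank downward.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
# Flag proteins at a 5% false discovery rate, the cutoff used in panel C.
significant = [q < 0.05 for q in adj]
```

On a volcano plot, the Y coordinate of each protein would then be −log10 of its adjusted value.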
Figure 4. Machine learning to develop a parsimonious biosignature for pediatric TB disease.
A. Absolute feature importance from a LASSO model for the top 10 most important features. B. ROC curves for the best-scoring combination of features. Each curve represents the feature subset achieving the highest AUC among all combinations of 1 (n=67), 2 (n=2210), 3 (n=47904), 4 (n=766479), 5 (n=9657647), and 6 (n=99795695) features. The WHO TPP for a screening test (70% specificity and 90% sensitivity) is denoted by the black circle. C. Barplot of the sensitivity achieved at 70% specificity for all 6 models. The dotted red line represents 90% sensitivity. D. Venn diagram of the overlap in proteins among the 4-, 5-, and 6-protein models. E. Dotplot representing the mean (dot) and the standard deviation (line) for the proposed biosignature proteins across the three models achieving the WHO TPP. Colors highlight the different TB classes according to the NIH consensus definition. Each protein is normalized to the abundance of that protein in Unlikely TB.
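Panel B describes an exhaustive search over feature combinations. The sketch below illustrates the idea on a tiny toy matrix and is not the authors' implementation. As a loudly stated simplification, each subset is scored with an equal-weight sum of its features instead of a fitted linear model, and the AUC is computed with the Mann-Whitney formulation. For 67 candidate features the number of k-feature subsets is the binomial coefficient C(67, k), which reaches about 10^8 at k = 6, close to the counts quoted in the caption.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_subset(
    X: List[List[float]], y: Sequence[int], k: int
) -> Tuple[float, Tuple[int, ...]]:
    """Try every k-feature subset and return (best AUC, feature indices).

    ASSUMPTION: the subset score is the plain sum of the selected features,
    a stand-in for the fitted models used in the study.
    """
    n_features = len(X[0])
    best: Tuple[float, Tuple[int, ...]] = (-1.0, ())
    for subset in combinations(range(n_features), k):
        scores = [sum(row[j] for j in subset) for row in X]
        a = auc(scores, y)
        if a > best[0]:  # strict '>' keeps the first subset on ties
            best = (a, subset)
    return best


# Toy data (hypothetical): 6 samples x 3 features, first 3 samples positive.
X_toy = [[5, 1, 0], [4, 0, 1], [3, 1, 0], [2, 0, 3], [1, 1, 0], [0, 0, 1]]
y_toy = [1, 1, 1, 0, 0, 0]
best_single = best_subset(X_toy, y_toy, 1)
best_pair = best_subset(X_toy, y_toy, 2)
n_six_feature_subsets = comb(67, 6)  # size of the k = 6 search space
```

The search space grows combinatorially, so in practice the loop would be vectorized or parallelized for k of 5 or 6.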
Figure 5. Detection of Unconfirmed TB.
A. Barplot showing the number of samples predicted positive (yellow) and negative (purple) by the proposed linear models using 4, 5, or 6 proteins. Values in the barplot indicate the number of cases predicted positive. B. Upset plot displaying the overlap between all predictions. C. Principal component analysis of Confirmed and Unconfirmed TB. The X axis shows the first component and the Y axis the second component. Each dot represents a sample. Samples are color coded by TB status (Confirmed TB, black) and, for Unconfirmed TB, by the number of models predicting them positive (1/3 models cyan, 2/3 models green, all models yellow). Samples predicted negative by all models (Negative Unconfirmed) are shown in orange.
