Diagnostic host gene signature for distinguishing enteric fever from other febrile diseases
- PMID: 31468702
- PMCID: PMC6783646
- DOI: 10.15252/emmm.201910431
Abstract
Misdiagnosis of enteric fever is a major global health problem, resulting in patient mismanagement, antimicrobial misuse and inaccurate disease burden estimates. Applying a machine learning algorithm to host gene expression profiles, we identified a diagnostic signature, which could distinguish culture-confirmed enteric fever cases from other febrile illnesses (area under receiver operating characteristic curve > 95%). Applying this signature to a culture-negative suspected enteric fever cohort in Nepal identified a further 12.6% as likely true cases. Our analysis highlights the power of data-driven approaches to identify host response patterns for the diagnosis of febrile illnesses. Expression signatures were validated using qPCR, highlighting their utility as PCR-based diagnostics for use in endemic settings.
Keywords: biomarker; enteric fever; machine learning; transcriptomics.
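The abstract reports classifier performance as an area under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that metric measures, the snippet below computes AUROC as the fraction of positive/negative sample pairs that a score ranks correctly, with ties counted as half. The function name `auroc` and the example values are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: classifier outputs, higher = more likely positive (e.g. enteric fever)
    labels: 1 for the positive class, 0 for the negative class
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up prediction probabilities, for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auroc(example_scores, example_labels)  # 8 of 9 pairs ranked correctly
```

An AUROC of 1.0 means every case scores above every non-case, and 0.5 corresponds to chance-level ranking.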
© 2019 The Authors. Published under the terms of the CC BY 4.0 license.
Conflict of interest statement
AJP chairs the UK Department of Health and Social Care's (DHSC) Joint Committee on Vaccination and Immunisation and the EMA Scientific Advisory Group on vaccines, and he is a member of WHO's Strategic Advisory Group of Experts. The views expressed in the publication are those of the author(s) and not necessarily those of the DHSC, NIHR or WHO.
Figures
- A
Overview of enteric fever cohorts used in this study (T1: typhoid CHIM study 1; T2: typhoid CHIM study 2; P1: paratyphoid CHIM; 03NP: Nepali cohort; ST: S. Typhi; SPT: S. Paratyphi A; sEF: suspected enteric fever; D0: day of challenge, which represents the control samples in the Oxford CHIM; CTRL: endemic community controls; nD7: day 7 after challenge in participants who stayed well in the CHIM; BC+: blood culture positive; BC−: blood culture negative; Dx: diagnosis).
- B
Volcano plots of up‐regulated (red) and down‐regulated (blue) genes (compared to healthy control samples) in S. Typhi‐ and S. Paratyphi A‐positive individuals (Nepal and Oxford). Black numbers indicate the up‐ and down‐regulated genes compared to healthy controls.
- C
Circular plot depicting the overlap of BTMs between enteric fever and nD7 samples from Oxford and Nepal. Tracks (from outer to inner): cohort and samples; BTM labels; direction of enrichment (blue: down; red: up; compared to healthy controls). Cords represent overlap of enrichment between given cohorts (red: overlap between P1‐SPT and T1‐ST; green: overlap between T1‐nD7 and P1‐nD7; blue: overlap of 03NP‐ST with P1‐SPT and T1‐ST; purple: overlap of 03NP‐SPT with P1‐SPT and T1‐ST; yellow: overlap between 03NP‐SPT and 03NP‐ST).
- D, E
Scatter plots of BTMs enriched (P > 0.05) in blood culture‐positive samples in Nepal (y‐axis) versus Oxford (x‐axis) for typhoid fever (D) and paratyphoid fever (E). For further details on BTMs, refer to reference (Chaussabel & Baldwin, 2014).
- A
Single‐sample GSEA normalised enrichment scores (NES) of IFN and DC BTMs of individuals with blood culture‐confirmed enteric fever in Nepal and Oxford. Data are median and 25th/75th percentile. For numbers in each group please refer to Figure 1A.
- B
Heatmap of the 500 most variably expressed genes in samples of the Nepali cohort. Bar graph on top of the heatmap shows temperature of each individual at the time of sampling. Three samples labelled as “03NP‐CONT” are samples that grew bacterial contaminants and were thus excluded from the entire analysis.
- A
Ranking of genes by their selection frequency into the diagnostic signature out of 100 iterations (orange dot) during the 2‐class classification. Y‐axis = genes ranked by selection frequency. X‐axis = importance measure of each gene across all 100 iterations. Green dots: importance measure for each gene per iteration. A cut‐off of 25% was selected to detect a 5‐gene putative diagnostic signature (orange bar).
- B
Performance of the 5‐gene classifier when predicting the class membership of the validation cohort.
- C
Top: probability of an EF sample to be classified as non‐EF (> 0.5). Bottom: probability of sample belonging to “Rest” to be classified as EF (> 0.5). Red dotted line signifies the 0.5 prediction probability. Y‐axis: prediction probability ranging from 0 to 1.
- D
Combined expression score for samples based on the 5‐gene signature for samples in the discovery cohort (top) and validation cohort (bottom). Ox.CTRL, Oxford controls (D0); CTRL, Nepali control samples; PTB, pulmonary TB; DENV, dengue samples; bsPf, blood‐stage P. falciparum; SPT, S. Paratyphi A; ST, S. Typhi. ST and SPT samples are derived from the challenge models as well as from Nepal. Significance levels were determined using Student's t‐test (two‐sided): *P < 0.05; **P < 0.01; ****P < 0.0001. Number of samples per group: Discovery: Ox.CTRL = 45; CTRL = 175; PTB = 54; DENV = 67; bsPf = 94; ST = 44; SPT = 30. Validation: Ox.CTRL = 50; CTRL = 59; PTB = 97; DENV = 49; bsPf = 19; ST = 50.
- E
Ranking of genes by their selection frequency into the diagnostic signature out of 100 iterations during the multiclass classification. A cut‐off of 25% was selected to detect a 7‐gene putative diagnostic signature (orange bar).
- F
Classification probabilities for each sample of the validation cohort based on the 7‐gene signature.
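The selection-frequency ranking in panels A and E can be illustrated with a short sketch. It assumes that each iteration yields a set of selected genes and that the 25% cut-off means "selected in at least 25% of iterations"; that reading of the legend, the function names and the gene labels are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def selection_frequency(selected_per_iteration):
    """Return the fraction of iterations in which each gene was selected."""
    n_iter = len(selected_per_iteration)
    if n_iter == 0:
        raise ValueError("need at least one iteration")
    # set() guards against a gene being listed twice within one iteration.
    counts = Counter(g for genes in selected_per_iteration for g in set(genes))
    return {gene: c / n_iter for gene, c in counts.items()}


def signature(selected_per_iteration, cutoff=0.25):
    """Genes whose selection frequency is at least `cutoff`, most frequent first."""
    freq = selection_frequency(selected_per_iteration)
    kept = [g for g, f in freq.items() if f >= cutoff]
    return sorted(kept, key=lambda g: (-freq[g], g))


# Hypothetical feature-selection results from 8 iterations (placeholder gene names).
runs = [
    {"GENE_A", "GENE_B"},
    {"GENE_A"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A"},
    {"GENE_A", "GENE_B"},
    {"GENE_A"},
    {"GENE_D"},
]
freqs = selection_frequency(runs)
putative_signature = signature(runs, cutoff=0.25)
```

In this toy input, GENE_A (7 of 8 iterations) and GENE_B (3 of 8) pass the cut-off, while GENE_C and GENE_D (1 of 8 each) do not.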
- A
PCA of sEF samples based on the 5‐gene signature (based on gene array data) coloured by predicted class membership (EF: purple; rest: green).
- B
Dot plot of prediction probability of being class EF versus the expression score calculated on the basis of the 5‐gene signature (based on gene array data).
- C
qPCR gene expression scores of the 5‐gene signature (ΔΔCt over PPIA) for CTRLs, 03NP‐sEF, 03NP‐SPT and 03NP‐ST samples from Nepal. Yellow diamonds in the 03NP‐sEF category represent the nine patients classified as EF based on the Random Forest algorithm.
- D
qPCR expression values (ΔΔCt over PPIA) of the 5‐gene signature in control samples (Oxford and Nepal), S. Paratyphi A (03NP‐SPT) or S. Typhi (03NP‐ST) in Nepal, samples at day 7 after challenge of participants who stayed well following challenge with S. Typhi (nD7), or typhoid diagnosis after challenge (TD) in the Vi‐TCV study (Appendix Table S2). Colour legend in panel (E). Data are median with the 25th/75th percentile. N per group: CTRL = 64; nD7 = 5; 03NP‐SPT = 9; 03NP‐ST = 13; TD = 12.
- E
Combined qPCR expression score of the 5‐gene signature. Black arrows indicate outlier samples. Data are median with the 25th/75th percentile. N per group: CTRL = 64; nD7 = 5; 03NP‐SPT = 9; 03NP‐ST = 13; TD = 12.
- F
Temperature and CRP for samples of which data were available (CRP was only measured in the Oxford CHIM). D0, pre‐challenge baseline Vi‐TCV study; nD7, day‐7 samples of participants who stayed well following challenge (Vi‐TCV study); SPT, S. Paratyphi A (03NP); ST, S. Typhi (03NP); TD, typhoid diagnosis (Vi‐TCV study).
- G
Spearman's rank correlation of the 5‐gene combined expression score and (left) temperature (only nD7 and TD samples from the Oxford CHIM—Vi‐TCV and SPT and ST cases from Nepal at presentation to hospital were included) and (right) CRP (CRP was only available for Oxford CHIM—Vi‐TCV samples, and we excluded D0 baseline measures).
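Two calculations named in these legends, ΔΔCt relative to a reference gene and Spearman's rank correlation, can be sketched as follows. The ΔΔCt function follows the standard definition (target Ct minus reference Ct, then sample minus calibrator). Which calibrator the authors used is not stated here, so the calibrator arguments are an assumption, and all names and numbers are made up for illustration.

```python
import math


def delta_delta_ct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """ΔΔCt = (Ct_target - Ct_ref) in the sample minus the same in a calibrator.

    ct_ref is the reference gene Ct (PPIA in the legends above); the calibrator
    (e.g. a control sample) is an assumption of this sketch.
    """
    return (ct_target - ct_ref) - (ct_target_cal - ct_ref_cal)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Made-up values: one ΔΔCt, then expression scores against temperatures.
ddct = delta_delta_ct(24.0, 18.0, 26.0, 18.5)
fold_change = 2 ** (-ddct)  # relative expression under the 2^-ΔΔCt convention
scores = [-1.2, 0.3, 2.1, 3.4, 1.0]
temps = [36.8, 37.1, 38.9, 39.5, 38.0]
rho = spearman(scores, temps)
```

Because Spearman's rho depends only on ranks, it captures any monotonic relationship between the expression score and temperature or CRP, not just a linear one.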
