EMBO Mol Med. 2019 Oct;11(10):e10431.
doi: 10.15252/emmm.201910431. Epub 2019 Aug 30.

Diagnostic host gene signature for distinguishing enteric fever from other febrile diseases


Christoph J Blohmke et al. EMBO Mol Med. 2019 Oct.

Abstract

Misdiagnosis of enteric fever is a major global health problem, resulting in patient mismanagement, antimicrobial misuse and inaccurate disease burden estimates. Applying a machine learning algorithm to host gene expression profiles, we identified a diagnostic signature that could distinguish culture-confirmed enteric fever cases from other febrile illnesses (area under the receiver operating characteristic curve > 95%). Applying this signature to a cohort of culture-negative suspected enteric fever cases in Nepal identified a further 12.6% as likely true cases. Our analysis highlights the power of data-driven approaches to identify host response patterns for the diagnosis of febrile illnesses. Expression signatures were validated using qPCR, highlighting their utility as PCR-based diagnostics for use in endemic settings.
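The AUC quoted above has a simple probabilistic reading: it is the chance that a randomly chosen enteric fever case receives a higher classifier score than a randomly chosen non-case, with ties counted as one half (the Mann–Whitney formulation). The minimal Python sketch below computes it directly from that definition. It is an illustration only, not the authors' code; the function name `roc_auc` and the toy scores are hypothetical.

```python
from itertools import product


def roc_auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 0.5.

    Illustrative sketch only; the scores below are hypothetical, not study data.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control (Mann-Whitney pair counting).
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted EF probabilities for cases and other febrile illnesses.
ef_cases = [0.91, 0.85, 0.78, 0.66]
other_febrile = [0.12, 0.30, 0.41, 0.70]
print(roc_auc(ef_cases, other_febrile))  # 0.9375
```

The pairwise loop is quadratic in sample size, which is fine for cohort-sized data; larger datasets would normally use a rank-based formula instead.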

Keywords: biomarker; enteric fever; machine learning; transcriptomics.


Conflict of interest statement

AJP chairs the UK Department of Health and Social Care's (DHSC) Joint Committee on Vaccination and Immunisation and the EMA Scientific Advisory Group on vaccines, and is a member of WHO's Strategic Advisory Group of Experts. The views expressed in this publication are those of the author(s) and not necessarily those of the DHSC, NIHR or WHO.

Figures

Figure 1. Overview of the Oxford and Nepal comparison
  1. A

    Overview of the enteric fever cohorts used in this study (CHIM: controlled human infection model; T1: typhoid CHIM study 1; T2: typhoid CHIM study 2; P1: paratyphoid CHIM; 03NP: Nepali cohort; ST: S. Typhi; SPT: S. Paratyphi A; sEF: suspected enteric fever; D0: day of challenge, representing the control samples in the Oxford CHIM; CTRL: endemic community controls; nD7: day 7 after challenge in participants who stayed well in the CHIM; BC+: blood culture positive; BC−: blood culture negative; Dx: diagnosis).

  2. B

    Volcano plots of up‐regulated (red) and down‐regulated (blue) genes, relative to healthy control samples, in S. Typhi‐ and S. Paratyphi A‐positive individuals (Nepal and Oxford). Black numbers indicate the counts of up‐ and down‐regulated genes.

  3. C

    Circular plot depicting the overlap of blood transcription modules (BTMs) between enteric fever and nD7 samples from Oxford and Nepal. Tracks (from outer to inner): cohort and samples; BTM labels; direction of enrichment relative to healthy controls (blue: down; red: up). Chords represent overlap of enrichment between the given cohorts (red: P1‐SPT and T1‐ST; green: T1‐nD7 and P1‐nD7; blue: 03NP‐ST with P1‐SPT and T1‐ST; purple: 03NP‐SPT with P1‐SPT and T1‐ST; yellow: 03NP‐SPT and 03NP‐ST).

  4. D, E

    Scatter plots of BTMs enriched (P < 0.05) in blood culture‐positive samples in Nepal (y‐axis) versus Oxford (x‐axis) for typhoid fever (D) and paratyphoid fever (E). For further details on BTMs, see Chaussabel & Baldwin (2014).
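Panel B of Figure 1 summarises each volcano plot as a count of up‐ and down‐regulated genes. The caption does not state the significance or fold‐change thresholds, so the minimal sketch below assumes a common convention (adjusted P < 0.05 and |log2 fold change| ≥ 1) purely for illustration. The thresholds, the `DiffExpr` record, the function name and the gene values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiffExpr:
    gene: str
    log2_fc: float  # log2 fold change versus healthy controls
    adj_p: float    # multiple-testing adjusted P value


def count_regulated(results, p_cut=0.05, fc_cut=1.0):
    """Return (n_up, n_down) under ASSUMED volcano-plot thresholds."""
    up = sum(1 for r in results if r.adj_p < p_cut and r.log2_fc >= fc_cut)
    down = sum(1 for r in results if r.adj_p < p_cut and r.log2_fc <= -fc_cut)
    return up, down


# Hypothetical differential expression results.
toy = [
    DiffExpr("GENE_A", 2.3, 0.001),   # up-regulated
    DiffExpr("GENE_B", -1.6, 0.01),   # down-regulated
    DiffExpr("GENE_C", 0.4, 0.0001),  # significant but small effect
    DiffExpr("GENE_D", 3.0, 0.20),    # large effect but not significant
]
print(count_regulated(toy))  # (1, 1)
```

Requiring both a significance cut and an effect-size cut is what separates the coloured points from the grey cloud in a typical volcano plot; changing either threshold changes the reported counts.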

Figure 2. BTM responses across enteric fever (EF) cohorts and heatmap of samples collected in Nepal
  1. A

    Single‐sample GSEA (ssGSEA) normalised enrichment scores (NES) of interferon (IFN) and dendritic cell (DC) BTMs in individuals with blood culture‐confirmed enteric fever in Nepal and Oxford. Data are medians with 25th/75th percentiles. Group sizes are given in Figure 1A.

  2. B

    Heatmap of the 500 most variably expressed genes in samples from the Nepali cohort. The bar graph above the heatmap shows each individual's temperature at the time of sampling. Three samples labelled "03NP‐CONT" grew bacterial contaminants and were therefore excluded from all analyses.
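Figure 2B is built from the 500 most variably expressed genes. The caption does not say how variability was measured, so the minimal sketch below assumes the sample variance of each gene's (typically log‐scale) expression across samples; the function and gene names are hypothetical.

```python
from statistics import variance


def most_variable_genes(expr, n=500):
    """Return the n gene IDs with the highest sample variance.

    expr maps gene ID -> list of expression values, one per sample.
    Variance as the variability measure is an ASSUMPTION for illustration.
    """
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    return ranked[:n]


# Hypothetical expression values for three genes across four samples.
toy_expr = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],
    "GENE_B": [2.0, 8.0, 3.0, 9.0],
    "GENE_C": [6.0, 7.0, 6.5, 7.5],
}
print(most_variable_genes(toy_expr, n=2))  # ['GENE_B', 'GENE_C']
```

Filtering to high‐variance genes before clustering keeps the heatmap focused on genes that actually differ between samples rather than on stably expressed background.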

Figure 3. Flow diagram of the machine learning analysis
The discovery cohort consisted only of Illumina datasets and was used for feature selection with the guided regularised random forest (GRRF) algorithm. The validation cohort also included Affymetrix datasets. The cohort of unknown samples comprised pre‐challenge baseline samples of participants who stayed well following challenge, their corresponding nD7 samples (7 days after challenge), and febrile, culture‐negative suspected enteric fever (sEF) cases from Nepal. Refer to Appendix Table S2 for study identifiers. 03NP: Nepali cohort; T1: Oxford typhoid CHIM study 1; T2: Oxford typhoid CHIM study 2; P1: Oxford paratyphoid CHIM.
Figure 4. Identification of diagnostic signatures
  1. A

    Genes ranked by how often they were selected into the diagnostic signature across 100 iterations of the 2‐class classification (orange dots: selection frequency). Y‐axis: genes ranked by selection frequency. X‐axis: importance measure of each gene across all 100 iterations. Green dots: importance measure of each gene per iteration. A selection‐frequency cut‐off of 25% yielded a putative 5‐gene diagnostic signature (orange bar).

  2. B

    Performance of the 5‐gene classifier in predicting class membership in the validation cohort.

  3. C

    Top: probability of EF samples being classified as non‐EF (misclassified if > 0.5). Bottom: probability of samples in the "Rest" class being classified as EF (misclassified if > 0.5). Red dotted line: 0.5 prediction probability threshold. Y‐axis: prediction probability, ranging from 0 to 1.

  4. D

    Combined expression score based on the 5‐gene signature for samples in the discovery cohort (top) and validation cohort (bottom). Ox.CTRL: Oxford controls (D0); CTRL: Nepali control samples; PTB: pulmonary TB; DENV: dengue; bsPf: blood‐stage P. falciparum; SPT: S. Paratyphi A; ST: S. Typhi. ST and SPT samples derive from the challenge models as well as from Nepal. Significance was determined using two‐sided Student's t‐tests: *P < 0.05; **P < 0.01; ****P < 0.0001. Number of samples per group, discovery: Ox.CTRL = 45; CTRL = 175; PTB = 54; DENV = 67; bsPf = 94; ST = 44; SPT = 30. Validation: Ox.CTRL = 50; CTRL = 59; PTB = 97; DENV = 49; bsPf = 19; ST = 50.

  5. E

    Genes ranked by how often they were selected into the diagnostic signature across 100 iterations of the multiclass classification. A selection‐frequency cut‐off of 25% yielded a putative 7‐gene diagnostic signature (orange bar).

  6. F

    Classification probabilities for each sample of the validation cohort based on the 7‐gene signature.
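Panels A and E of Figure 4 rank genes by how often they were selected across 100 iterations and keep those selected in at least 25% of runs. The minimal sketch below shows only that aggregation step: it takes each iteration's selected genes as input and does not reproduce the GRRF feature selection itself. The function name and the toy runs are hypothetical.

```python
from collections import Counter


def frequent_features(selected_per_iteration, cutoff=0.25):
    """Rank features by selection frequency and keep those >= cutoff.

    selected_per_iteration: one collection of selected gene IDs per
    resampling iteration (the selection step itself is not shown here).
    Returns (gene, frequency) pairs sorted by decreasing frequency,
    with ties broken alphabetically.
    """
    n_iter = len(selected_per_iteration)
    if n_iter == 0:
        raise ValueError("no iterations supplied")
    # set() ensures a gene counts at most once per iteration.
    counts = Counter(g for sel in selected_per_iteration for g in set(sel))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(g, c / n_iter) for g, c in ranked if c / n_iter >= cutoff]


# Four hypothetical iterations instead of 100.
runs = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G2"}, {"G4"}]
print(frequent_features(runs, cutoff=0.5))  # [('G1', 0.75), ('G2', 0.5)]
```

Aggregating over many resampled runs favours genes that are selected consistently, which tends to give a more stable signature than a single feature-selection pass.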

Figure 5. Prediction of unknown Nepali samples using the 2‐class classifier, and qPCR validation
  1. A

    PCA of sEF samples based on the 5‐gene signature (gene array data), coloured by predicted class membership (purple: EF; green: Rest).

  2. B

    Dot plot of the predicted probability of class EF versus the expression score calculated on the basis of the 5‐gene signature (gene array data).

  3. C

    qPCR gene expression scores of the 5‐gene signature (ΔΔCt relative to PPIA) for CTRL, 03NP‐sEF, 03NP‐SPT and 03NP‐ST samples from Nepal. Yellow diamonds in the 03NP‐sEF category mark the nine patients classified as EF by the random forest algorithm.

  4. D

    qPCR expression values (ΔΔCt relative to PPIA) of the 5‐gene signature in control samples (Oxford and Nepal), S. Paratyphi A (03NP‐SPT) and S. Typhi (03NP‐ST) cases in Nepal, day‐7 samples from participants who stayed well following challenge with S. Typhi (nD7), and samples at typhoid diagnosis after challenge (TD) in the Vi‐TCV study (Appendix Table S2). Colour legend as in panel E. Data are medians with 25th/75th percentiles. N per group: CTRL = 64; nD7 = 5; 03NP‐SPT = 9; 03NP‐ST = 13; TD = 12.

  5. E

    Combined qPCR expression score of the 5‐gene signature. Black arrows indicate outlier samples. Data are medians with 25th/75th percentiles. N per group: CTRL = 64; nD7 = 5; 03NP‐SPT = 9; 03NP‐ST = 13; TD = 12.

  6. F

    Temperature and C‐reactive protein (CRP) for samples with available data (CRP was measured only in the Oxford CHIM). D0: pre‐challenge baseline (Vi‐TCV study); nD7: day‐7 samples from participants who stayed well following challenge (Vi‐TCV study); SPT: S. Paratyphi A (03NP); ST: S. Typhi (03NP); TD: typhoid diagnosis (Vi‐TCV study).

  7. G

    Spearman's rank correlation between the 5‐gene combined expression score and (left) temperature, including only nD7 and TD samples from the Oxford CHIM (Vi‐TCV study) and SPT and ST cases from Nepal at presentation to hospital; and (right) CRP, which was available only for Oxford CHIM (Vi‐TCV study) samples, excluding D0 baseline measures.
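Panels C to G of Figure 5 rest on two standard calculations: ΔΔCt normalisation to the reference gene PPIA and Spearman's rank correlation. The minimal sketch below assumes ΔΔCt is computed against the mean ΔCt of control samples and that relative expression is 2^(−ΔΔCt); the calibrator choice, sign convention, function names and toy values are assumptions, not details stated in the captions.

```python
from statistics import mean


def delta_delta_ct(ct_target, ct_ref, control_delta_cts):
    """ΔΔCt for one sample: (Ct_target - Ct_ref) minus the mean control ΔCt.

    ct_ref would be the PPIA Ct; using the control mean as calibrator is
    an ASSUMPTION for illustration.
    """
    return (ct_target - ct_ref) - mean(control_delta_cts)


def relative_expression(ddct):
    """Fold change relative to the calibrator, 2 ** (-ΔΔCt)."""
    return 2 ** (-ddct)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx2 = sum((a - mx) ** 2 for a in rx)
    sy2 = sum((b - my) ** 2 for b in ry)
    if sx2 == 0 or sy2 == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx2 * sy2) ** 0.5


# Hypothetical sample: target Ct 28.0, PPIA Ct 22.0, control ΔCts 4, 5, 6.
ddct = delta_delta_ct(28.0, 22.0, [4.0, 5.0, 6.0])
print(ddct, relative_expression(ddct))  # 1.0 0.5

# Hypothetical combined scores versus temperatures (°C).
print(spearman_rho([1.2, 2.5, 3.1, 4.0], [37.0, 38.2, 38.0, 39.5]))
```

Under this convention a positive ΔΔCt means lower expression than controls; analyses that plot −ΔΔCt instead flip the sign so that higher values mean higher expression. Spearman's rho is used rather than Pearson's r because it only assumes a monotonic relationship.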

References

    1. Anderson ST, Kaforou M, Brent AJ, Wright VJ, Banwell CM, Chagaluka G, Crampin AC, Dockrell HM, French N, Hamilton MS et al (2014) Diagnosis of childhood tuberculosis and host RNA expression in Africa. N Engl J Med 370: 1712–1723
    2. Andres‐Terre M, McGuire HM, Pouliot Y, Bongen E, Sweeney TE, Tato CM, Khatri P (2015) Integrated, multi‐cohort analysis identifies conserved transcriptional signatures across multiple respiratory viruses. Immunity 43: 1199–1211
    3. Andrews JR, Baker S, Marks F, Alsan M, Garrett D, Gellin BG, Saha SK, Qamar FN, Yousafzai MT, Bogoch II et al (2018) Typhoid conjugate vaccines: a new tool in the fight against antimicrobial resistance. Lancet Infect Dis 19: e26–e30
    4. Arjyal A, Basnyat B, Nhan HT, Koirala S, Giri A, Joshi N, Shakya M, Pathak KR, Mahat SP, Prajapati SP et al (2016) Gatifloxacin versus ceftriaxone for uncomplicated enteric fever in Nepal: an open‐label, two‐centre, randomised controlled trial. Lancet Infect Dis 16: 535–545
    5. Baker S (2011) Genomic medicine has failed the poor. Nature 478: 287
