An 8-gene machine learning model improves clinical prediction of severe dengue progression
- PMID: 35346346
- PMCID: PMC8959795
- DOI: 10.1186/s13073-022-01034-w
Abstract
Background: Each year 3-6 million people develop life-threatening severe dengue (SD). Clinical warning signs for SD manifest late in the disease course and are nonspecific, leading to missed cases and excess hospital burden. Better SD prognostics are urgently needed.
Methods: We integrated 11 public datasets profiling the blood transcriptome of 365 dengue patients of all ages and from seven countries, encompassing biological, clinical, and technical heterogeneity. We performed an iterative multi-cohort analysis to identify differentially expressed genes (DEGs) between non-severe patients and SD progressors. Using only these DEGs, we trained an XGBoost machine learning model on public data to predict progression to SD. All model parameters were "locked" prior to validation in an independent, prospectively enrolled cohort of 377 dengue patients in Colombia. We measured expression of the DEGs in whole blood samples collected upon presentation, prior to SD progression. We then compared the accuracy of the locked XGBoost model and clinical warning signs in predicting SD.
Results: We identified eight SD-associated DEGs in the public datasets and built an 8-gene XGBoost model that accurately predicted SD progression in the independent validation cohort with 86.4% (95% CI 68.2-100) sensitivity and 79.7% (95% CI 75.5-83.9) specificity. Given the 5.8% proportion of SD cases in this cohort, the 8-gene model had positive and negative predictive values (PPV and NPV) of 20.9% (95% CI 16.7-25.6) and 99.0% (95% CI 97.7-100.0), respectively. Compared to clinical warning signs at presentation, which had 77.3% (95% CI 58.3-94.1) sensitivity and 39.7% (95% CI 34.7-44.9) specificity, the 8-gene model led to an 80% reduction in the number needed to predict (NNP), from 25.4 to 5.0. Importantly, the 8-gene model accurately predicted subsequent SD in the first three days post-fever onset and up to three days prior to SD progression.
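The predictive values and NNP reported above can be approximately recovered from the stated sensitivity, specificity, and 5.8% SD proportion. The sketch below is illustrative only: the abstract does not define NNP, so the commonly used definition NNP = 1 / (PPV + NPV - 1) is assumed here, and the function name `predictive_values` is hypothetical. Because the inputs are rounded percentages, the outputs match the reported figures only to within rounding.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, NNP) from sensitivity, specificity, and prevalence.

    NNP is computed as 1 / (PPV + NPV - 1); this definition is an
    assumption, since the abstract does not state the formula used.
    """
    # Expected fractions of the cohort falling in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)

    ppv = tp / (tp + fp)  # probability of SD given a positive prediction
    npv = tn / (tn + fn)  # probability of no SD given a negative prediction
    nnp = 1 / (ppv + npv - 1)
    return ppv, npv, nnp


PREVALENCE = 0.058  # proportion of SD cases in the validation cohort

# Point estimates taken from the Results paragraph.
model = predictive_values(0.864, 0.797, PREVALENCE)
warning_signs = predictive_values(0.773, 0.397, PREVALENCE)

for label, (ppv, npv, nnp) in [("8-gene model", model),
                               ("warning signs", warning_signs)]:
    print(f"{label}: PPV={ppv:.1%}, NPV={npv:.1%}, NNP={nnp:.1f}")

# Relative reduction in NNP when moving from warning signs to the model.
reduction = 1 - model[2] / warning_signs[2]
print(f"NNP reduction: {reduction:.0%}")
```

Under these assumptions the NNP falls from roughly 25 to roughly 5, consistent with the reported 80% reduction.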
Conclusions: The 8-gene XGBoost model, trained on heterogeneous public datasets, accurately predicted progression to SD in a large, independent, prospective cohort, including during the early febrile stage when SD prediction remains clinically difficult. The model has potential to be translated to a point-of-care prognostic assay to reduce dengue morbidity and mortality without overwhelming limited healthcare resources.
Keywords: Biomarkers; Dengue; Gene signature; Host response; Machine learning; Prognostic; Severe dengue.
© 2022. The Author(s).
Conflict of interest statement
BAP reports Scientific Advisory Board membership for Globavir, outside the submitted work; in addition, BAP has a patent US 9725774 B2 licensed to Globavir. PK reports personal fees from Inflammatix, Inc., Cepheid, Inc., Vir Biotechnology, and Genentech, outside the submitted work. The 8-gene set has been disclosed for possible patent protection to the Stanford Office of Technology and Licensing by YEL, SE, and PK. The remaining authors declare that they have no competing interests.